DSI Capstone project

This page is for the DSI Capstone project presentation.

0. Technology list

Name            Description
Python          This project uses a Python 2.7 environment
NumPy           The fundamental package for scientific computing with Python
Pandas          Easy-to-use data structures and data analysis tools for Python
Beautiful Soup  Parses web pages (with the lxml parser) so that all information on a page can be accessed and scraped easily
Selenium        Automates browsers in order to scrape information rendered by JavaScript
Seaborn         A Python visualization library that provides a high-level interface for drawing attractive statistical graphics
scikit-learn    Simple and efficient tools for data mining and data analysis
CouchDB         NoSQL database for data storage
AWS             Amazon EC2, used for secure, always-on cloud computing

1. Background Description

1.1 Problem Statement

Melbourne has been ranked the most liveable city in the world for 7 years. As an important part of living in Melbourne, the housing market is also attracting increasing attention. In this research, I will discuss the following questions:

1.2 Potential Audience

1.3 Goals

1.4 Success Metrics

1.5 Data Sources

1.5.1 National Data

Data from the Australian Bureau of Statistics: the Residential Property Price Indexes publication provides estimates of changes in residential property prices in each of the eight capital cities of Australia, along with related statistics.

1.5.2 Suburb Data

Raw data crawled directly from Domain suburb profile pages using CSS selectors and Selenium. See the crawling script in the suburb scraping notebook.

1.5.3 House Data

Raw data crawled directly from domain.com.au using CSS selectors. See the crawling script in the Houses scraping notebook.
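
As a rough illustration of CSS-selector scraping with Beautiful Soup, here is a minimal sketch. It is not the script from the notebook: the sample HTML, the class names (listing, price, address) and the parse_listings helper are all hypothetical, since the real page structure of domain.com.au is not described here. It uses the built-in "html.parser" so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; the real domain.com.au page structure
# and class names are not given in this document.
SAMPLE_HTML = """
<ul>
  <li class="listing">
    <span class="price">$750,000</span>
    <span class="address">1 Example St, Carlton VIC 3053</span>
  </li>
  <li class="listing">
    <span class="price">$1,200,000</span>
    <span class="address">2 Sample Rd, Richmond VIC 3121</span>
  </li>
</ul>
"""


def parse_listings(html):
    """Return one dict per listing found in the HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    houses = []
    # select() takes a CSS selector and returns all matching elements.
    for item in soup.select("li.listing"):
        # select_one() returns the first match inside this listing.
        price = item.select_one("span.price").get_text(strip=True)
        address = item.select_one("span.address").get_text(strip=True)
        houses.append({"price": price, "address": address})
    return houses


houses = parse_listings(SAMPLE_HTML)
```

For pages whose content is filled in by JavaScript, the same helper could be given the page_source of a Selenium WebDriver instead of the body of a plain HTTP response.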

2. Findings

See PPT.