
Install and launch Jupyter Notebook, then install requests, pandas, and SQLAlchemy to prepare for data scraping, cleaning, and output to a PostgreSQL database.
Install PostgreSQL on Windows from the official site, select the correct options, and set a password to enable the database and pgAdmin access.
Create ten city datasets by scraping real estate listings and extracting links and data such as address, bedrooms, bathrooms, square feet, year built, parking, and price across multiple pages.
Extract data from the first link by fetching the page, parsing with BeautifulSoup, and retrieving address, bedrooms, bathrooms, square feet, year built, parking, and price.
Scrape data from multiple pages by iterating over page numbers and constructing URLs, then compile the results into a data frame.
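The page-number loop can be sketched as below. This is a minimal illustration only: the URL pattern, the `build_page_urls` helper, and the city slug are hypothetical, since the actual site and its query parameters are not named here.

```python
# Hypothetical URL pattern (assumption); replace with the pattern of the site you scrape.
BASE_URL = "https://example.com/listings/{city}?page={page}"


def build_page_urls(city, first_page, last_page):
    """Return one listing URL per page number, including both end pages."""
    return [
        BASE_URL.format(city=city, page=page)
        for page in range(first_page, last_page + 1)
    ]


urls = build_page_urls("san-diego", 1, 3)
for url in urls:
    # Each URL would then be fetched and parsed, and the rows collected into one list.
    print(url)
```

Building the full list of URLs first keeps the fetching and parsing code separate from the pagination logic, which makes it easier to add a delay between requests.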
Clean the bedrooms and bathrooms columns by removing bad string elements, converting studio entries to one bedroom, and normalizing values for accurate counts and analysis.
Modify the parking column by applying a keyword-based lambda function in Python to label entries as yes or no, using garage, carport, car, or open as indicators.
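A keyword check of this kind might look like the following sketch. The sample strings are invented for illustration, and reading the second keyword as "carport" is an assumption; the same function could be passed to a pandas column with `apply`.

```python
# Keywords that indicate a parking option; "carport" is an assumed reading of the summary.
PARKING_KEYWORDS = ("garage", "carport", "car", "open")


def label_parking(value):
    """Return "yes" if the text mentions any parking keyword, otherwise "no"."""
    if not isinstance(value, str):
        return "no"  # missing values (None, NaN) count as no parking information
    text = value.lower()
    return "yes" if any(keyword in text for keyword in PARKING_KEYWORDS) else "no"


# Invented sample values, not taken from the course dataset.
samples = ["2 Car Garage", "No Data", "Open Parking", None]
labels = [label_parking(s) for s in samples]
print(labels)
```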
Convert bedrooms, bathrooms, and area square feet to integers by checking string or numeric types and casting, with year built and price handled in a separate video.
Convert the price column to integers by removing plus signs with a lambda, override the old column, verify numeric dtype, and prepare to compute price per square foot.
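As a rough sketch, the price conversion could be written as below. Only the plus sign is mentioned above; stripping dollar signs and thousands separators as well is an assumption about how the raw values look, and the sample prices are invented.

```python
def price_to_int(raw):
    """Strip '+', and (as an assumption) '$' and ',', then cast to int."""
    cleaned = str(raw).replace("+", "").replace("$", "").replace(",", "").strip()
    return int(cleaned)


# Invented sample values, not taken from the course dataset.
prices = ["$450,000", "1,200,000+", "375000"]
clean_prices = [price_to_int(p) for p in prices]
print(clean_prices)
```

With a pandas column, the same function can be applied through `apply`, and the result assigned back to the original column name to override it.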
Explore the data analysis section by applying the cleaned data to answer ten questions using the pandas library, SQL, or visualization libraries, with the prerequisites of installing PostgreSQL and configuring inputs and outputs.
Import and prepare real estate data by loading Excel into a pandas data frame, cleaning columns (area in square feet, price in dollars), and exporting to PostgreSQL via a SQLAlchemy engine.
Answer question one by counting results per location with pandas value_counts, compare with a PostgreSQL GROUP BY query, then visualize with Matplotlib.
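A small sketch of the counting step, using a toy data frame. The column name `location`, the table name in the SQL comment, and the rows themselves are assumptions made for illustration, not the course data.

```python
import pandas as pd

# Toy data (assumption): one row per listing, with the city it belongs to.
df = pd.DataFrame(
    {"location": ["San Diego", "Las Vegas", "San Diego",
                  "New York", "San Diego", "Las Vegas"]}
)

# value_counts returns the number of rows per location, sorted in descending order.
counts = df["location"].value_counts()
print(counts)

# A comparable SQL query (table and column names are hypothetical):
#   SELECT location, COUNT(*) AS listings
#   FROM real_estate
#   GROUP BY location
#   ORDER BY listings DESC;
```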
Compute the minimum, maximum, and average price across the observed locations using the pandas library and SQL. Indianapolis has the minimum and New York the maximum, with formatted results.
Find the most expensive Las Vegas house using idxmax on the data frame, then read off its seven bedrooms and seven bathrooms, and cross-check with PostgreSQL.
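The lookup can be sketched on a toy data frame as follows. The column names and all values are invented for illustration and do not reproduce the course dataset or its results.

```python
import pandas as pd

# Toy data (assumption), not the course dataset.
df = pd.DataFrame(
    {
        "location": ["Las Vegas", "Las Vegas", "San Diego"],
        "bedrooms": [3, 5, 2],
        "bathrooms": [2, 4, 1],
        "price": [350000, 900000, 600000],
    }
)

# Restrict to one city, then ask for the index label of the highest price.
vegas = df[df["location"] == "Las Vegas"]
top_idx = vegas["price"].idxmax()

# Use that label to pull the full row and read off the other columns.
top = df.loc[top_idx]
print(top["bedrooms"], top["bathrooms"], top["price"])
```

Swapping `idxmax` for `idxmin` gives the cheapest listing in the filtered frame in the same way.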
Identify the cheapest San Diego house in the real estate data frame with the idxmin function and check whether it has parking, cross-checking the result in code and SQL.
3 Main Topics will be covered:
1) Data Extraction / Web Scraping
2) Data Cleaning
3) Data Analysis
We start with the extraction of real estate data from 10 different cities/states. Once we have collected all the necessary data, the datasets will be merged into one dataset. Then the cleaning process with the pandas library will start. The goal is to modify and manipulate the data so that it can be loaded into a database. Once our data is cleaned, we start with the data analysis part. That means we will answer 10 different questions based on our extracted and cleaned dataset. To answer the 10 questions, we make use of the pandas DataFrame, PostgreSQL, and Matplotlib. You will get to know different approaches to answering real-life questions based on a dataset you created yourself.
After this course you will have the knowledge and the experience to scrape your own data and create your own dataset. Once the dataset is created, we will clean the data and finally focus on the data analysis.
With the help of the course resources you will always have documents you can refer to. If you have a question, or if a concept just does not make sense to you, you can ask at any time in the Q&A forum. Either the instructor or other students will answer your question. Thanks to the community, you will never feel that you are learning alone.
What you’ll learn
Web Scraping
Pandas
Beautiful Soup
Data Extraction
Web Scraping for Data Science
Data Mining
Data Scraping & Data Cleaning
Data Analysis
PostgreSQL
Are there any course requirements or prerequisites?
Basic understanding of Python Programming
Basic understanding of Beautiful Soup
Who this course is for:
Everybody who is interested in Web Scraping (creating your own dataset), Data Cleaning & Data Analysis
Professionals who want to create their own dataset without being dependent on someone else
Disclaimer: I teach web scraping as a tutor for educational purposes. That's it.
The first rule of scraping the web is: do not harm the website. The second rule of web crawling is: do NOT harm the website.