Python Data Science: Data Prep & EDA with Python
What you'll learn
- Master the core building blocks of Python for data science BEFORE applying machine learning algorithms
- Scope data science projects by clearly defining the goals, techniques, and data sources needed for your analysis
- Import and export flat files, Excel workbooks, and SQL database tables using Pandas
- Clean data by converting data types, handling common data issues, and creating new columns for analysis
- Perform exploratory data analysis (EDA) by sorting, filtering, grouping, and visualizing data to discover patterns and insights
- Prepare data for machine learning models by joining tables, aggregating rows, and applying feature engineering techniques
Requirements
- Jupyter Notebook (free download; we'll walk through the installation)
- Familiarity with base Python and Pandas is recommended but not required
Description
This is a hands-on, project-based course designed to help you master the core building blocks of Python for data science and machine learning.
We'll start by introducing the fields of data science and machine learning, discussing the difference between supervised and unsupervised learning, and reviewing the Python data science workflow we'll be using throughout the course.
From there, we'll take a deep dive into the data prep & EDA steps of the workflow. You'll learn how to scope a data science project, use Python and Pandas to gather data from multiple sources and handle common data cleaning issues, and perform exploratory data analysis (EDA) using techniques like filtering, grouping, and visualizing data.
Throughout the course, you'll play the role of a Jr. Data Scientist for Maven Music, a streaming service that's been struggling with customer churn. Applying the skills you learn, you'll use Python to gather, clean, and explore the data and provide insights about their customers.
Last but not least, you'll practice preparing data for data science and machine learning models by joining multiple tables, adjusting row granularity, and engineering useful fields and features.
COURSE OUTLINE:
Intro to Data Science & Machine Learning
Introduce the field of data science, review essential skills, and walk through each phase of the data science workflow
Scoping a Project
Review the process of scoping a data science project, including brainstorming problems and solutions, choosing techniques, and setting clear goals
Gathering Data
Read flat files into a Pandas DataFrame in Python, and review common data sources & formats, including Excel spreadsheets and SQL databases
Cleaning Data
Identify and convert data types, find and fix common data quality issues like missing values, duplicates, and outliers, and create new columns for analysis
Exploratory Data Analysis (EDA)
Explore datasets to discover insights by sorting, filtering, and grouping data, then visualize it using common chart types like scatterplots & histograms
MID-COURSE PROJECT
Put your skills to the test by cleaning, exploring, and visualizing data from a brand-new dataset containing Rotten Tomatoes movie ratings
Preparing for Modeling
Structure your data so that it’s ready for machine learning models by creating a numeric, non-null table and engineering new features
FINAL COURSE PROJECT
Apply all the skills learned throughout the course by gathering, cleaning, exploring, and preparing multiple datasets for Maven Music
__________
Ready to dive in? Join today and get immediate, LIFETIME access to the following:
- 8.5 hours of high-quality video
- 16 homework assignments
- 7 quizzes
- 2 projects (1 mid-course, 1 final)
- Data Science in Python: Data Prep & EDA ebook (190+ pages)
- Downloadable project files & solutions
- Expert support and Q&A forum
- 30-day Udemy satisfaction guarantee
If you're an aspiring data scientist or business intelligence professional looking for an introduction to the world of machine learning and data science with Python and Pandas, this is the course for you.
Happy learning!
-Alice Zhao (Python Expert & Data Science Instructor, Maven Analytics)
__________
Looking for our complete business intelligence stack? Search for "Maven Analytics" to browse our full course library, including Excel, Power BI, MySQL, Tableau and Machine Learning courses!
See why our courses are among the TOP-RATED on Udemy:
"Some of the BEST courses I've ever taken. I've studied several programming languages, Excel, VBA and web dev, and Maven is among the very best I've seen!" Russ C.
"This is my fourth course from Maven Analytics and my fourth 5-star review, so I'm running out of things to say. I wish Maven was in my life earlier!" Tatsiana M.
"Maven Analytics should become the new standard for all courses taught on Udemy!" Jonah M.
Who this course is for:
- Data scientists looking to learn core techniques and best practices for data prep and exploratory data analysis
- Python users who want to build the core skills required before applying AI and machine learning models
- Data analysts or BI experts looking to transition into a data science role
- Anyone interested in learning one of the most popular open-source programming languages in the world
Instructors
Maven Analytics is the first purpose-built, online platform for data analysts to learn new skills, showcase their work, and connect with peers and employers.
Named one of the top 10 education companies revolutionizing the industry, Maven's award-winning Guided Learning model allows users to create personalized learning plans, build public portfolios, connect with expert instructors and career coaches, and join a community of world-class analytics talent.
We've helped 1,000,000+ students build job-ready skills, master tools like Excel, SQL, Power BI, Tableau and Python, and build the foundation for a successful career.
Alice Zhao is a data scientist who is passionate about teaching and making complex things easy to understand. She is the author of SQL Pocket Guide, 4th Edition (O'Reilly).
She has 15+ years of experience in the data field and has taught numerous courses in Python, SQL and R as a data science instructor at Northwestern University, Maven Analytics and Metis, and as a co-founder of Best Fit Analytics.
She has the most popular Natural Language Processing in Python tutorial on YouTube, with over 1,200,000 views. In her spare time, she writes about pop culture and data analysis on her blog, A Dash of Data.
She has an M.S. in Analytics and B.S. in Electrical Engineering, both from Northwestern University.