Learn the basics of data science with Python in this short, follow-along course built around a concrete, real-world dataset.
Listening to theoretical examples is never fun, and I've always preferred applying what I learn to concrete problems, so this course is built around analyzing a real-life dataset together. We'll be using the "Parkinson's Disease Telemedicine dataset", and our goal is to see whether we can predict the severity of Parkinson's disease in patients from just a dozen simple measurements. That would be a vast improvement over the time-consuming process that doctors and patients currently go through.
This course provides a good introduction to several different aspects of data science, all in Python, one of the most popular and powerful languages used by data scientists today.
You'll learn how to:
- Set up your data analysis research environment (in an IPython notebook)
- Visualize the data to understand it better
- Manipulate and transform data to prepare it for modeling
- Apply a statistical model to the data
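
To give a feel for what those steps look like in code, here is a minimal sketch. It is not taken from the course materials: it runs on synthetic data, the column names (`measure_a`, `measure_b`, `severity`) are made up for illustration, and it uses NumPy's least-squares solver as a stand-in for whatever modeling library the lectures use. Plotting is left out to keep the example short.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data. The column names are hypothetical and are
# NOT the variables of the real Parkinson's dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "measure_a": rng.lognormal(mean=0.0, sigma=1.0, size=n),  # right-skewed
    "measure_b": rng.normal(loc=5.0, scale=1.0, size=n),
})
# Hypothetical target built from the two measurements plus a little noise.
df["severity"] = (
    2.0 * np.log(df["measure_a"])
    + 0.5 * df["measure_b"]
    + rng.normal(scale=0.1, size=n)
)

# Understand the data: summary statistics and pairwise correlations.
summary = df.describe()
correlations = df.corr()

# Transform: log-transform the skewed measurement with apply().
df["log_measure_a"] = df["measure_a"].apply(np.log)

# Model: multiple regression by ordinary least squares.
# The design matrix has an intercept column plus the two predictors.
X = np.column_stack([np.ones(n), df["log_measure_a"], df["measure_b"]])
y = df["severity"].to_numpy()
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)

print(summary)
print(coefficients)  # intercept, then one slope per predictor
```

Because the synthetic target was built with slopes of 2.0 and 0.5, the fitted slopes should land close to those values, which is a handy sanity check when you try this yourself.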
The course consists of short lectures that walk you through the data analysis as you follow along. There are also several coding exercises throughout to test your knowledge!
Check out the course to learn data science with Python today!
|Section 1: Welcome, information about this course|
|Section 2: Setting up Python and Libraries|
If you already have Python installed
File and command to install all necessary libraries at once, with pip
Links to help you install pip
The libraries, explained
If you want to install Python and the libraries at once
|Section 3: Our data set: the Parkinson's Telemedicine Dataset|
Downloading the data
A quick explanation of the dataset
|Section 4: Starting our analysis|
Starting a new IPython Notebook
Loading the data into our IPython Notebook
|Section 5: Manipulating data with pandas, the data analysis library|
DataFrames are data tables
Series are single rows or columns of data
Slicing DataFrames to get the data we need
Keeping track of the variable names we need
Coding exercise: summary statistics
|Section 6: Visualizing the data to understand it better before modeling|
Looking at the data's distributions with box plots and histograms
Seeing multicollinearity with a scatter plot matrix
Coding exercise: a single correlation
|Section 7: Transforming the data to prepare it for modeling|
Taking care of multicollinearity
Log transforming data to take care of skewed distributions
Coding exercise: practicing apply()
|Section 8: Modeling the data|
Applying a multiple regression to answer the ultimate question
|Section 9: Conclusion|
Download the data and IPython notebook that were used throughout this course
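
As a taste of the multicollinearity topic in Section 7, here is a small sketch of one common way to spot and handle highly correlated predictors. It may not be the approach the lectures take: the data is synthetic, the column names (`x1`, `x2`, `x3`) are hypothetical, and the 0.9 cutoff is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=n)

# Hypothetical predictors: x2 is nearly a copy of x1, x3 is independent.
predictors = pd.DataFrame({
    "x1": base,
    "x2": base + rng.normal(scale=0.05, size=n),
    "x3": rng.normal(size=n),
})

# Absolute pairwise correlations between all predictors.
corr = predictors.corr().abs()
threshold = 0.9  # arbitrary cutoff, chosen only for this example

# For each highly correlated pair, keep the first column and drop the second.
to_drop = []
cols = list(corr.columns)
for i, first in enumerate(cols):
    if first in to_drop:
        continue
    for second in cols[i + 1:]:
        if second not in to_drop and corr.loc[first, second] > threshold:
            to_drop.append(second)

reduced = predictors.drop(columns=to_drop)
print(to_drop)
print(list(reduced.columns))
```

Dropping one column of each near-duplicate pair is the simplest remedy. Combining the correlated columns into a single feature is another option, and which one fits best depends on the data.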
Troy Shu has worked on Wall Street and at a startup, and has now started his own company, building lots of data-driven products and doing tons of data analysis in Python along the way.
He currently runs his own consulting business, building data-driven products for other companies. Before that, Troy worked at a lending startup called Bond Street, where he built the company's risk models and developed the "MVP" (minimum viable product) for its automated loan underwriting platform. He has also worked at a hedge fund, where he built stock-picking algorithms and launched a new fund. Troy double majored in Computer Science and Economics, with concentrations in Statistics and Finance, at the University of Pennsylvania.