Bite-Sized Data Science with Python and Pandas: Introduction

Follow along as we analyze a real-life dataset and learn data science with Python and Pandas
4.5 (11 ratings)
51 students enrolled
  • Lectures: 24
  • Length: 1 hour
  • Skill level: Intermediate
  • Language: English
  • Includes: lifetime access, 30-day money-back guarantee, availability on iOS and Android, certificate of completion

About This Course

Published: 12/2015. Language: English.

Course Description

Learn the basics of data science with Python in this short, follow-along course built around a concrete, real-world dataset.

Listening to theoretical examples is never fun, and I've always liked applying what I learn to concrete examples, so this course is built around analyzing a real-life dataset together. The dataset we'll be using is the "Parkinson's Disease Telemedicine dataset", and our goal will be to see whether we can predict the severity of Parkinson's disease in patients from just a dozen simple measurements. That would be a vast improvement over the current time-consuming process that doctors and patients have to go through.

This course introduces several aspects of data science, all in Python, one of the most popular and powerful languages used by data scientists today.

You'll learn how to:

- Set up your data analysis research environment (in an IPython notebook)

- Visualize the data to understand it better

- Manipulate and transform data to prepare it for modeling

- Apply a statistical model to the data
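
As a rough illustration of the first steps in the list above, here is a minimal sketch of loading a table into pandas and inspecting it. The CSV text and its column names (`subject`, `age`, `jitter`, `shimmer`, `severity`) are invented stand-ins, not the course's actual dataset, and the values are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file; column names and values are invented.
csv_text = """subject,age,jitter,shimmer,severity
1,72,0.006,0.025,28.2
1,72,0.003,0.020,28.4
2,58,0.005,0.019,20.5
2,58,0.004,0.023,21.1
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# A DataFrame is a table; selecting a single column returns a Series.
severity = df["severity"]

# Selecting a list of columns returns a smaller DataFrame.
measurements = df[["jitter", "shimmer"]]

# A boolean mask slices rows.
older = df[df["age"] > 60]

# Summary statistics for every numeric column.
summary = df.describe()
print(summary.loc[["mean", "std"]])
```

In a notebook, evaluating `df.head()` or `summary` in a cell displays the table directly, which is convenient for this kind of exploration.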

The course consists of short lectures that walk you through the data analysis as you follow along. There are also several coding exercises throughout to test your knowledge!

Check out the course to learn data science with Python today!

What are the requirements?

  • Students should have experience writing, at a minimum, basic programs in Python

What am I going to get from this course?

  • Manipulate and transform data series and tables in Python
  • Build a multiple regression model in Python
  • Use IPython Notebook for research and analysis in Python
  • Visualize data to glean insights from it, in Python
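
To give a feel for the multiple regression outcome listed above, here is a minimal sketch that fits one by ordinary least squares. It uses synthetic data with invented column names (`x1`, `x2`, `y`) and NumPy's least-squares solver; this is an assumption made for illustration, and the course itself may use a different library or approach.

```python
import numpy as np
import pandas as pd

# Synthetic data with known coefficients, so the fit can be sanity-checked.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 3.0 + 2.0 * df["x1"] - 1.0 * df["x2"] + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept, then the predictors.
X = np.column_stack([np.ones(n), df[["x1", "x2"]].to_numpy()])
y = df["y"].to_numpy()

# Ordinary least squares: find coef minimizing ||X @ coef - y||^2.
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# R^2: the share of the variance in y explained by the fitted values.
fitted = X @ coef
r_squared = 1 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()

print("intercept, x1, x2:", coef.round(2))
print("R^2:", round(r_squared, 3))
```

The recovered coefficients should land close to the values used to generate the data (3, 2, and -1), which is a quick way to confirm the fit is wired up correctly before trying it on real measurements.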

Who is the target audience?

  • This course is best suited for students who already have a basic understanding of both Python and statistics
  • This course is for students who like learning with real-life, concrete examples, and following along by programming on their own computers
  • This course is for students who want to learn the basics of data manipulation and visualization, and statistical model building in Python

Curriculum

Section 1: Welcome, information about this course
Section 2: Setting up Python and Libraries
  • If you already have Python installed
  • File and command to install all necessary libraries at once, with pip
  • Links to help you install pip
  • The libraries, explained
  • If you want to install Python and the libraries at once
Section 3: Our data set: the Parkinson's Telemedicine Dataset
  • Downloading the data
  • A quick explanation of the dataset
Section 4: Starting our analysis
  • Starting a new IPython Notebook
  • Loading the data into our IPython Notebook
Section 5: Manipulating data with pandas, the data analysis library
  • DataFrames are data tables
  • Series are single rows or columns of data
  • Slicing DataFrames to get the data we need
  • Keeping track of the variable names we need
  • Coding exercise: summary statistics
Section 6: Visualizing the data to understand it better before modeling
  • Looking at the data's distributions with box plots and histograms
  • Seeing multicollinearity with a scatter plot matrix
  • Coding exercise: a single correlation
Section 7: Transforming the data to prepare it for modeling
  • Taking care of multicollinearity
  • Log transforming data to take care of skewed distributions
  • Coding exercise: practicing apply()
Section 8: Modeling the data
  • Applying a multiple regression to answer the ultimate question
Section 9: Conclusion
  • Thank you
  • Download the data and IPython notebook that was used throughout this lecture
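
The checks and transformations named in Sections 6 and 7 can be sketched roughly as follows. The data is synthetic and the column names (`feat_a`, `feat_b`, `feat_c`) are hypothetical. Dropping one column of a highly correlated pair is only one common way to handle multicollinearity and is an assumption here, not necessarily the approach taken in the lectures.

```python
import numpy as np
import pandas as pd

# Synthetic data: feat_a is right-skewed, feat_b nearly duplicates feat_a.
rng = np.random.default_rng(1)
n = 300
a = rng.lognormal(mean=0.0, sigma=1.0, size=n)
df = pd.DataFrame({
    "feat_a": a,
    "feat_b": 1.5 * a + rng.normal(scale=0.01, size=n),
    "feat_c": rng.normal(size=n),
})

# Pairwise correlations; values near 1 or -1 hint at multicollinearity.
corr_matrix = df.corr()

# A single correlation between two columns.
pair_corr = df["feat_a"].corr(df["feat_b"])

# One simple remedy (an assumption): drop one column of the correlated pair.
reduced = df.drop(columns=["feat_b"])

# Log-transform the skewed column with apply(); its values are all positive.
reduced["log_feat_a"] = reduced["feat_a"].apply(np.log)

# Skewness should move toward zero after the transform.
skew_before = reduced["feat_a"].skew()
skew_after = reduced["log_feat_a"].skew()
print(round(pair_corr, 3), round(skew_before, 2), round(skew_after, 2))
```

In a notebook, `pd.plotting.scatter_matrix(df)` and `df.hist()` give the visual counterparts of these numbers, provided matplotlib is installed.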

Instructor Biography

Troy Shu, Founder of Terragon, building data-driven products

Troy Shu has worked on Wall Street and at a startup, and has now started his own company, building many data-driven products and doing extensive data analysis in Python along the way.

He currently runs his own consulting business, building data-driven products for other companies. Before that, Troy worked at a lending startup called Bond Street, where he built the company's risk models and developed the "MVP" (minimum viable product) for the automated loan underwriting platform. He has also worked at a hedge fund, where he built stock-picking algorithms and launched a new hedge fund. Troy double-majored in Computer Science and Economics, with concentrations in Statistics and Finance, at the University of Pennsylvania.
