Pro data science in Python
4.2 (35 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
488 students enrolled

Learn Keras, Deep Learning, Scikit-learn, Pandas and Statsmodels
Created by Francisco Juretig
Last updated 4/2017
English [Auto]
This course includes
  • 11.5 hours on-demand video
  • 86 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Use complex scikit-learn tools for machine learning
  • Do statistical analysis using Statsmodels
  • Read, transform and manipulate data using Pandas
  • Use Keras for neural networks
  • Solve both supervised and unsupervised machine learning problems
  • Do time series analysis and forecasting using Statsmodels
  • Classify images using Deep Convolutional Networks
Course content
47 lectures 11:20:11
+ Object Oriented programming in Python
2 lectures 37:11

Classes are fundamental for writing clean and robust Python code. We review the basics of class creation, constructors, and methods.

Classes 1

A powerful feature of classes is that they can be inherited. This allows us to write classes that inherit methods and data from their parent classes.

Classes 2
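
A minimal sketch of the ideas in the two lectures above: a parent class with a constructor and a method, and a child class that inherits from it. The class names (`Dataset`, `LabeledDataset`) are hypothetical and are not taken from the course material.

```python
class Dataset:
    """Hypothetical container for a list of numeric values."""

    def __init__(self, values):
        # The constructor stores the data on the instance.
        self.values = list(values)

    def mean(self):
        # A method operating on the stored data.
        return sum(self.values) / len(self.values)


class LabeledDataset(Dataset):
    """Child class: inherits `values` and `mean` from Dataset."""

    def __init__(self, values, label):
        super().__init__(values)  # reuse the parent constructor
        self.label = label

    def describe(self):
        return f"{self.label}: mean={self.mean():.2f}"


sales = LabeledDataset([10, 20, 30], label="sales")
summary = sales.describe()
```

The child class only adds what is new (`label`, `describe`); everything else comes from the parent.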
+ Pandas
5 lectures 58:44

Reading a CSV file via Pandas and doing some basic data manipulation

Loading data in Pandas
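
As a rough illustration of this lecture's topic, the sketch below reads CSV data with `pd.read_csv` and applies a few basic manipulations. The data is made up, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import io
import pandas as pd

# In-memory stand-in for a CSV file on disk (hypothetical data).
csv_text = io.StringIO(
    "city,year,price\n"
    "London,2015,500\n"
    "London,2016,540\n"
    "Leeds,2016,210\n"
)

df = pd.read_csv(csv_text)

# Basic manipulation: filter rows, add a derived column, sort.
recent = df[df["year"] == 2016].copy()
recent["price_k"] = recent["price"] / 1000
recent = recent.sort_values("price", ascending=False).reset_index(drop=True)
```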

Lambda expressions allow us to execute functions on every element of a Pandas dataframe, looping in a natural way

Looping through Pandas Datasets - Lambda expressions
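
A small sketch of the lambda approach described above, using a hypothetical dataframe. `Series.apply` runs the lambda on every element of a column, and `DataFrame.apply` with `axis=1` runs it on every row.

```python
import pandas as pd

df = pd.DataFrame({"price": [100.0, 250.0, 80.0], "qty": [2, 1, 5]})

# Element-wise: apply a lambda to every value of one column.
df["price_with_tax"] = df["price"].apply(lambda p: round(p * 1.2, 2))

# Row-wise: axis=1 passes each row to the lambda.
df["total"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)
```

For simple arithmetic like this, vectorized expressions such as `df["price"] * df["qty"]` are usually faster; the lambda form pays off when the per-row logic is more involved.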

We review the different ways of merging data in Pandas: full (outer), inner, left, and right joins.

Merging data
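
The four join types can be sketched with `DataFrame.merge` and its `how` argument. The two tables below are hypothetical.

```python
import pandas as pd

customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"id": [2, 3, 4], "amount": [50, 75, 20]})

# The `how` argument selects the join type.
inner = customers.merge(orders, on="id", how="inner")  # ids present in both
left = customers.merge(orders, on="id", how="left")    # all customers
right = customers.merge(orders, on="id", how="right")  # all orders
full = customers.merge(orders, on="id", how="outer")   # union of ids
```

Rows without a match on the other side get missing values (`NaN`) in the columns coming from that side.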

Building aggregate analysis is a fundamental technique for validating our analysis and results, before jumping into the actual machine learning algorithms. For example, we can compare our results versus some aggregated reports that we might get, thus validating that our data is in good shape.

Preview 08:16
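
One way to build the kind of aggregate check described above is `groupby` followed by `agg`. This is only a sketch: the data and the "external report" figure are invented for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "units": [10, 15, 7, 3, 5],
    "revenue": [100.0, 150.0, 70.0, 30.0, 50.0],
})

# One row per region, with named aggregate columns.
report = sales.groupby("region").agg(
    total_units=("units", "sum"),
    mean_revenue=("revenue", "mean"),
)

# Hypothetical figure from an external report, used as a sanity check.
expected_north_units = 25
matches = report.loc["North", "total_units"] == expected_north_units
```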

Pivoting our dataframes is quite easy in Pandas. We show how to transform a data frame from the long format to the wide format and vice-versa

Pivoting data in Pandas
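
The long-to-wide transformation and its inverse can be sketched with `pivot` and `melt`. The numbers below are illustrative placeholders, not real statistics.

```python
import pandas as pd

long_df = pd.DataFrame({
    "country": ["UK", "UK", "US", "US"],
    "year": [2015, 2016, 2015, 2016],
    "gdp": [2.9, 2.7, 18.2, 18.7],
})

# Long -> wide: one row per country, one column per year.
wide_df = long_df.pivot(index="country", columns="year", values="gdp")

# Wide -> long again with melt.
back_to_long = (
    wide_df.reset_index()
    .melt(id_vars="country", var_name="year", value_name="gdp")
)
```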

Let's review what we learnt in this section

4 questions
+ Plotting
3 lectures 23:26

We review how to install Matplotlib and we provide a general introduction

Setting up Matplotlib

Creating line plots

Line plots

Producing bar plots and stacked bar plots

Bar plots
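
A short sketch of the plot types covered in this section: a line plot and a stacked bar plot in Matplotlib. The series are invented, and the `Agg` backend is an assumption made so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

quarters = ["Q1", "Q2", "Q3", "Q4"]
online = [10, 12, 15, 18]
stores = [20, 19, 17, 16]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(9, 3))

# Line plot: one line per series.
ax_line.plot(quarters, online, marker="o", label="online")
ax_line.plot(quarters, stores, marker="s", label="stores")
ax_line.set_title("Line plot")
ax_line.legend()

# Stacked bar plot: the second series starts where the first ends.
ax_bar.bar(quarters, online, label="online")
ax_bar.bar(quarters, stores, bottom=online, label="stores")
ax_bar.set_title("Stacked bar plot")
ax_bar.legend()

fig.tight_layout()
```

Calling `fig.savefig("plots.png")` would write the figure to disk; in an interactive session `plt.show()` displays it instead.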
+ Linear regression in Statsmodels
3 lectures 49:49

Fundamental ideas behind linear regression

Preview 19:53

Running a linear regression model in Statsmodels. Analyzing the results. Selecting the proper parameters

Linear Regression: Part1

Working with dummy variables in Statsmodels. Ensuring that the model is full rank and that the eigenvalues of the design matrix do not point to multicollinearity

Linear regression: Part2
+ Time Series in Statsmodels
4 lectures 01:09:24

Basic ideas behind ARIMA modelling. Why we need stationary series. How we can decompose a stationary series into the sum of AR and MA terms.

Intro to time series

Identifying the AR and MA orders of the United States GDP series by inspecting the ACF and PACF. Ensuring that the series is stationary by using the ADF test.

Forecasting the US GDP: Part1

Building an actual model for the GDP of the US. The essential ARIMA() parameters. Making sure that the residuals are valid, and making the predictions for the next quarters

Forecasting the US GDP: Part2

Forecasting the prices of new houses in London using the techniques that we learnt in the previous lectures.

Forecasting London property prices
9 questions
+ Introduction to machine learning
2 lectures 09:06

Fundamental review of some ML concepts. What is ML? How does it compare to traditional statistics? Distinction between supervised and unsupervised problems

Introduction to machine learning

Installing scikit-learn, NumPy+MKL, and SciPy

Installing scikit-learn
+ Machine learning with Scikit-learn: Supervised problems
9 lectures 02:23:32

We briefly review the Bayesian ideas behind Naive Bayes. We then explain when to use the Bernoulli or the multinomial variant, depending on the assumptions we make about the data.

Naive Bayes - Bernoulli - Multinomial

We use Bernoulli and Multinomial Naive Bayes classifiers to predict spam in a real SMS dataset from Kaggle. We end up with 96% accuracy (in sample), versus the 86% that we would have obtained by always predicting the majority class (the proportion of non-spam messages among all SMS). This probably gives spammers a good reason to hate machine learning!

Preview 19:55
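
The workflow can be sketched on a tiny invented corpus (not the Kaggle SMS dataset): messages are turned into word counts and then passed to the two Naive Bayes variants in scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny hypothetical corpus, not the Kaggle SMS dataset.
messages = [
    "win a free prize now", "free cash claim your prize",
    "urgent free offer win cash", "are we meeting for lunch",
    "see you at home tonight", "call me when you are home",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)  # word counts per message

multinomial = MultinomialNB().fit(X, labels)  # models word counts
bernoulli = BernoulliNB().fit(X, labels)      # models word presence/absence

new_message = vectorizer.transform(["claim your free prize"])
prediction = multinomial.predict(new_message)[0]
in_sample_accuracy = multinomial.score(X, labels)
```

In-sample accuracy is optimistic by construction; a held-out set or cross-validation gives a fairer estimate.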

We introduce SVM within a very simple (linear) context. Even though it is an extremely powerful algorithm, it can tend to generate too many support vectors, possibly over-fitting the data. Is there a solution to that? Although SVM is best known as a classification tool, we will see how it can also be used as a very powerful regression tool.

Linear support Vector machines SVM (SVM and LinearSVC)
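
A small sketch of linear SVMs in scikit-learn, using the `SVC` and `LinearSVC` classifiers and the `SVR` regressor on synthetic data. The dataset and the parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, LinearSVC, SVR

# Two well-separated synthetic clusters (hypothetical data).
X, y = make_blobs(n_samples=100, centers=[[-3, -3], [3, 3]],
                  cluster_std=1.0, random_state=0)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
linear_svc = LinearSVC(C=1.0, max_iter=10000).fit(X, y)

# Only the support vectors define the decision boundary.
n_support_vectors = len(svc.support_vectors_)

# The same idea applied to regression: predict the first coordinate
# from the second one.
svr = SVR(kernel="linear").fit(X[:, [1]], X[:, 0])
```

A smaller `C` makes the margin softer, which typically increases the number of support vectors while regularizing the fit more strongly.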

We show how to run a linear regression model via ordinary least squares (OLS), LASSO, and Ridge. We see how LASSO can reduce the dimensionality of a feature set, and how Ridge can produce estimates from a correlated feature set. Both methods yield biased models, but ones that can generate more stable predictions. In the example analyzed here, all models end up with a very similar "score", so we can't conclude that any one is "better" than the others in terms of prediction. We do show, however, that LASSO can generate a model that competes really well with Ridge and OLS, even with high correlation, while correctly reducing the dimensionality of the problem. We also show how to use the "LassoCV" and "RidgeCV" functions, which automatically compute the regularization parameter needed by those methods, even though in this case they do not yield a noticeable improvement.

Lasso - Ridge
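
The comparison can be sketched on synthetic data where only two of ten features matter. The penalty values below are arbitrary assumptions, not the ones used in the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge, LassoCV, RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first two features drive the response (synthetic data).
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: sets some coefficients to 0
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks, rarely zeroes

n_selected = int(np.sum(lasso.coef_ != 0))

# The CV variants choose the regularization strength by cross-validation.
lasso_cv = LassoCV(cv=5).fit(X, y)
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
chosen_alphas = (lasso_cv.alpha_, ridge_cv.alpha_)
```

Here LASSO keeps only the informative features, whereas Ridge and OLS keep small non-zero coefficients on all ten.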

We review the tree functions available in scikit-learn, both for classification and regression

Decision Trees
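
As a brief sketch, scikit-learn provides `DecisionTreeClassifier` and `DecisionTreeRegressor` for the two cases. The iris dataset and the depth limit are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

# Classification: predict the species from the four measurements.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
clf_accuracy = clf.score(X, y)  # in-sample accuracy

# Regression: predict petal width (column 3) from the other three columns.
reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[:, :3], X[:, 3])
reg_r2 = reg.score(X[:, :3], X[:, 3])  # in-sample R^2
```

Limiting `max_depth` is one simple way to keep a single tree from over-fitting.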

The best performing methods nowadays rely on building smaller models and then averaging (or choosing one) between them. Many of the winning algorithms in Kaggle competitions do exactly this. We describe the two big families of ensemble methods: (A) averaging ensemble methods and (B) boosting ensemble methods.

Introduction to ensemble methods

We introduce one of the very best functions in scikit-learn: ensemble.BaggingClassifier. It allows us to plug any estimator into an ensemble, reducing the variance of our estimator and often performing much better in out-of-sample scenarios.

Averaging ensemble methods: Part 1: Bagging
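
A minimal sketch of bagging with `BaggingClassifier`, wrapping a k-nearest-neighbours classifier. The choice of base estimator, dataset, and parameters is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Any estimator can be plugged in; it is passed positionally here because
# the keyword name differs across scikit-learn versions.
bagged_knn = BaggingClassifier(
    KNeighborsClassifier(), n_estimators=25, max_samples=0.8, random_state=0
)
bagged_knn.fit(X_train, y_train)
test_accuracy = bagged_knn.score(X_test, y_test)  # out-of-sample accuracy
```

Each of the 25 base models is trained on a random 80% sample of the training set, and their predictions are combined by voting.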

Because trees are used frequently in an ensemble context, scikit-learn has specific functions to deal with this. We focus on ExtraTreesClassifier, ExtraTreesRegressor, RandomForestClassifier, and RandomForestRegressor.

Averaging ensemble methods: Part 2: Random forests
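
A short sketch comparing two of the tree ensembles named above with 5-fold cross-validation. The dataset and the number of trees are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
extra = ExtraTreesClassifier(n_estimators=100, random_state=0)

# Mean accuracy over 5 cross-validation folds.
forest_cv = cross_val_score(forest, X, y, cv=5).mean()
extra_cv = cross_val_score(extra, X, y, cv=5).mean()

# Feature importances are available after fitting on the full data.
importances = forest.fit(X, y).feature_importances_
```

The regressor variants (`RandomForestRegressor`, `ExtraTreesRegressor`) follow the same fit/predict interface.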

Boosting builds a sequence of simple classifiers, each one trying to correct the errors of the previous ones. We focus on AdaBoost, a simple idea with very solid results in image processing, text classification, and general ML.

Boosting ensemble methods
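
A hedged sketch of AdaBoost in scikit-learn with its default base learner (a depth-1 decision tree). The dataset and settings are assumptions, not the lecture's example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round.
staged_accuracy = list(booster.staged_score(X_test, y_test))
test_accuracy = booster.score(X_test, y_test)
```

Plotting `staged_accuracy` against the round number shows how the ensemble improves as weak learners are added.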
+ Machine learning with Scikit-learn: Unsupervised problems
4 lectures 01:01:13

In ML, we typically deal with hundreds (if not thousands) of features, and for many reasons (plotting, modelling, or identifying rare observations) we need to reduce that set. We show how to use scikit-learn to compute PCA and then project the data into a low-dimensional space. After that, we plot the projected data, see which features move in similar directions and which have high loadings on the principal components, and even identify unusual observations.

Principal components
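
The steps above can be sketched as follows, using the iris dataset as a stand-in. The standardization step and the distance-based rule for flagging unusual observations are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)  # data projected onto 2 components
loadings = pca.components_            # one row per component, one column per feature
explained = pca.explained_variance_ratio_

# Flag the three observations farthest from the center of the projection.
distance = np.sqrt((scores ** 2).sum(axis=1))
unusual = np.argsort(distance)[-3:]
```

A scatter plot of the two columns of `scores` gives the low-dimensional view of the data.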

When we have M observations that we want to group into L groups, there is no easier way than K-Means. We review how to use it in scikit-learn, and show when it does not perform as expected.


We review the theory behind the best clustering algorithm nowadays: how it estimates the density and when it considers a point to be an outlier. We also review some tuning strategies for its parameters.


We use a dataset containing information on multiple human development indexes to cluster the countries into 3 groups. We show that both K-Means and PCA+K-Means (with one principal component extracted) achieve practically the same results. Finally, we report the results per cluster and present some insights.

Clustering and PCA on real countries data from Kaggle
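
A sketch of the comparison between K-Means and PCA+K-Means. The countries dataset is not reproduced here, so three synthetic groups lying along one direction are used instead; on such data, one principal component is enough to recover the same clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Three synthetic groups lying along one direction (hypothetical data).
centers = [[0, 0, 0, 0], [8, 8, 8, 8], [16, 16, 16, 16]]
X, _ = make_blobs(n_samples=150, centers=centers, cluster_std=1.0, random_state=0)

# K-Means on the raw features.
labels_raw = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# K-Means on the first principal component only.
first_component = PCA(n_components=1).fit_transform(X)
labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(first_component)

# 1.0 means the two groupings are identical up to label names.
agreement = adjusted_rand_score(labels_raw, labels_pca)

# Per-cluster means, as a simple report of the results.
cluster_means = np.array([X[labels_raw == k].mean(axis=0) for k in range(3)])
```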
+ Processing sound and identifying words in Audio
2 lectures 32:31

We have multiple recordings per word: "Banana", "Chair", "IceCream", "Hello", "Goodbye". We want to extract some metrics from each file so that we can do machine learning later. The difficult part is that the metrics we need relate to the audio signal encoded in each file. Luckily, we can leverage an existing R package that reads .wav files and outputs many properties of the frequencies present in each file. At the end, we produce two CSV files (one for training and one for testing) containing 21 features that we can use later for machine learning. The approach presented here can be extended to situations requiring the classification of any sound.

Reading WAV files and extracting features

We load the features that we extracted before, both for our training and testing datasets. We evaluate the performance of both AdaBoost and SVM. Both methods reach practically 100% in-sample accuracy, 80% cross-validation accuracy, and 80% out-of-sample accuracy.

Classifying word using Adaboost and SVM
Requirements
  • Some experience with data science, Python and statistics
  • Ability to write functions and understand a Python program
  • Understanding of the basics of regression, random variables, and classification

This course explores several data science and machine learning techniques that every data science practitioner should be familiar with. Fundamentally, the course revolves around four axes:

  • Pandas and Matplotlib for working with data
  • Keras for Deep Learning
  • Scikit-learn for machine learning
  • Statsmodels for statistics

This course explores the fundamental concepts in these big four topics, and provides the student with an overview of the problems that can be solved nowadays. 

I only focus on the computational and practical implications of these techniques, and it is assumed that the student is partially familiar with statistics, ML and data science, or is willing to complement the techniques presented here with theoretical material. Python programming experience is absolutely necessary: the only Python topic we explain is how to define classes, as we will use them throughout the course.

The teaching strategy is to briefly explain the theory behind these techniques, show how they work on very simple problems, and finally present the student with some real examples. I believe that these real examples add enormous value for the student, as they help to understand why these techniques are so widely used nowadays (because they solve real problems!).

Some of the examples that we will tackle here are: forecasting the GDP of the United States, forecasting the prices of new houses in London, identifying squares and triangles in pictures, predicting the value of vehicles using online data, detecting spam in SMS data, and many more!

In a nutshell, this course explains how to:

  • Define classes for storing data in a better way
  • Plot data
  • Merge, pivot, subset, and group data via Pandas
  • Run linear regression via Statsmodels
  • Work with time series and forecasting in Statsmodels
  • Apply several unsupervised machine learning techniques, such as clustering
  • Apply several supervised techniques, such as random forests, classification trees, and Naive Bayes classifiers
  • Define Deep Learning architectures using Keras
  • Design different neural networks, such as recurrent neural networks and multi-layer perceptrons
  • Classify audio/sounds using machine learning, in a way similar to what Alexa, Siri and Cortana do

The student needs to be familiar with statistics, Python, and some machine learning concepts.

Who this course is for:
  • Data science beginners and intermediate users
  • Statisticians and CS students who want to strengthen their data science skills