Machine learning with Scikit-learn
3.0 (25 ratings)
256 students enrolled

Learn the most important machine learning techniques using the best machine learning library available
Created by Francisco Juretig
Last updated 3/2017
  • 6.5 hours on-demand video
  • 25 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion

What Will I Learn?
  • Load data into scikit-learn and run many machine learning algorithms for both supervised and unsupervised problems
  • Assess model accuracy and performance
  • Decide which model is best for each scenario
Requirements
  • Some Python and statistics knowledge is required. You need to be able to code loops, functions, and classes in Python, and to understand what random variables are, what a Gaussian distribution is, and the concepts behind linear regression.

This course will explain how to use scikit-learn to do advanced machine learning. If you are aiming to work as a professional data scientist, you need to master scikit-learn!

It is expected that you have some familiarity with statistics and Python programming. You don't need to be an expert, but you should understand what a Gaussian distribution is, be able to code loops and functions in Python, and know the basics of maximum likelihood estimation. The course focuses entirely on the Python implementation, and the math behind it is omitted as much as possible.

The objective of this course is to give you a good understanding of scikit-learn, so that you can identify which technique to use for a particular problem. If you follow this course, you should be able to handle a machine learning interview quite well, although for that you will need to study the math in more detail.

We'll start by explaining the machine learning problem, its methodology, and its terminology, along with the differences between AI, machine learning (ML), statistics, and data mining. Scikit-learn, being a Python library, benefits from Python's spectacular simplicity and power. We'll explain how to install scikit-learn and its dependencies, then show how we can use Pandas data in scikit-learn and also benefit from SciPy and NumPy. Finally, we'll show how to create synthetic datasets with scikit-learn, tailored specifically for regression, classification, and clustering.

In essence, machine learning can be divided into two big groups: supervised and unsupervised learning. In supervised learning we have an objective variable (which can be continuous or categorical) and we want to use certain features to predict it. Scikit-learn provides estimators for both classification and regression problems. We will start by discussing the simplest classifier, Naive Bayes. We will then see some powerful regression techniques that, via a special trick called regularization, help us get much better linear estimators. We will then analyze Support Vector Machines, a powerful technique for both regression and classification, and use classification and regression trees to estimate very complex models. Finally, we will see how to combine many simple estimators into "ensemble" methods, which are more robust in out-of-sample performance: in particular random forests, extremely randomized trees, and boosting methods. These are the methods behind many winning entries in data science competitions.

We will see how we can use all these techniques for online data, image classification, sales data, and more. We also use real datasets from Kaggle such as spam SMS data, house prices in the United States, etc. to teach the student what to expect when working with real data.

In unsupervised learning, on the other hand, we have a set of features (with no outcome or target variable) and we attempt to learn from that data: whether it has outliers, whether it can be split into groups, whether we can remove some of the features, and so on. For example, we will see k-means, which is the simplest algorithm for assigning observations to groups, and we will see that sometimes there are better techniques, such as DBSCAN. We will then explain how to use principal components to reduce the dimensionality of a dataset. We will also use some very powerful scikit-learn functions that learn the density of the data and are able to flag outliers.

I try to keep this course as up to date as possible, especially since scikit-learn is constantly being updated. For example, neural networks were added in the latest release. I have tried to keep the examples as simple as possible, with as few observations (samples) and features (variables) as possible. In real situations we would use hundreds of features and thousands of samples, and most of the methods presented here scale really well to those scenarios. I don't want this course to focus on very realistic examples, because I think they obscure what we are trying to achieve in each example. Nevertheless, some more complex examples will be added as additional exercises.


Who is the target audience?
  • Students with some analytics/data-science knowledge who want to model comfortably in scikit-learn
  • Experienced data scientists working in R/SAS/MATLAB, wanting to transition into ML with Python
Curriculum For This Course
27 Lectures
Introduction to Scikit-learn
4 Lectures 49:11

How to install NumPy, SciPy, and scikit-learn, and how to make sure the installation works

Installing scikit-learn

Basic scikit-learn concepts and terminology. How to load external data via Pandas. Some useful standardization functions that scikit-learn provides

Data manipulation: from Pandas to scikit-learn
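
As a rough illustration of the Pandas-to-scikit-learn step (a minimal sketch, not the course's own code; the toy table and its column names are made up for this example), we can pull the features and the target out of a DataFrame as NumPy arrays and standardize the features with StandardScaler:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy table; the column names are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.0, 14.0, 16.0],
    "units": [100.0, 90.0, 70.0, 60.0],
    "sold_out": [0, 0, 1, 1],
})

# scikit-learn estimators take a 2-D feature array X and a 1-D target y.
X = df[["price", "units"]].to_numpy()
y = df["sold_out"].to_numpy()

# Standardize each feature to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```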

How we can use scikit-learn to create data for clustering problems, regression problems and classification problems

Creating synthetic data
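
A short sketch of how synthetic data for the three problem types could be generated with the generators in sklearn.datasets. The sample sizes, feature counts, and seeds below are arbitrary choices for illustration:

```python
from sklearn.datasets import make_blobs, make_classification, make_regression

# Regression: continuous target built from a linear model plus noise.
X_reg, y_reg = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Classification: binary target with two informative features.
X_clf, y_clf = make_classification(
    n_samples=100, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

# Clustering: three Gaussian blobs; y_blob holds the true blob of each point.
X_blob, y_blob = make_blobs(n_samples=100, centers=3, n_features=2, random_state=0)

print(X_reg.shape, X_clf.shape, X_blob.shape)
```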
Supervised methods
16 Lectures 04:01:49

We briefly review the Bayesian ideas behind Naive Bayes. We then explain how to choose between the Bernoulli and the multinomial variants, depending on the assumptions we make about the data

Naive Bayes : Bernoulli - Multinomial
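
To make the difference concrete, here is a minimal sketch on a handful of invented messages (not the Kaggle data used in the course). MultinomialNB models word counts, while BernoulliNB models only whether each word is present or absent:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny invented corpus: 1 = spam, 0 = not spam.
messages = [
    "win cash now",
    "free prize win",
    "lunch at noon",
    "see you at lunch",
    "free cash prize",
    "meeting at noon",
]
labels = [1, 1, 0, 0, 1, 0]

# Turn each message into a vector of word counts.
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(messages)

multinomial = MultinomialNB().fit(X_counts, labels)
# BernoulliNB binarizes the counts by default (binarize=0.0).
bernoulli = BernoulliNB().fit(X_counts, labels)

X_new = vectorizer.transform(["free cash", "lunch meeting"])
pred_multi = multinomial.predict(X_new)
pred_bern = bernoulli.predict(X_new)
print(pred_multi, pred_bern)
```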

We use a real SMS spam dataset from Kaggle to test the Bernoulli and Multinomial Naive Bayes classifiers. Using cross-validation, we achieve 95% and 96% accuracy, versus the 86% we would have obtained by always predicting the majority class (the proportion of non-spam messages in the data). Now you know why spammers hate machine learning practitioners!

Preview 19:55

We introduce SVM in a very simple (linear) context. Although it is an extremely powerful algorithm, it tends to generate too many support vectors, possibly over-fitting the data. Is there a solution to that? And although SVM is famous as a classification tool, we will see how it can also be used as a very powerful regression tool

Linear Support Vector Machines (SVM): SVM and LinearSVC

In the previous lesson we presented SVM and showed that we can't control the number of support vectors directly. An alternative formulation (NuSVM) will allow us to do exactly that

Linear Support Vector Machines (SVM): NuSVM
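
In scikit-learn the nu formulation is available as NuSVC (and NuSVR for regression). The sketch below, on synthetic blobs chosen only for illustration, compares the number of support vectors of SVC and NuSVC. According to the scikit-learn documentation, nu is a lower bound on the fraction of support vectors and an upper bound on the fraction of margin errors:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC, NuSVC

# Two overlapping synthetic classes, 100 points each.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

# Standard formulation: C penalizes margin violations.
svc = SVC(kernel="linear", C=1.0).fit(X, y)

# Nu formulation: nu bounds the fraction of support vectors from below.
nu = 0.1
nu_svc = NuSVC(kernel="linear", nu=nu).fit(X, y)

n_sv_c = len(svc.support_)
n_sv_nu = len(nu_svc.support_)
print(f"SVC: {n_sv_c} support vectors, NuSVC(nu={nu}): {n_sv_nu}")
```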

2 questions

Logistic regression, the most famous (and simplest) neural network, is widely used to predict categorical outcomes, such as whether an observation belongs to group A or B. It also has a nice property: we can draw conclusions from its parameters. We explore linear_model.LogisticRegression in scikit-learn

Logistic regression
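
A small sketch of how the coefficient signs can be read, using simulated data in which one feature raises and the other lowers the probability of observing a 1. The feature names x_up and x_down are hypothetical labels for this example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate a binary outcome from a known logistic model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
logits = 2.0 * X[:, 0] - 2.0 * X[:, 1]
y = (logits + rng.logistic(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Pair each coefficient with a (hypothetical) feature name.
feature_names = ["x_up", "x_down"]
coefficients = dict(zip(feature_names, model.coef_[0]))
print(coefficients)  # positive sign: raises P(y=1); negative sign: lowers it
```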

We use a logistic regression model and US Census data to predict whether a person's income is greater than 50K. We show how L1 and L2 regularization work, and we finish by building a dataframe containing the coefficient names and values. Being able to assign a meaning to each coefficient is certainly a nice feature of logistic regression (the sign of each coefficient tells us whether the feature raises or lowers the probability of observing a 1), and one that many methods do not share

Predicting if income >50k using real US Census Data

Isotonic regression is a very useful method when the sign of the relationship between two variables is known. It can be easily implemented in scikit-learn. We review isotonic.isotonic_regression in scikit-learn with a price/sales example

Isotonic regression
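
As a sketch of the price/sales idea with invented numbers: if we know that sales cannot increase with price, the IsotonicRegression estimator with increasing=False fits the best non-increasing step function to the noisy observations.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Invented data: sales should fall as price rises, but the raw data is noisy.
price = np.arange(1.0, 11.0)
sales = np.array([100, 95, 97, 80, 82, 70, 60, 65, 50, 40], dtype=float)

# Constrain the fit to be non-increasing in price.
iso = IsotonicRegression(increasing=False)
fitted = iso.fit_transform(price, sales)

# Neighbouring points that violate the ordering are pooled to their mean.
print(fitted)
```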

We show how to run a linear regression model via ordinary least squares, lasso, and ridge. We see how lasso can reduce the dimensionality of a feature set, and how ridge can estimate a model on a correlated feature set. Both methods give us biased models, but ones that can generate more stable predictions. In the example analyzed here, all models end up with a very similar "score", so we can't conclude that any one is "better" than another in terms of prediction. But we show that lasso can produce a model that competes really well with ridge and OLS, even with high correlation, while correctly reducing the dimensionality of the problem. We also show how to use LassoCV and RidgeCV, which automatically choose the regularization parameter for those methods, even though in this case they bring no specific improvement.

Linear regression - Lasso - Ridge
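
A minimal sketch of the comparison on synthetic data (the dataset and the alpha values are arbitrary choices, not those used in the lecture). Only 3 of the 10 features are informative, so lasso should set several coefficients exactly to zero while ridge only shrinks them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV, LinearRegression, Ridge, RidgeCV

# 10 features, of which only 3 actually drive the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=5.0).fit(X, y)   # L1 penalty: can zero them out

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("Coefficients set to zero by lasso:", n_zero_lasso)

# The CV variants pick the regularization strength by cross-validation.
lasso_cv = LassoCV(cv=5).fit(X, y)
ridge_cv = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print("Chosen alphas:", lasso_cv.alpha_, ridge_cv.alpha_)
```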

Lasso - Ridge
2 questions

We review the tree functions available in scikit-learn, both for classification and regression

Decision trees

The best performing methods nowadays rely on building smaller models and then averaging (or choosing one) between them. Many of the winning algorithms in Kaggle competitions do exactly this. We describe the two big families of ensemble methods: (A) - Averaging ensemble methods (B) - Boosting ensemble methods

Introduction to ensemble methods

We introduce one of the very best functions in scikit-learn: ensemble.BaggingClassifier. It allows us to plug any estimator into an ensemble family, reducing the variance of our estimator and performing much better in out-of-sample scenarios.

Averaging ensemble methods - Part 1: Bagging
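
A rough sketch of bagging on synthetic data (an arbitrary example, with a decision tree chosen as the base estimator). We wrap the tree in a BaggingClassifier and compare cross-validated accuracy against the single tree. Which one scores higher depends on the data, so no particular result is implied:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# A single tree versus 50 trees, each fit on a bootstrap sample.
tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
)

tree_score = cross_val_score(tree, X, y, cv=5).mean()
bagged_score = cross_val_score(bagged, X, y, cv=5).mean()
print(f"single tree: {tree_score:.3f}, bagged trees: {bagged_score:.3f}")
```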

Because trees are used frequently in an ensemble context, scikit-learn has specific functions to deal with this. We focus on ExtraTreesClassifier, ExtraTreesRegressor and RandomForestClassifier + RandomForestRegressor

Averaging ensemble methods - Part 2: Random forests

We practice how to encode the simplest image classification problem into the format we need in scikit-learn. We see that even though we have few pixels, and few samples, we can predict quite well whether an image is an "I" or a "C" using random forests

Preview 20:00

Boosting is a process of generating simple classifiers and then improving them. We focus on AdaBoost, a simple idea with very solid results for image processing, text classification, and general ML.

Boosting ensemble methods
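
A minimal AdaBoost sketch on synthetic data (the dataset and settings are illustrative assumptions). AdaBoostClassifier fits a sequence of weak learners, by default shallow decision trees, each one giving more weight to the samples the previous ones got wrong:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# 50 boosting rounds with the default weak learner.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
boost_score = cross_val_score(booster, X, y, cv=5).mean()
print(f"AdaBoost cross-validated accuracy: {boost_score:.3f}")
```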

We show how to use the fantastic GridSearchCV class. It allows us to get the best parameters for any model using cross-validation. We explain how to use it with random forests

Grid Search Cross Validation
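
A small sketch of a grid search over a random forest. The parameter grid here is an arbitrary example, kept tiny so that it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# Every combination in the grid is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [10, 50], "max_depth": [2, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds a forest refit on all the data with the winning parameters.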

We use a real dataset containing house prices for the US. We use several features to predict those prices, and we determine some of the parameters using cross-validation. We end up with an ExtraTrees regressor with 82% accuracy.

Predicting real house prices in the US using ExtraTreesRegressor
Unsupervised methods
7 Lectures 01:37:58

When we want to visualise the shape of one-dimensional data, histograms are the best tool. But what if we want a smoother version? Scikit-learn provides density estimation methods that are ideal for this. In this lecture we look at an unusual example of data truncated to the interval 0-1, where the density can be estimated, but only after applying a trick.

Density Estimation
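
A basic kernel density sketch on simulated Gaussian data. The bandwidth is an arbitrary choice, and this does not reproduce the truncated-data trick from the lecture. Note that KernelDensity.score_samples returns the log of the density:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=500)

# scikit-learn expects a 2-D array, so reshape to (n_samples, 1).
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(x[:, None])

# Evaluate the smoothed density at the centre and far in the tail.
grid = np.array([[0.0], [4.0]])
density = np.exp(kde.score_samples(grid))
print(density)
```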

In ML, we typically deal with hundreds (if not thousands) of features, and for many reasons (plotting, modelling, identifying rare observations) we need to reduce that set. We show how to use scikit-learn to compute PCA and then project the data into a low-dimensional space. After that, we plot the projected data, see which features move in similar directions and which have high loadings on the principal components, and even identify unusual observations.

Principal Components
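
A compact PCA sketch on simulated data, where three features are noisy copies of one hidden factor and a fourth is independent (an invented setup for illustration). The first component should capture most of the variance, and its loadings show which features move together:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))

# Three noisy copies of one latent factor plus one independent feature.
X = np.hstack([latent + 0.1 * rng.normal(size=(200, 3)),
               rng.normal(size=(200, 1))])

pca = PCA(n_components=2)
X_low = pca.fit_transform(X)           # data projected onto 2 components

ratios = pca.explained_variance_ratio_ # share of variance per component
loadings = pca.components_             # one row of feature weights per component
print(ratios)
print(loadings.round(2))
```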

Principal Components
2 questions

When we have M observations that we want to split into L groups, there is no easier way than K-Means. We review how to use it in scikit-learn, and show when it does not perform as expected

Preview 10:47

We review the theory behind one of the best clustering algorithms available today: how it estimates the density and when it considers a point to be an outlier. We also review some tuning strategies for its parameters


K-means and DBSCAN comparison
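
One way to see the difference is the classic two-moons dataset (a standard synthetic example, not necessarily the one used in the course; eps and min_samples are hand-picked assumptions). K-means tends to cut the moons with a straight boundary, while DBSCAN follows the dense regions:

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# DBSCAN labels points in no dense region as -1 (noise/outliers).
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the true moons.
km_ari = adjusted_rand_score(y_true, km_labels)
db_ari = adjusted_rand_score(y_true, db_labels)
print(f"K-means ARI: {km_ari:.2f}, DBSCAN ARI: {db_ari:.2f}")
```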

3 questions

We use a dataset containing information on multiple human development indexes, to cluster the countries into 3 groups. We show that both K-means and PCA+K-Means (with one principal component extracted) achieve practically the same results. We finally report the results per cluster and present some insights

Clustering and PCA on real countries data from Kaggle

Assume you have data containing a certain proportion of outliers (abnormal observations). Is there a robust way of identifying them? Can it be used to flag further abnormal observations?

Outlier detection
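
A sketch of one robust approach, using covariance.EllipticEnvelope. This estimator is an assumption on our part; the lecture may use a different one. We plant 10 obvious outliers around a Gaussian cloud and tell the detector to expect about 5% contamination:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
inliers = rng.normal(size=(190, 2))

# Plant 10 outliers on a circle of radius 8, far from the inlier cloud.
angles = np.linspace(0.0, 2.0 * np.pi, 10, endpoint=False)
outliers = 8.0 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([inliers, outliers])

# Robust covariance fit; contamination is the expected outlier share.
detector = EllipticEnvelope(contamination=0.05, random_state=0).fit(X)
labels = detector.predict(X)  # +1 = inlier, -1 = outlier
n_planted_found = int(np.sum(labels[-10:] == -1))

# The fitted detector can also score observations it has never seen.
new_labels = detector.predict(np.array([[0.0, 0.0], [9.0, 9.0]]))
print(n_planted_found, new_labels)
```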

Assume you have data that contains no outliers, and you want to predict whether a new set of observations shares the same structure or consists of outliers (possibly belonging to another distribution). We show how to use the one-class SVM to estimate the data density and classify the new observations.

Novelty detection
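
A minimal novelty-detection sketch with svm.OneClassSVM on simulated clean training data (the nu and gamma settings are illustrative assumptions). The model learns the region where the training data lives and labels new points as +1 (similar) or -1 (novel):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))  # clean data, no outliers

# nu roughly bounds the share of training points treated as outside.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

# One new point near the centre of the data, one far away from it.
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])
pred = model.predict(X_new)
print(pred)
```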
About the Instructor
Francisco Juretig
3.8 Average rating
154 Reviews
1,355 Students
8 Courses

I have 7+ years of experience as a statistical programmer in industry. Expert in programming, statistics, data science, and statistical algorithms. I have wide experience in many programming languages. Regular contributor to the R community, with 3 published packages. I am also an expert SAS programmer. Contributor to scientific statistical journals. Latest publication in the Journal of Statistical Software.