Introduction to Machine Learning with Scikit-Learn
Highest Rated
Rating: 4.5 out of 5 (452 ratings)
2,184 students

Learn the three main techniques of machine learning: regression, classification and clustering, using Scikit-Learn
Last updated 6/2021
English

What you'll learn

  • In this course you will learn: Machine Learning and Scikit-Learn
  • You will be able to recognize problems that can be solved with Machine Learning
  • Select the right technique (is it a classification problem? a regression? needs preprocessing?)
  • Train and evaluate regression models with Scikit-Learn to forecast numerical quantities.
  • Train and evaluate classification models with Scikit-Learn to predict categories.
  • Use clustering techniques to group your data and discover insights.

Course content

8 sections, 62 lectures, 2h 32m total length
  • Introduction 5:59
  • Machine Learning and AI 3:19

    Transcript:

    Let's get started. This is a course on machine learning. We're going to be immersed in machine learning and we are going to learn a lot of things. So first of all, let's contextualize machine learning in the world of artificial intelligence. Artificial intelligence and machine learning are not new disciplines.


    Machine learning is a branch of artificial intelligence concerned with learning from large quantities of data. Deep learning is a subset of machine learning that uses a type of model called a neural network to learn from data. What is machine learning? A definition that dates back to the very early days of the field is: a field of study that gives computers the ability to learn without being explicitly programmed.


    This refers to using data as examples for the computer to learn from. A much more recent definition, heard at a conference last year, is: it's a new way of computer programming. And this is really the paradigm shift that we're witnessing these days. More and more, the systems and software we use have components that are not explicitly programmed, but are learned from large amounts of data.


    And we see this in many different applications. For example, if you've ever bought a book on Amazon, when Amazon recommends other products for you to buy, that's an example of machine learning applied to a web interface. You're buying a book, and the computer has learned from your purchase history, as well as other customers' purchase histories, what you might be interested in buying next.


    Another very common application is sentiment analysis, which is used a lot by financial traders to gauge the sentiment of markets and then trade stocks. This involves analyzing large quantities of text and, for each piece of text, deducing the sentiment associated with it: is it positive or negative? Another domain where sentiment analysis is used a lot is customer service: when customers leave reviews about your products or services, you can analyze whether they're happy or not and correct your actions accordingly.


    Two more very commonly known examples are image classification and machine translation. Machine translation is probably the most common example of machine learning applied to text: the input is a sentence in one language, and the output is a sentence in another language.


    There are many other applications: predicting house prices, detecting intrusions and fraud, classifying documents, and analyzing application logs or social media. Machine learning is applied more and more frequently by companies and other institutions in many different domains.

  • Machine Learning Enablers 1:26

    Transcript:

    What are the enablers of machine learning? Machine learning is fundamentally enabled by two big revolutions. One is data. The techniques in machine learning are not new. Some new techniques have certainly come into use recently, but overall a lot of the material and tooling we are using dates back decades, if not half a century. What is new, and has only happened in the last 10 years or so, is the massive amount of data we've been collecting from a number of sources, including phones, applications, sensors, cameras, microphones, logs, et cetera.


    The other is the compute power that's increasingly available to us in the form of cloud computing. Cloud compute can be CPUs, GPUs, or even TPUs, which are chips built specifically for machine learning. These two things combined, the large amount of data together with the large amount of compute power, have enabled modern machine learning on a variety of data types, including tabular data, time series data, documents, images, sound, and video.

  • The 3 main techniques in machine learning 4:31

    Transcript:

    In the previous class, we talked about machine learning and saw that machine learning is a branch of artificial intelligence that is concerned with learning from data. And in particular, what we said is that it also can be considered a way of computer programming.


    Machine learning systems are systems that learn from data and therefore improve their performance as we feed them more and more data. So there is a relationship between the amount of data available and the performance of the machine learning system. Machine learning nowadays is quite a well-developed field, but there are three techniques that are the most popular and also the most widely used in industry.


    And even if you're a complete novice to machine learning, I want to assure you that you are already familiar with all three of them. That's because you have a brain, and your brain is an amazing pattern recognizer. So I'm going to go over the three main techniques and show you that your brain is capable of recognizing the patterns that we're asking the machine to recognize.


    For example, if you look at this chart, what is the pattern that comes immediately to mind? This is a chart displaying humans: each human is represented by a dot, and for each human we know the height and the weight. The immediately visible pattern is that there is a correlation between height and weight.


    Okay. So taller people are, on average, also heavier, and we can represent that with a straight line going upwards to the right. That's what we call the line of best fit, and I'm sure you're all very familiar with this. This is an example of a regression, and it's called a regression because the output, in this case the weight, is a continuous variable. The outputs are numbers. We are predicting numbers, so we apply a regression technique.
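The line of best fit described above can be sketched with Scikit-Learn's `LinearRegression`. This is a minimal illustration, not the course's own lab code: the height and weight values are synthetic, and the coefficients used to generate them are assumptions chosen only to produce an upward trend.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): heights in cm, weights in kg.
rng = np.random.default_rng(0)
height_cm = rng.uniform(150, 200, size=100)
weight_kg = 0.9 * height_cm - 90 + rng.normal(0, 5, size=100)

# Scikit-Learn expects a 2D feature matrix of shape (n_samples, n_features).
X = height_cm.reshape(-1, 1)
y = weight_kg

# Fit the line of best fit: weight = slope * height + intercept.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_

# Predict a continuous quantity (weight) for a new, unseen height.
predicted_weight = model.predict(np.array([[180.0]]))[0]

print(f"line of best fit: weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 180 cm: {predicted_weight:.1f} kg")
```

The output is a number on a continuous scale, which is what makes this a regression rather than a classification.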


    On the other hand, do you see any pattern in this figure? What is represented here? What you can see here is internet service providers. Each dot represents one ISP, and each ISP is characterized by two variables, or features: the download speed and the upload speed.


    And there is a third element, which is the shape of the symbol or the color that kind of groups them into two classes. One is the class of fast ISPs, and one is a class of slow ISPs. So if we were to separate them or look for a boundary that would separate them, we would probably draw a line like this, or a similar line that crosses and separates the fast from the slow.


    So notice that we're still predicting something here, but unlike the previous case, now we are predicting a categorical quantity. We're predicting a quantity that is discrete in nature: it's either fast or slow.
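A classifier for the fast/slow ISP example might look like the sketch below. The speeds are synthetic and the cluster centers are assumptions; `LogisticRegression` is used here only as one simple model that learns a straight-line boundary, since the transcript does not say which model produced the line in the chart.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic ISPs (assumed values): columns are download and upload speed in Mbps.
rng = np.random.default_rng(1)
slow = rng.normal(loc=[20.0, 5.0], scale=[5.0, 2.0], size=(50, 2))
fast = rng.normal(loc=[100.0, 40.0], scale=[10.0, 5.0], size=(50, 2))

X = np.vstack([slow, fast])
y = np.array(["slow"] * 50 + ["fast"] * 50)  # categorical labels

# Logistic regression learns a linear boundary separating the two classes.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Predict the category of two new ISPs.
new_isps = np.array([[15.0, 4.0], [110.0, 45.0]])
predictions = clf.predict(new_isps)
print(predictions)
```

Unlike the regression case, the prediction is one of a fixed set of labels rather than a number.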


    The third technique that we're looking at is represented in this chart. Do you see any pattern? Here, people typically say there are two groups. Each dot is a flower, and we know two features of these flowers: the sepal length and the petal length. There are clearly two groups with distinct behavior: one group at the bottom, of flowers whose petal length is more or less constant, and another group, circled in green here, where the petal length and sepal length are correlated.

    These three techniques go by the names of regression, classification, and clustering, and they account for 90% of the machine learning used commercially and applied in everyday products. These are the three main techniques, and they are the three techniques that we will cover in this and the next two lectures.
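The two flower groups in the clustering chart can be recovered without any labels. The sketch below uses `KMeans` on synthetic measurements; the numbers are assumptions that only mimic the described shape (one group with nearly constant petal length, one where petal length grows with sepal length), and K-means is one common clustering algorithm, not necessarily the one behind the chart.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Group A (assumed values): petal length roughly constant around 1.5 cm.
sepal_a = rng.normal(5.0, 0.3, size=50)
petal_a = rng.normal(1.5, 0.15, size=50)

# Group B (assumed values): petal length correlated with sepal length.
sepal_b = rng.uniform(5.5, 7.5, size=50)
petal_b = sepal_b - 1.0 + rng.normal(0, 0.2, size=50)

# Feature matrix: one row per flower, columns are sepal and petal length.
X = np.column_stack([
    np.concatenate([sepal_a, sepal_b]),
    np.concatenate([petal_a, petal_b]),
])

# No labels are passed to fit: the algorithm groups points by similarity.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
print("cluster centers:\n", kmeans.cluster_centers_)
```

The cluster numbers (0 and 1) are arbitrary identifiers; interpreting what each group means is left to the analyst.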

  • Types of machine learning 2:59

    Transcript:

    I want to zoom out a little and talk about different types of machine learning, because there are roughly speaking three families of machine learning. There are actually a few others, but the common ones are three. The most common by far is supervised learning of which we've just seen two examples.


    This is when you task a machine to learn from data in the form of input-output pairs, that is, examples with a ground truth answer. Classification and regression are the two main techniques used in the realm of supervised learning, when you want the machine to learn from examples. Practical applications include spam detection, fraud detection, image recognition, forecasting future values of time series, and so on. So anything where you're taking data and trying to predict a given output is called supervised learning.


    This is in contrast with unsupervised learning, where your goal is not to learn from examples but to find relationships in the data and represent it in a way that reveals deeper meaning in the data itself.


    The most commonly used technique here is clustering, where you're grouping data based on similarity. Practical applications of this include customer segmentation, log analysis, discovery of new diseases, and so on.


    A third type of learning that's increasingly relevant, especially for companies that have lots and lots of data, is reinforcement learning.


    This is when, instead of giving the machine static input-output pairs, like in supervised learning, we offer the machine an environment in which it can control an agent. This environment/agent pair allows the machine to learn by trial and error. The agent can take actions in the environment, and the environment responds with rewards or penalties based on the action taken by the agent. This is used in simulation environments.


    Video games are one example, and the most common application is robotic control. If you want to build a robot that can navigate an environment, for example a self-driving car, you will apply reinforcement learning in a simulated environment first, and then, once the agent has learned to drive, deploy it on a car, and hopefully it will not crash. This course is not about reinforcement learning. For the remainder of the course we will focus mostly on supervised learning and a little bit on unsupervised learning.
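Although the course does not cover reinforcement learning, the trial-and-error loop described above can be sketched in a few lines of plain Python. Everything here is hypothetical and chosen for illustration: the corridor environment, the `step` and `train` functions, the reward values, and the use of tabular Q-learning as the learning rule.

```python
import random

N_STATES = 5        # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, -0.1, False     # small penalty for every other move


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: sometimes explore, otherwise exploit.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Learned policy for every non-goal position.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the rewards and penalties the environment returns.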

  • Supervised and Unsupervised Learning 1:29

    Transcript:

    Supervised learning is when humans provide labels to the machine, telling it what answer it's supposed to give. This is typically a column in our dataset. For example, let's say we had two types of images, cats and dogs, and we wanted to recognize the images. We would have to label each image by telling the machine whether it's an image of a cat or an image of a dog, and we would have to do that manually.


    Typically there is a human or a team of humans providing the labels. The goal here is to generalize from examples, so we will evaluate the performance of the machine on how well it can recognize cats and dogs in previously unseen images. Typical examples are spam detection, forecasting, and any algorithm where you're asking the machine to predict a quantity.
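Evaluating on previously unseen examples is usually done by holding out part of the labeled data. The sketch below assumes each image has already been reduced to two numeric features (a simplification; real images need feature extraction), uses synthetic values, and picks `LogisticRegression` as an arbitrary example model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled images: two numeric features per example.
rng = np.random.default_rng(42)
cats = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
dogs = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(200, 2))

X = np.vstack([cats, dogs])
y = np.array(["cat"] * 200 + ["dog"] * 200)  # the human-provided labels

# Hold out 25% of the examples; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Accuracy on the held-out set estimates performance on unseen data.
test_accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy on unseen examples: {test_accuracy:.2f}")
```

Scoring on the training set instead would overstate how well the model generalizes.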


    In contrast, unsupervised learning is when you don't have labels and your goal is not to predict something but to discover something: to understand your data at a deeper level. For example, here we are applying clustering, which is an unsupervised learning technique, to find out that there are four clusters of similar points in our dataset. One common application of this is customer segmentation.

  • How to choose ML technique 1:34

    Transcript:

    So when you're faced with a problem in machine learning and you're asking yourself what technique to use, how do you go about deciding? You ask a series of questions. The first question is: do I have labels? Am I trying to predict something for which I already know what the answer should be?


    If the answer is yes, then I'm in "supervised learning world". If the answer is no, then I'm in "unsupervised learning world". So let's say we've answered yes, and we are in supervised learning mode. Then the second question we ask is: are these labels categories? Yes or no. If they are categories, then we are solving a classification problem.


    If they are not categories, meaning they are numerical, we are solving a regression problem. Conversely, if we are not in supervised learning and we are in unsupervised learning, the next question we're going to ask is: are we looking for groups? If we are looking for groups, then we're probably doing clustering.


    If on the other hand, we're not looking for groups, that's everything else that is not one of these three techniques. So this is a very quick map of how to select the best technique in machine learning. And we are going to go down the path of supervised learning with numerical labels to talk about a regression problem.
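The series of questions in this lecture can be written down as a small function. The name `choose_technique` and its parameters are hypothetical, introduced only to make the decision map concrete.

```python
def choose_technique(
    has_labels: bool,
    labels_are_categories: bool = False,
    looking_for_groups: bool = False,
) -> str:
    """Walk the decision map: labels? categories? groups?"""
    if has_labels:
        # Supervised learning: the kind of label decides the technique.
        if labels_are_categories:
            return "classification"
        return "regression"
    # Unsupervised learning: no known answers to predict.
    if looking_for_groups:
        return "clustering"
    return "other unsupervised technique"


# Spam detection: labeled emails, label is a category (spam or not spam).
print(choose_technique(has_labels=True, labels_are_categories=True))
# House price prediction: labeled examples, label is a number.
print(choose_technique(has_labels=True, labels_are_categories=False))
# Customer segmentation: no labels, looking for groups.
print(choose_technique(has_labels=False, looking_for_groups=True))
```

The last branch corresponds to "everything else" in the lecture, which falls outside the three techniques covered here.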

Requirements

  • Previous experience programming in Python is advised
  • Our free Pandas Masterclass can be useful

Description

This course introduces machine learning covering the three main techniques used in industry: regression, classification, and clustering.

It is designed to be self-contained, easy to approach, and fast to assimilate.

You will learn:

  • What machine learning is

  • Where machine learning is used in industry

  • How to recognize the technique you should use

  • How to solve regression problems to predict numerical quantities

  • How to solve classification problems to predict categorical quantities

  • How to use clustering to group your data and discover new insights

The course is designed to maximize the learning experience for everyone and includes 50% theory and 50% hands-on practice. It includes labs with hands-on exercises and solutions.

No software installation required. You can run the code on Google Colab and get started right away.

This course is the fastest way to get up to speed in machine learning and Scikit-Learn.


Why Machine Learning?

Machine Learning has taken the world by storm in the last 10 years, revolutionizing every company and empowering many applications we use every day.

Here are some examples of where you can find machine learning today: recommender systems, image recognition, sentiment analysis, price prediction, machine translation, and many more!

There are over 3,000 job announcements requiring Scikit-Learn in the United States alone, and almost 80,000 jobs mentioning machine learning in the US. Machine Learning engineers can easily earn six-figure salaries in major cities, and companies are investing billions of dollars in developing their teams.

Even if you already have a job, understanding how machine learning works will empower you to start new projects and get visibility in your company.


Why Scikit-Learn?

  • It's the best Python library to learn machine learning

  • Simple, yet powerful API for predictive data analysis

  • Used in many industries: tech, biology, finance, insurance

  • Built on standard libraries such as NumPy, SciPy, and Matplotlib

Who this course is for:

  • Python enthusiasts who want to deepen their knowledge of machine learning
  • Software engineers looking to add machine learning skills to their toolbelt
  • College students looking for hands-on practice in machine learning
  • Data Analysts looking to expand their skills into machine learning