Machine learning is increasingly pervasive in the modern data-driven world. It is used extensively across many fields such as search engines, robotics, self-driving cars, and more.
With this course, you will learn how to perform various machine learning tasks in different environments. We’ll start by exploring a range of real-life scenarios where machine learning can be used, and look at various building blocks. Throughout the course, you’ll use a wide variety of machine learning algorithms to solve real-world problems and use Python to implement these algorithms.
You’ll discover how to deal with various types of data and explore the differences between machine learning paradigms such as supervised and unsupervised learning. We also cover a range of regression techniques, classification algorithms, predictive modelling, data visualization techniques, recommendation engines, and more with the help of real-world examples.
About The Author
Prateek Joshi is an Artificial Intelligence researcher and a published author. He has over 8 years of experience in this field with a primary focus on content-based analysis and deep learning. He has written two books on Computer Vision and Machine Learning. His work in this field has resulted in multiple patents, tech demos, and research papers at major IEEE conferences.
His blog has been visited in more than 200 countries and has received more than a million page views. He has been featured as a guest author in prominent tech magazines. He enjoys blogging about topics such as artificial intelligence, Python programming, abstract mathematics, and cryptography.
He has won many hackathons utilizing a wide variety of technologies. He is an avid coder who is passionate about building game-changing products. He graduated from the University of Southern California and he has worked at companies such as Nvidia, Microsoft Research, Qualcomm, and a couple of early stage start-ups in Silicon Valley.
Machine learning algorithms need processed data for operation. Let’s explore how to process raw data in this video.
Most algorithms can only work directly with data in numerical form, but we often label data with words. So, let’s see how to transform word labels into numerical form.
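As a rough illustration of the idea (not the course's own code), word labels can be mapped to integers with a small dictionary. The helper names and the sample labels below are hypothetical.

```python
def fit_label_encoder(labels):
    """Map each distinct word label to an integer, using sorted order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


def encode(labels, mapping):
    """Turn word labels into their integer codes."""
    return [mapping[label] for label in labels]


def decode(codes, mapping):
    """Turn integer codes back into word labels."""
    inverse = {index: label for label, index in mapping.items()}
    return [inverse[code] for code in codes]


# Hypothetical sample labels, chosen only for illustration.
labels = ["audi", "ford", "audi", "toyota", "ford", "bmw"]
mapping = fit_label_encoder(labels)
encoded = encode(labels, mapping)
print(mapping)
print(encoded)
print(decode(encoded, mapping))
```

Sorting the distinct labels first keeps the mapping reproducible from run to run.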
Linear regression uses a linear combination of input variables to estimate the underlying function that governs the mapping from input to output. Our aim would be to identify that relationship between input data and output data.
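For a single input variable, that relationship can be estimated with ordinary least squares. The following is a minimal sketch in plain Python, assuming made-up sample data; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```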
There are cases where the actual values differ from the values predicted by the regressor, so we need to keep a check on its accuracy. This video will enable us to do that.
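Three common ways to quantify that difference are the mean absolute error, the mean squared error, and the R2 score. The sketch below computes them in plain Python on made-up values; which metrics the video itself uses is an assumption.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 minus the ratio of residual variation to total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, mse, r2)
```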
Linear regressors can be inaccurate because outliers disrupt the model. Regularization helps to reduce their influence. We will see how in this video.
A linear model cannot capture curvature in the datapoints, which can make it quite inaccurate. So, let’s go through the polynomial regressor to see how we can improve on that.
Applying regression concepts to solve real-world problems can be quite tricky. We will explore how to do it successfully.
We don’t always know which features contribute to the output and which don’t. Knowing this becomes critical if we have to omit one. This video will help you compute their relative importance.
There might be some problems where the basic regression methods we’ve learned won’t help. One such problem is bicycle demand distribution. You will see how to solve that here.
Evaluating the accuracy of a classifier is an important step in the world of machine learning. We need to learn how to use the available data to get an idea as to how this model will perform in the real world. This is what we are going to learn in this section.
Despite the word regression being present in the name, logistic regression is actually used for classification purposes. Given a set of datapoints, our goal is to build a model that can draw linear boundaries between our classes. It extracts these boundaries by solving a set of equations derived from the training data.
Bayes’ Theorem, which describes the probability of an event based on prior knowledge of conditions related to it, enables us to classify data in a smarter way. Let us use this concept to improve our classifier.
While working with data, splitting data correctly and logically is an important task. Let’s see how we can achieve this in Python.
To make the evaluation of a model more robust, we repeat the process of splitting the dataset with different subsets. If we just fine-tune the model for a particular subset, we may end up overfitting it, and it may then fail to perform well on unknown data. Cross-validation gives a more reliable estimate of accuracy in such a situation.
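The repeated splitting can be sketched as k-fold index generation: every sample lands in the test set exactly once. This is an illustrative plain-Python version with a hypothetical helper name, not the library routine used in the course.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled before splitting; that step is left out here to keep the output deterministic.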
When we want to fine-tune our algorithms, we need to understand how the data gets misclassified before we make these changes. Some classes are worse than others, and the confusion matrix will help us understand this.
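A confusion matrix is simply a table of counts with one row per true class and one column per predicted class. Here is a minimal sketch on made-up labels; the row and column orientation is an assumption.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


# Made-up true and predicted class labels.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
for row in matrix:
    print(row)
```

The diagonal holds the correct predictions, so large off-diagonal entries show which classes get confused with each other.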
Let's see how we can apply classification techniques to a real-world problem. We will use a dataset that contains some details about cars, such as number of doors, boot space, maintenance costs, and so on, to analyze this problem.
Let’s see how the performance gets affected as we change the hyperparameters. This is where validation curves come into the picture. These curves help us understand how each hyperparameter influences the training score.
Learning curves help us understand how the size of our training dataset influences the machine learning model. This is very useful when you have to deal with computational constraints. Let's go ahead and plot the learning curves by varying the size of our training dataset.
Let’s see how we can build a classifier to estimate the income bracket of a person based on 14 attributes.
Building regressors and classifiers can be a bit tedious. Supervised learning models like SVM help us to a great extent. Let’s see how we can work with SVM.
There are various kernels used to build nonlinear classifiers. Let’s explore some of them and see how we can build a nonlinear classifier.
A classifier often gets biased when there are more datapoints in a certain class. This can turn out to be a big problem. We need a mechanism to deal with this. Let’s explore how we can do that.
Let’s explore how we can train SVM to compute the output confidence level of a new datapoint when it is classified into a known category.
It’s critical to evaluate the performance of a classifier, and that performance depends on its hyperparameters. Let’s explore how to find good values for those parameters.
Now that we’ve learned the concepts of SVM thoroughly, let’s see if we can apply them to real-world problems.
We’ve already used SVM as a classifier to predict events. Let’s explore whether or not we can use it as a regressor for estimating traffic.
The k-means algorithm is one of the most popular clustering algorithms. It divides the input data into k subgroups using various attributes of the data. Let’s see how we can implement it in Python to cluster data.
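The core loop of k-means alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a simplified one-dimensional version with fixed starting centroids, both assumptions made to keep the example short and deterministic.

```python
def k_means_1d(values, centroids, iterations=10):
    """Simplified 1-D k-means with caller-supplied starting centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up data with two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```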
Vector quantization is popularly used in image compression, where we store each pixel using fewer bits than the original image to achieve compression.
Mean Shift is a powerful unsupervised learning algorithm that's used to cluster datapoints. It considers the distribution of datapoints as a probability density function and tries to find the modes in the feature space. Let’s see how to use it in Python.
Many times, we need to segregate data and group it for the purpose of analysis and much more. We can achieve this in Python using agglomerative clustering. Let’s see how we can do it.
In supervised learning, we just compare the predicted values with the original labels to compute their accuracy. In unsupervised learning, we don't have any labels. Therefore, we need a way to measure the performance of our algorithms. Let’s see how we could evaluate their performance.
Wouldn't it be nice if there were a method that can just tell us the number of clusters in our data? This is where Density-Based Spatial Clustering of Applications with Noise (DBSCAN) comes into the picture. Let us see how we can work with it.
How do we proceed when we don't know how many clusters there are? In that case, we can use an algorithm called Affinity Propagation to cluster the data. Let's see how we can use unsupervised learning for stock market analysis with this.
What can we do when labeled data isn't always available, but it's still important to segment the market so that people can target individual groups? Let’s learn to build a customer segmentation model for this situation.
One of the major parts of any machine learning system is the data processing pipeline. Instead of calling functions in a nested way, it's better to use the functional programming paradigm to build the combination. Let's take a look at how to combine functions to form a reusable function composition.
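One way to build such a composition is with `functools.reduce`. In the sketch below, the three step functions are hypothetical stand-ins for real processing stages.

```python
from functools import reduce


def compose(*functions):
    """Return one function that applies `functions` from left to right."""
    return reduce(lambda f, g: lambda x: g(f(x)), functions)


# Hypothetical processing steps that each transform a list of numbers.
def add3(values):
    return [v + 3 for v in values]


def mul2(values):
    return [v * 2 for v in values]


def sub5(values):
    return [v - 5 for v in values]


pipeline = compose(add3, mul2, sub5)
result = pipeline([1, 2, 3])
nested = sub5(mul2(add3([1, 2, 3])))  # the equivalent nested call
print(result, nested)
```

The composed `pipeline` can be reused on any input, whereas the nested call has to be rewritten each time the steps change.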
The scikit-learn library has provisions to build machine learning pipelines. We just need to specify the functions, and it will build a composed object that makes the data go through the whole pipeline. Let’s see how to build it in Python.
While working with the training dataset, we need to make a decision based on the number of nearest neighbors in it. This can be achieved with the help of the NearestNeighbor method in Python. Let’s see how to do it.
When we want to find the class to which an unknown point belongs, we find the k-nearest neighbors and take a majority vote. Let's take a look at how to construct this.
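The majority vote can be sketched in a few lines of plain Python. The training points, labels, and the default `k=3` below are made up for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D training data with two well-separated classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (2.0, 2.0)))
print(knn_predict(train_points, train_labels, (6.0, 6.5)))
```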
A good thing about the k-nearest neighbors algorithm is that it can also be used as a regressor. Let’s see how to do this!
In order to find users in the database who are similar to a given user we need to define a similarity metric. Euclidean distance score is one such metric that we can use to compute the distance between data points. Let’s look at this in more detail in this video.
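One common convention, assumed here rather than taken from the video, turns the Euclidean distance over commonly rated items into a score between 0 and 1, where 1 means identical ratings. The users, movies, and ratings are made up.

```python
import math


def euclidean_score(ratings, user1, user2):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance over common items)."""
    common = set(ratings[user1]) & set(ratings[user2])
    if not common:
        return 0.0  # no shared items, so no evidence of similarity
    squared_diffs = sum(
        (ratings[user1][item] - ratings[user2][item]) ** 2 for item in common
    )
    return 1.0 / (1.0 + math.sqrt(squared_diffs))


# Made-up ratings, for illustration only.
ratings = {
    "alice": {"Movie A": 5.0, "Movie B": 3.0, "Movie C": 5.0},
    "bob": {"Movie A": 5.0, "Movie B": 3.0, "Movie D": 2.0},
    "carol": {"Movie A": 2.0, "Movie B": 3.0, "Movie C": 1.0},
}

print(euclidean_score(ratings, "alice", "bob"))
print(euclidean_score(ratings, "alice", "carol"))
```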
The Euclidean distance score is a good metric, but it has some shortcomings. Hence, Pearson correlation score is frequently used in recommendation engines. Let's see how to compute it.
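The Pearson correlation score compares how two users' ratings move together rather than how far apart they are, so a user who consistently rates higher can still count as similar. This is a minimal sketch with made-up ratings and a hypothetical function name.

```python
import math


def pearson_score(ratings, user1, user2):
    """Pearson correlation of two users' ratings over their common items."""
    common = sorted(set(ratings[user1]) & set(ratings[user2]))
    n = len(common)
    if n == 0:
        return 0.0
    xs = [ratings[user1][item] for item in common]
    ys = [ratings[user2][item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined when one user's ratings never vary
    return sxy / math.sqrt(sxx * syy)


# Made-up ratings: bob always rates twice as high as alice, carol rates in reverse.
ratings = {
    "alice": {"Movie A": 1.0, "Movie B": 2.0, "Movie C": 3.0},
    "bob": {"Movie A": 2.0, "Movie B": 4.0, "Movie C": 6.0},
    "carol": {"Movie A": 3.0, "Movie B": 2.0, "Movie C": 1.0},
}

print(pearson_score(ratings, "alice", "bob"))
print(pearson_score(ratings, "alice", "carol"))
```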
One of the most important tasks in building a recommendation engine is finding users that are similar. Let's see how to do this in this video.
Now that we’ve built all the different parts of a recommendation engine, we are ready to generate movie recommendations. Let’s see how to do that in this video.
With tokenization we can define our own conditions to divide the input text into meaningful tokens. This gives us the solution for dividing a chunk of text into words or into sentences. Let's take a look at how to do this.
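As a rough sketch of defining our own splitting conditions, regular expressions from the standard library can divide text into sentences or words. The patterns below are simplistic assumptions and will not handle abbreviations or contractions well.

```python
import re


def sentence_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def word_tokenize(text):
    """Return runs of word characters, plus each punctuation mark on its own."""
    return re.findall(r"\w+|[^\w\s]", text)


text = "Tokenization splits text. It can yield sentences! Or words?"
sentences = sentence_tokenize(text)
words = word_tokenize(sentences[0])
print(sentences)
print(words)
```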
During text analysis, it's useful to extract the base form of the words to extract some statistics to analyze the overall text. This can be achieved with stemming, which uses a heuristic process to cut off the ends of words. Let's see how to do this in Python.
Sometimes the base words that we obtained using stemmers don't really make sense. Lemmatization solves this problem by doing things using a vocabulary and morphological analysis of words and removes inflectional word endings. Let's take a look at how to do this in this video.
When you deal with a really large text document, you need to divide it into chunks for further analysis. In this video, we will divide the input text into a number of pieces, where each piece has a fixed number of words.
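Fixed-size chunking can be sketched with a slice over the list of words. The function name and sample text are hypothetical; note that the last chunk may hold fewer words than the others.

```python
def split_into_chunks(text, words_per_chunk):
    """Split text into pieces of `words_per_chunk` words each."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


text = "one two three four five six seven"
chunks = split_into_chunks(text, 3)
print(chunks)
```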
When we deal with text documents that contain millions of words, we need to convert them into some kind of numeric representation so as to make them usable for machine learning algorithms. A bag-of-words model is what helps us achieve this task quite easily.
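In its simplest form, a bag-of-words model builds a vocabulary from all documents and then counts how often each vocabulary word occurs in each document. The sketch below uses two made-up documents and whitespace splitting, which is an assumption made for brevity.

```python
from collections import Counter


def bag_of_words(documents):
    """Return the sorted vocabulary and one count vector per document."""
    tokenized = [document.lower().split() for document in documents]
    vocabulary = sorted({word for words in tokenized for word in words})
    matrix = []
    for words in tokenized:
        counts = Counter(words)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


# Made-up documents, for illustration only.
documents = ["the cat sat", "the dog sat on the mat"]
vocabulary, matrix = bag_of_words(documents)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row has the same length as the vocabulary, so documents of different lengths become vectors of the same size.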
The goal of text classification is to categorize text documents into different classes. This is an extremely important analysis technique in NLP. Let us see how we can build a text classifier for this purpose.
Identifying the gender of a name is an interesting task in NLP. Gender recognition is also a part of many artificial intelligence technologies. Let us see how to identify gender in Python.
How could we discover the feelings or sentiments of different people about a particular topic? This video helps us to analyze that.
With topic modeling, we can uncover some hidden thematic structure in a collection of documents. This will help us in organizing our documents in a better way so that we can use them for analysis. Let’s see how we can do it!
Reading an audio file and visualizing the signal is a good starting point that gives us a good understanding of the basic structure of audio signals. So let us see in this video how we could do it!
Audio signals consist of a complex mixture of sine waves of different frequencies, amplitudes, and phases. There is a lot of information hidden in the frequency content of an audio signal, so it’s necessary to transform the audio signal into the frequency domain. Let’s see how to do this.
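The transformation can be illustrated with a direct discrete Fourier transform written in plain Python. This is only a sketch on a synthetic 5 Hz sine wave with an assumed sample rate; real code would normally use a fast Fourier transform from a numerical library.

```python
import cmath
import math


def dft(signal):
    """Direct discrete Fourier transform (O(n^2), fine for short signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


sample_rate = 64  # samples per second (assumed for this example)
frequency = 5     # frequency of the synthetic sine wave, in Hz

# One second of signal, so DFT bin k corresponds to k Hz.
signal = [
    math.sin(2 * math.pi * frequency * t / sample_rate) for t in range(sample_rate)
]
spectrum = dft(signal)

# Keep only the first half: the second half mirrors it for real-valued input.
magnitudes = [abs(value) for value in spectrum[: sample_rate // 2]]
peak_bin = max(range(len(magnitudes)), key=lambda k: magnitudes[k])
print(peak_bin)
```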
We can use NumPy to generate audio signals. As we know, audio signals are complex mixtures of sinusoids. Let’s see how we can generate audio signals with custom parameters.
Music has been explored for centuries, and technology has opened new horizons to play with it. We can also create musical notes in Python. Let’s see how we can do this.
When we deal with signals and we want to use them as input data and perform analysis, we need to convert them into the frequency domain. So, let’s get hands-on with it!
A Hidden Markov Model represents probability distributions over sequences of observations. It allows you to find the hidden states so that you can model the signal. Let us explore how we can use it to perform speech recognition.
This video will walk you through building a speech recognizer by using the audio files in a database. We will use seven different words, where each word has 15 audio files. Let’s go ahead and do it!
Let’s understand how to convert a sequence of observations into time series data and visualize it. We will use a library called pandas to analyze time series data. At the end of this video, you will be able to transform data into the time series format.
Extracting information from various intervals in time series data and using dates to handle subsets of our data are important tasks in data mining. Let’s see how we can slice time series data using Python.
You can filter the data in many different ways. The pandas library allows you to operate on time series data in any way that you want. Let's see how to operate on time series data.
One of the main reasons that we want to analyze time series data is to extract interesting statistics from it. This provides a lot of information regarding the nature of the data. Let’s see how to extract these stats.
Hidden Markov Models are really powerful when it comes to sequential data analysis. They are used extensively in finance, speech analysis, weather forecasting, sequencing of words, and so on. We are often interested in uncovering hidden patterns that appear over time. Let’s see how we can use them.
Conditional Random Fields (CRFs) are probabilistic models used to analyze structured data and also to label and segment sequential data. Let us see how we can use them to work on our input dataset!
This video will get you hands-on with analyzing stock market data and understanding the fluctuations in the stocks of different companies. So let’s see how to do this!
OpenCV is one of the world's most popular libraries for computer vision. It enables us to analyze images and perform many operations on them. Let’s see how to use it!
When working with images, it is essential to detect the edges to process the image and perform different operations with it. Let’s see how to detect edges of the input image in Python.
The human eye likes contrast! This is the reason that almost all camera systems use histogram equalization to make images look nice. This video will walk you through the use of histogram equalization in Python.
One of the essential steps in image analysis is to identify and extract the salient features for the purpose of computer vision. This can be achieved with corner detection and SIFT feature points in Python. This video will enable you to achieve this goal!
When we build object recognition systems, we may want to use a different feature detector before we extract features using SIFT; that will give us the flexibility to cascade different blocks to get the best possible performance. Let’s see how to do it with the Star feature detector.
Have you ever wondered how you could build image signatures? If yes, this video will take you through creating features by using a visual codebook, which will enable you to achieve this goal. So, let’s dive in and watch it!
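To show what histogram equalization does to the pixel values, here is a plain-Python sketch on a short list of grayscale intensities. It uses the common cumulative-distribution remapping formula, which is an assumption about the method; it does not use an imaging library.

```python
def equalize_histogram(pixels, levels=256):
    """Remap grayscale values so they spread over the full intensity range."""
    histogram = [0] * levels
    for value in pixels:
        histogram[value] += 1

    # Cumulative distribution: number of pixels at or below each intensity.
    cdf = []
    running_total = 0
    for count in histogram:
        running_total += count
        cdf.append(running_total)

    total = len(pixels)
    cdf_min = min(c for c in cdf if c > 0)
    if total == cdf_min:
        return list(pixels)  # only one intensity present, nothing to stretch

    scale = (levels - 1) / (total - cdf_min)
    lookup = [round(max(c - cdf_min, 0) * scale) for c in cdf]
    return [lookup[value] for value in pixels]


# Made-up low-contrast pixel values, all between 100 and 150.
pixels = [100, 100, 110, 120, 120, 130, 140, 150]
equalized = equalize_histogram(pixels)
print(equalized)
```

The input values only span 100 to 150, while the remapped values stretch from 0 to 255, which is what increases the visible contrast.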
We can construct a bunch of decision trees that are based on our image signatures, and then train the forest to make the right decision. Extremely Random Forests (ERFs) are used extensively for this purpose. Let’s dive in and see how to do it!
While dealing with images, we tend to tackle problems with the contents of unknown images. This video will enable you to build an object recognizer which allows you to recognize the content of unknown images. So, let’s see it!
Webcams are widely used for real-time communications and for biometric data analysis. This video will walk you through capturing and processing video from your webcam.
Haar cascade extracts a large number of simple features from the image at multiple scales. The simple features are basically edge, line, and rectangle features that are very easy to compute. It is then trained by creating a cascade of simple classifiers. Let’s see how we can detect a face with it!
The Haar cascades method can be extended to detect all types of objects. Let's see how to use it to detect the eyes and nose in the input video.
Principal Components Analysis (PCA) is a dimensionality reduction technique that's used very frequently in computer vision and machine learning. It’s used to reduce the dimensionality of the data before we can train a system. This video will take you through the use of PCA.
What if you need to reduce the number of dimensions in unorganized data? PCA, which we used in the last video, is inefficient in such situations. Let us see how we can tackle this situation.
When we work with data or signals, they are generally received in raw form, as a mixture that includes unwanted components. It is essential to separate these components before we can work on the signals. This video will enable you to achieve this goal.
We are now finally ready to build a face recognizer! Let’s see how to do it!
Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and how to put them to work.
With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you to develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.