Python: Master Machine Learning with Python: 3-in-1

Practical and unique solutions to common Machine Learning problems that you face!
2.4 (2 ratings)
49 students enrolled
Created by Packt Publishing
Last updated 6/2018
English [Auto]
This course includes
  • 11 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Evaluate and apply the most effective models to problems
  • Deploy machine learning models using third-party APIs
  • Interact with text data and build models to analyze it
  • Use deep neural networks to build an optical character recognition system
  • Work with image data and build systems for image recognition and biometric face recognition
  • Eliminate common data wrangling problems in Pandas and scikit-learn as well as solve prediction visualization issues with Matplotlib
  • Explore data visualization techniques to interact with your data in diverse ways
Course content
144 lectures 10:44:43
+ Python Machine Learning Projects
26 lectures 02:56:22

This video gives an overview of the entire course.

Preview 02:37

We need airfare pricing data from a website to work with. You will learn how to source it in this section.

  • Get flight explorer and look at an example 
Sourcing Airfare Pricing Data

After determining the source of the data, we need to retrieve the data.

  • Install PhantomJS and the necessary libraries
  • Instantiate the browser object
  • Send the user agent to the receiving server 
Retrieving the Fare Data with Advanced Web Scraping Techniques

The DOM is the tree of elements that forms a web page. We need to get some details of the structure by parsing it.

  • Feed the page source and retrieve a list of best prices
  • Extract the best price or the cheapest price
  • Identify outliers with clustering techniques 
Parsing the DOM to Extract Pricing Data

To get real-time alerts when a particular event occurs, we need to use IFTTT.

  • Sign up for the Maker channel
  • Create an event fare alert
  • Fill in the message and customize it 
Sending Real-Time Alerts Using IFTTT

To deploy our app, we'll move on to working in a text editor. You will put together the entire code to get the final result.

  • Import packages
  • Create a function that pulls down the data and runs your clustering algorithm
  • Include a scheduler and run the file from the command line 
Putting It All Together

Before deciding on strategies for the IPO market, we need to study it and derive inferences from it.

  • Read about the IPO market
  • Look at the performance of the IPO market
  • Study strategies 
The IPO Market

Identifying and including the factors that affect the market is called feature engineering. These features are as important as the data used in building the model.

  • Add features. Retrieve data.
  • Tidy up the underwriter data. Add final features.
  • Transform data into matrix form 
Feature Engineering

Instead of predicting the value of the return, you can predict whether an IPO is a trade you should buy or not buy. The model used is logistic regression.

  • Apply logistic regression to the data
  • Split data into training and testing datasets. Fit the model.
  • Evaluate the model 
Binary Classification

It is important to know which features will make the offering successful. You can find that out in this section.

  • Examine coefficients for logistic regression.
  • Fit random forest classifier.
  • Evaluate summary 
Feature Importance

To create a model, we first need a training dataset. We will use the Pocket app for this.

  • Install the Pocket Chrome extension.
  • Use the pocket API to retrieve stories. 
Creating a Supervised Training Set with the Pocket App

You can't move forward with just the URLs of the stories. You would need the full article. So let's check out how to do that in this video.

  • Sign up for API access.
  • Feed plain text to the model. 
Using the API to Download Story Bodies

Machine learning models work on numerical data. So we will need to transform our text into numerical data using NLP.

  • Convert the corpus into a BOW representation. Remove stop words.
  • Use the tf-idf algorithm. Convert the training set into a tf-idf matrix. 
Natural Language Processing Basics

You will learn about the linear support vector machine in this video. The SVM algorithm separates data points linearly into classes.

  • Feed the tf-idf matrix into the SVM. 
Support Vector Machines

We have prepared a training dataset. But we also need a stream of articles as a testing dataset to run our model against.

  • Set up news feeds and Google sheets.
  • Pull down articles using a Python library.
  • Make changes if necessary and rebuild the model. 
IFTTT Integration with Feeds, Google Sheets, and E-mail

It would make life easier if you get a personalized e-mail of your stories, right? So you will learn how to do that in this video.

  • Create a recipe. Receive a web request and create a trigger.
  • Generate a script that will send us articles daily. 
Setting Up Your Daily Personal Newsletter

Research is the most important thing before we start working on designing a strategy.

  • Study the market and understand it.
  • Understand different forms of the market. 
What Does Research Tell Us about the Stock Market?

Once you have studied the various aspects of the market, it is time to develop a trading strategy. You will learn it in this video.

  • Read and plot data.
  • Pull data for various statistical values.
  • Extend the time span. 
Developing a Trading Strategy

Now that we have our baseline, we will build our first regression model for predicting stock prices.

  • Set up a dataframe. Import SVM and set training and testing datasets.
  • Fit the model. Compare with the actual data.
  • Evaluate the performance by adjusting the different parameters until the desired result is achieved 
Building a Model and Evaluating Its Performance

Another algorithm to work with is dynamic time warping. It provides a metric that tells us how similar two time series are.

  • Calculate the distance between two time series.
  • Compare a series against all other series and infer.
  • Evaluate trades. 
Modeling with Dynamic Time Warping
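The dynamic time warping distance described above can be sketched with the classic dynamic-programming recurrence (a minimal NumPy illustration, not the course's code):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three neighboring alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

print(dtw_distance([1, 2, 3], [1, 2, 3]))     # 0.0
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- warping absorbs the repeat
```

Note how the second pair differs in length yet still scores a distance of zero; that tolerance to stretching is exactly what makes DTW useful for comparing trading time series.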

It is very important to understand the concepts of machine learning before working with it.

  • Understand basic machine learning
  • Look at similar technologies by Google and Facebook. 
Machine Learning on Images

In order to work with images, we need to transform them into a matrix form, that is, numerical form.

  • Load the MNIST database
  • Get the matrix. Perform operations on it. 
Working with Images

We will use algorithms to find similar images in the database.

  • Compute the similarity with the cosine similarity algorithm
  • Take a look at the results.
  • Test with chi-squared algorithm 
Finding Similar Images

We will combine what we have studied so far to build an image similarity engine.

  • Utilize GraphLab Create, and load images
  • Extract deep features and compare them.
  • Show the image 
Building an Image Similarity Engine

The design of chatbots involves parameters such as the mode of communication, the content, and so on. You will look at that in this video.

  • Observe Python NLTK
  • Run a few sample questions and response samples 
The Design of Chatbots

Having looked at how a chatbot works, we will now build one.

  • Get training dataset. Load data
  • Parse the data into a question-answer form
  • Get similarity scores. Test the chatbot. 
Building a Chatbot
+ Python Machine Learning Solutions
97 lectures 04:31:07

This video gives an overview of the entire course

Preview 06:38

Machine learning algorithms need processed data for operation. Let’s explore how to process raw data in this video.

  • Look at Mean Removal
  • Go through Scaling
  • Learn Data Normalization and Binarization 
Preprocessing Data Using Different Techniques

Algorithms need data in numerical form to work with it directly. But we often label data with words. So, let’s see how to transform word labels into numerical form.

  • Import a preprocessing package in a new file
  • Create labels
  • Encode the labels 
Label Encoding

Linear regression uses a linear combination of input variables to estimate the underlying function that governs the mapping from input to output. Our aim would be to identify that relationship between input data and output data.

  • Load the data and label into variables
  • Separate the training dataset and the testing dataset
  • Check the test dataset output 
Building a Linear Regressor
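The fitting step above can be sketched with NumPy's least-squares `polyfit` (a minimal illustration with made-up data, rather than the scikit-learn code from the video):

```python
import numpy as np

# Toy data that lies exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a degree-1 polynomial recovers slope and intercept
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # ~2.0, ~1.0
```

The linear regressor's job is exactly this: find the coefficients that minimize the squared error between predicted and actual outputs.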

In some cases there is a difference between the actual values and the values predicted by the regressor. We need to keep a check on the regressor's accuracy. This video will enable us to do that.

  • Identify metrics to evaluate regressor
  • Compute the metrics
  • Achieve model persistence programmatically 
Regression Accuracy and Model Persistence

Linear regressors can be inaccurate, as outliers disrupt the model. We need to regularize the model. We will see that in this video.

  • Create a ridge regressor
  • Initialize the alpha parameter
  • Train the regressor 
Building a Ridge Regressor

A linear model fails to capture the natural curve of datapoints, which makes it quite inaccurate. So, let’s go through the polynomial regressor to see how we can improve on that.

  • Observe the polynomial regression model
  • Initialize a polynomial of a certain degree
  • Measure the accuracy of the model 
Building a Polynomial Regressor
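To see why a higher-degree fit helps, here is a minimal sketch (synthetic data, not the course's dataset) where a degree-2 polynomial recovers a curve a straight line cannot:

```python
import numpy as np

# Points on the parabola y = x^2: a straight line fits these poorly
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# A degree-2 least-squares fit recovers the quadratic coefficients
coeffs = np.polyfit(x, y, 2)
print(coeffs)  # ~[1. 0. 0.], i.e. y = 1*x^2 + 0*x + 0
```

Raising the degree adds flexibility, but too high a degree overfits, which is why the video measures the model's accuracy at each degree.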

Applying regression concepts to solve real-world problems can be quite tricky. We will explore how to do it successfully.

  • Get a standard housing dataset and divide it into input and output
  • Fit a decision tree regression model
  • Evaluate the performance of AdaBoost 
Estimating housing prices

We don’t really have an idea of which features contribute to the output and which don’t. It becomes critical to know that, in case we have to omit one. This video will help you compute their relative importance.

  • Plot the relative importance of features
  • Scale values from the feature_importances_ method
  • Compare the output of the decision tree regressor with that of AdaBoost 
Computing relative importance of features

There might be some problems where the basic regression methods we’ve learned won’t help. One such problem is bicycle demand distribution. You will see how to solve that here.

  • Import csv, RandomForestRegressor and plot_feature_importances
  • Train the regressor and evaluate its performance
  • Plot the feature importances with a varying dataset. 
Estimating bicycle demand distribution

Evaluating the accuracy of a classifier is an important step in the world of machine learning. We need to learn how to use the available data to get an idea as to how this model will perform in the real world. This is what we are going to learn in this section.

  • Create sample points and assign some labels to them
  • Plot these points by mapping on X and Y axes
  • Use a straight line as a classifier 
Building a Simple Classifier

Despite the word regression being present in the name, logistic regression is actually used for classification purposes. Given a set of datapoints, our goal is to build a model that can draw linear boundaries between our classes. It extracts these boundaries by solving a set of equations derived from the training data.

  • Initialize the logistic regression classifier and train with the function
  • Define a function to draw datapoints and boundaries
  • Plot the boundaries and overlay the training points 
Building a Logistic Regression Classifier
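A minimal scikit-learn sketch of the idea, using well-separated one-dimensional toy data rather than the course's dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two well-separated classes along a single feature
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# The learned linear boundary falls between the two groups
print(clf.predict([[0.5], [9.5]]))  # [0 1]
```

With two or more features the same call learns a linear boundary in the plane, which is what the plotting function in the video overlays on the training points.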

Bayes’ Theorem, which has been widely used in probability to determine the outcome of an event, enables us to classify the data in a smarter way. Let us use its concept to make our classifier more amazing.

  • Load the data from the data multivar.txt file
  • Build the Naive Bayes classifier and compute its accuracy
  • Plot the data and the boundaries using the plot classifier method 
Building a Naive Bayes’ Classifier

While working with data, splitting data correctly and logically is an important task. Let’s see how we can achieve this in Python.

  • Import the cross_validation package
  • Evaluate the classifier on the test data and compute its accuracy
  • Plot the datapoints and the boundaries on the test data  
Splitting the Dataset for Training and Testing

In order to make splitting of the dataset more robust, we repeat the process of splitting with different subsets. If we just fine-tune it for a particular subset, we may end up overfitting the model, which may fail to perform well on unknown data. Cross-validation ensures accuracy in such a situation.

  • Set the number of validations and the accuracy expression
  • Calculate the precision, recall and F1 score 
Evaluating the Accuracy Using Cross-Validation
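As a minimal sketch (note that the `cross_validation` package used in the video is the old name; in current scikit-learn the same tools live in `model_selection`):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Five-fold cross-validation: train on 4/5 of the data, test on the rest, 5 times
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(scores.mean())  # average accuracy across the five folds
```

Passing `scoring='precision_weighted'`, `'recall_weighted'`, or `'f1_weighted'` to the same call yields the other metrics mentioned in the bullets.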

When we want to fine-tune our algorithms, we need to understand how the data gets misclassified before we make these changes. Some classes are worse than others, and the confusion matrix will help us understand this.

  • Define the plot_confusion_matrix() function
  • Use classification_report() to print the report 
Visualizing the Confusion Matrix and Extracting the Performance Report

Let's see how we can apply classification techniques to a real-world problem. We will use a dataset that contains some details about cars, such as number of doors, boot space, maintenance costs, and so on, to analyze this problem.

  • Load the dataset and convert strings to numbers
  • Train the classifier and perform cross validation
  • Use a single datapoint and use the classifier to categorize it. 
Evaluating Cars based on Their Characteristics

Let’s see how the performance gets affected as we change the hyperparameters. This is where validation curves come into the picture. These curves help us understand how each hyperparameter influences the training score.

  • Import the validation_curve package
  • Plot the graph for max_depth and n_estimators 
Extracting Validation Curves

Learning curves help us understand how the size of our training dataset influences the machine learning model. This is very useful when you have to deal with computational constraints. Let's go ahead and plot the learning curves by varying the size of our training dataset.

  • Import the learning_curve package
  • Use five-fold cross-validation in the learning_curve method
  • Plot the graph 
Extracting Learning Curves

Let’s see how we can build a classifier to estimate the income bracket of a person based on 14 attributes.

  • Use Naive Bayes classifier and load the dataset
  • Convert string attributes to numerical data
  • Split the data into training and testing, and extract performance metrics 
Extracting the Income Bracket

Building regressors and classifiers can be a bit tedious. Supervised learning models like SVM help us to a great extent. Let’s see how we can work with SVM.

  • Visualize the data
  • Plot the data and separate training and testing dataset
  • Initialize SVM object and train SVM classifier 
Building a Linear Classifier Using Support Vector Machine

There are various kernels used to build nonlinear classifiers. Let’s explore some of them and see how we can build a nonlinear classifier.

  • Use a polynomial kernel function
  • Use a radial basis function kernel 
Building Nonlinear Classifier Using SVMs

A classifier often gets biased when there are more datapoints in a certain class. This can turn out to be a big problem. We need a mechanism to deal with this. Let’s explore how we can do that.

  • Visualize the data
  • Build SVM with linear kernel
  • Use the class_weight parameter 
Tackling Class Imbalance
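The `class_weight` parameter mentioned above reweights each class inversely to its frequency. scikit-learn's `'balanced'` heuristic can be sketched directly (synthetic counts for illustration):

```python
import numpy as np

# 90 samples of class 0, only 10 of class 1: heavily imbalanced
y = np.array([0] * 90 + [1] * 10)

classes, counts = np.unique(y, return_counts=True)
# the 'balanced' formula: n_samples / (n_classes * count_per_class)
weights = len(y) / (len(classes) * counts)
print(dict(zip(classes, weights)))  # the minority class weighs 9x more
```

Passing `class_weight='balanced'` to the SVM applies exactly these per-class weights, so misclassifying a minority-class point costs the model correspondingly more.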

Let’s explore how we can train SVM to compute the output confidence level of a new datapoint when it is classified into a known category.

  • Measure the boundary distance
  • Train the classifier
  • Use the predict_proba function to measure the confidence value
Extracting Confidence Measurements

It’s critical to evaluate the performance of a classifier. We need certain hyperparameters to do so. Let’s explore how to find those parameters.

  • Define the metric
  • Start the search for hyperparameters
  • Print the best sets for the training and test datasets 
Finding Optimal Hyper-Parameters

Now that we’ve learned the concepts of SVM thoroughly, let’s see if we can apply them to real-world problems.

  • Understand data format and load the data
  • Convert the data into a numerical form
  • Train the SVM and test on a new datapoint 
Building an Event Predictor

We’ve already used SVM as a classifier to predict events. Let’s explore whether or not we can use it as a regressor for estimating traffic.

  • Load input data and encode it
  • Perform cross validation to check regressor performance
  • Test the regressor on a datapoint 
Estimating Traffic

The k-means algorithm is one of the most popular clustering algorithms, which is used to divide the input data into k subgroups using various attributes of the data. Let’s see how we can implement it in Python for clustering data.

  • Load input data and define the number of clusters
  • Initialize the k-means object and train it
  • Visualize the boundaries and overlay the centroids 
Clustering Data Using the k-means Algorithm
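A minimal scikit-learn sketch with two obvious toy clusters (not the course's `data_multivar` file):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two tight groups of points: near the origin and near (5, 5)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)        # one label for the first three points, another for the rest
print(km.cluster_centers_)  # centroids near (0.03, 0.03) and (5.03, 5.03)
```

The `cluster_centers_` attribute holds the centroids that the video overlays on the plotted boundaries.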

Vector quantization is popularly used in image compression, where we store each pixel using fewer bits than the original image to achieve compression.

  • Create a function to parse the input arguments
  • Create a function to compress the input image
  • Define the main function to take the input arguments and extract the output image 
Compressing an Image Using Vector Quantization

The Mean Shift is a powerful unsupervised learning algorithm that's used to cluster datapoints. It considers the distribution of datapoints as a probability density function and tries to find the modes in the feature space. Let’s see how to use it in Python.

  • Load the input data from the data_multivar.txt file
  • Build a Mean Shift clustering model
  • Train the model, extract the labels, and iterate through datapoints 
Building a Mean Shift Clustering

Many times, we need to segregate data and group it for analysis and more. We can achieve this in Python using agglomerative clustering. Let’s see how we can do it.

  • Extract the labels and specify the shapes of the markers
  • Define functions to get datapoints located on spiral and rose-curve
  • Define a hypotrochoid and the main function 
Grouping Data Using Agglomerative Clustering

In supervised learning, we just compare the predicted values with the original labels to compute their accuracy. In unsupervised learning, we don't have any labels. Therefore, we need a way to measure the performance of our algorithms. Let’s see how we could evaluate their performance.

  • Load the input data from the data_perf.txt file
  • Iterate through a range of values of input data to find the peak 
Evaluating the Performance of Clustering Algorithms

Wouldn't it be nice if there were a method that can just tell us the number of clusters in our data? This is where Density-Based Spatial Clustering of Applications with Noise (DBSCAN) comes into the picture. Let us see how we can work with it.

  • Load the input data and sweep the parameter space
  • Extract the number of clusters and all core samples
  • Extract the set of unique labels and specify different markers 
Automatically Estimating the Number of Clusters Using DBSCAN

How will we operate with the assumption that we don't know how many clusters there are? As we don't know the number of clusters, we can use an algorithm called Affinity Propagation to cluster the data. Let's see how we can use unsupervised learning for stock market analysis with this.

  • Load the symbol_map.json file
  • Read data from the symbol map file and specify the time period
  • Standardize the data and train the model 
Finding Patterns in Stock Market Data

What could we do when we don't have labeled data available all the time, but it's important to segment the market so that people can target individual groups? Let’s learn to build a customer segmentation model for this situation.

  • Load the input data from the wholesale.csv file
  • Build a Mean Shift model and print the centroids 
Building a Customer Segmentation Model

One of the major parts of any machine learning system is the data processing pipeline. Instead of calling functions in a nested way, it's better to use the functional programming paradigm to build the combination. Let's take a look at how to combine functions to form a reusable function composition.

  • Create three basic functions to take input arguments
  • Define a function composer to take functions and return composed function
  • Use a regular method and a function composer, both to execute the same output 
Building Function Composition for Data Processing

The scikit-learn library has provisions to build machine learning pipelines. We just need to specify the functions, and it will build a composed object that makes the data go through the whole pipeline. Let’s see how to build it in Python.

  • Create some sample data and select the k-best feature
  • Use a random forest classifier and pipeline method
  • Train the classifier and estimate the performance 
Building Machine Learning Pipelines

While working with the training dataset, we need to make a decision based on the number of nearest neighbors in it. This can be achieved with the help of the NearestNeighbor method in Python. Let’s see how to do it.

  • Define a random datapoint in sample data
  • Define the NearestNeighbors object 
Finding the Nearest Neighbors

When we want to find the class to which an unknown point belongs, we find the k-nearest neighbors and take a majority vote. Let's take a look at how to construct this.

  • Use the data_nn_classifier.txt file for input data
  • Specify the number of nearest neighbors and build a k-nearest neighbors classifier
  • Extract the k-nearest neighbors and plot it.  
Constructing a k-nearest Neighbors Classifier
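The majority-vote idea above can be sketched with scikit-learn in a few lines (toy one-dimensional data, not the course's `data_nn_classifier.txt`):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two classes along one feature; k = 3 neighbors vote on each query point
X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

# 1.5 sits among class-0 points, 10.5 among class-1 points
print(knn.predict([[1.5], [10.5]]))  # [0 1]
```

There is no real "training" step here: the model simply stores the points, and each prediction is a vote among the k stored points closest to the query.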

A good thing about the k-nearest neighbors algorithm is that it can also be used as a regressor. Let’s see how to do this!

  • Generate sample Gaussian-distributed data and add some noise to it
  • Define a denser grid of points and the number of nearest neighbors
  • Initialize and train the k-nearest neighbors regressor 
Constructing a k-nearest Neighbors Regressor

In order to find users in the database who are similar to a given user, we need to define a similarity metric. The Euclidean distance score is one such metric that we can use to compute the distance between datapoints. Let’s look at this in more detail in this video.

  • Define a function to compute the Euclidean score between two users
  • Use the movie ratings.json file as the data file
  • Take two random users and compute the Euclidean distance score 
Computing the Euclidean Distance Score

The Euclidean distance score is a good metric, but it has some shortcomings. Hence, Pearson correlation score is frequently used in recommendation engines. Let's see how to compute it.

  • Define a function to compute the Pearson correlation score
  • Get the movies that both these users rated
  • Define the main function and compute the Pearson correlation score 
Computing the Pearson Correlation Score

One of the most important tasks in building a recommendation engine is finding users that are similar. Let's see how to do this in this video.

  • Define a function to find similar users which takes three arguments
  • Sort the scores in descending order
  • Extract the k top scores and return them 
Finding Similar Users in a Dataset

Now that we’ve built all the different parts of a recommendation engine, we are ready to generate movie recommendations. Let’s see how to do that in this video.

  • Define a function for movie recommendation taking dataset and username
  • Compute the Pearson score of the user
  • Create a normalized list of movie ranks and extract recommendations 
Generating Movie Recommendations

With tokenization we can define our own conditions to divide the input text into meaningful tokens. This gives us the solution for dividing a chunk of text into words or into sentences. Let's take a look at how to do this.

  • Import the sent_tokenize and word_tokenize packages
  • Use the WordPunctTokenizer to split punctuation 
Preprocessing Data Using Tokenization
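NLTK's `sent_tokenize` and `word_tokenize` handle many edge cases; the core idea can be sketched with plain regular expressions (a naive approximation, not NLTK's actual logic):

```python
import re

text = "Machine learning is fun. Tokenize this text!"

# Naive sentence split: break after terminal punctuation followed by whitespace
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word-level tokens: runs of word characters, with punctuation as separate tokens
words = re.findall(r"\w+|[^\w\s]", text)

print(sentences)   # ['Machine learning is fun.', 'Tokenize this text!']
print(words[:4])   # ['Machine', 'learning', 'is', 'fun']
```

Real tokenizers go further (abbreviations like "Dr.", contractions, quotes), which is why the video reaches for NLTK rather than hand-rolled regexes.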

During text analysis, it's useful to extract the base form of the words to extract some statistics to analyze the overall text. This can be achieved with stemming, which uses a heuristic process to cut off the ends of words. Let's see how to do this in Python.

  • Import the PorterStemmer, LancasterStemmer, and SnowballStemmer packages
  • Define a few words and a list of stemmers
  • Iterate and stem the words using the three stemmers 
Stemming Text Data
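A minimal sketch with NLTK's Porter stemmer (assuming NLTK is installed; the word list is illustrative):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# The stemmer chops suffixes heuristically; results are stems, not always real words
for word in ["running", "cats", "easily"]:
    print(word, "->", stemmer.stem(word))  # e.g. 'running' -> 'run'
```

The Lancaster stemmer is more aggressive and the Snowball stemmer sits in between, which is why the video compares all three on the same word list.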

Sometimes the base words that we obtain using stemmers don't really make sense. Lemmatization solves this problem by using a vocabulary and morphological analysis of words and removing inflectional endings. Let's take a look at how to do this in this video.

  • Import the WordNetLemmatizer package
  • Compare two lemmatizers and list them 
Converting Text to Its Base Form Using Lemmatization

When you deal with a really large text document, you need to divide it into chunks for further analysis. In this video, we will divide the input text into a number of pieces, where each piece has a fixed number of words.

  • Define a function to split text into chunks
  • Initialize variables and iterate through words
  • Load the data from the Brown corpus and call the splitter function 
Dividing Text Using Chunking

When we deal with text documents that contain millions of words, we need to convert them into some kind of numeric representation so as to make them usable for machine learning algorithms. A bag-of-words model is what helps us achieve this task quite easily.

  • Divide the text data into chunks
  • Extract a document term matrix
  • Extract vocabulary from the vectorizer object 
Building a Bag-of-Words Model

The goal of text classification is to categorize text documents into different classes. This is an extremely important analysis technique in NLP. Let us see how we can build a text classifier for this purpose.

  • Import the feature extractor
  • Use the Multinomial Naive Bayes classifier and train it
  • Define and train the tf-idf transformer object 
Building a Text Classifier

Identifying the gender of a name is an interesting task in NLP. Also, gender recognition is part of many artificial intelligence technologies. Let us see how to identify gender in Python.

  • Seed the random number generator, and shuffle the training data
  • Divide the data into train and test datasets
  • Use the Naïve Bayes classifier 
Identifying the Gender

How could we discover the feelings or sentiments of different people about a particular topic? This video helps us to analyze that.

  • Use movie reviews in NLTK
  • Divide the data into training and testing datasets
  • Use a Naive Bayes classifier, define the object, and train it  
Analyzing the Sentiment of a Sentence

With topic modeling, we can uncover some hidden thematic structure in a collection of documents. This will help us in organizing our documents in a better way so that we can use them for analysis. Let’s see how we can do it!

  • Use the data_topic_modeling.txt text file
  • Define a processor function for tokenization
  • Initialize the Latent Dirichlet Allocation (LDA) model 
Identifying Patterns in Text Using Topic Modelling

Reading an audio file and visualizing the signal is a good starting point that gives us a good understanding of the basic structure of audio signals. So let us see in this video how we could do it!

  • Use the wavfile package to read the audio file
  • Normalize the 16-bit signed integer data
  • Extract the first 30 values to plot and convert to seconds 
Reading and Plotting Audio Data

Audio signals consist of a complex mixture of sine waves of different frequencies, amplitudes and phases. There is a lot of information that is hidden in the frequency content of an audio signal. So it’s necessary to transform the audio signal into a frequency domain. Let’s see how to do this.

  • Read the input_freq.wav file and normalize the signal
  • Extract the length of the signal and double it
  • Extract the power signal and plot the frequency graph 
Transforming Audio Signals into the Frequency Domain
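The transformation above can be sketched with NumPy's real FFT on a synthetic tone (no input file needed; the 50 Hz tone is made up for illustration):

```python
import numpy as np

fs = 1000                                # sampling rate in Hz
t = np.arange(0, 1.0, 1.0 / fs)          # one second of samples
signal = np.sin(2 * np.pi * 50 * t)      # a pure 50 Hz tone

# Real FFT: magnitude spectrum and the frequency of each bin
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)

print(freqs[np.argmax(spectrum)])  # 50.0 -- the hidden frequency pops out
```

With a real recording the spectrum shows a mixture of peaks rather than a single spike, but the recipe (FFT, magnitude, frequency axis, plot) is the same.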

We can use NumPy to generate audio signals. As we know, audio signals are complex mixtures of sinusoids. Let’s see how we can generate audio signals with custom parameters.

  • Specify the audio generation parameters
  • Add some noise and scale the values to 16-bit integers
  • Generate the time axis and convert it into seconds 
Generating Audio Signals with Custom Parameters

Music has been explored for centuries, and technology has set new horizons to play with it. We can also create music notes in Python. Let’s see how we can do this.

  • Define the main function and use the tone_freq_map.json file in it
  • Iterate through the list of notes and call synthesizer function
  • Write the signal to the output file with the write() function 
Synthesizing Music

When we deal with signals and want to use them as input data for analysis, we need to convert them into the frequency domain. So, let’s get hands-on with it!

  • Extract the MFCC and filter bank features
  • Visualize the MFCC features and filter bank features 
Extracting Frequency Domain Features

A hidden Markov Model represents probability distributions over sequences of observations. It allows you to find the hidden states so that you can model the signal. Let us explore how we can use it to perform speech recognition.

  • Use Gaussian HMMs to model the data
  • Define a method to extract the score 
Building Hidden Markov Models

This video will walk you through building a speech recognizer by using the audio files in a database. We will use seven different words, where each word has 15 audio files. Let’s go ahead and do it!

  • Initiate the variable to hold all the HMM models
  • Extract MFCC features and define variables to store the maximum score
  • Extract the score and store the maximum score 
Building a Speech Recognizer

Let’s understand how to convert a sequence of observations into time series data and visualize it. We will use a library called pandas to analyze time series data. At the end of this video, you will be able to transform data into the time series format.

  • Define a function to convert sequential observations into time-indexed data
  • Extract the starting and ending dates of the dataset and create a pandas variable
  • Return the time-indexed variable and use the data_timeseries.txt file 
Transforming Data into the Time Series Format
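The conversion described above can be sketched with pandas (made-up monthly observations, not the course's data_timeseries.txt):

```python
import numpy as np
import pandas as pd

# Sequential observations, one per month starting January 2015
values = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
index = pd.date_range(start="2015-01-01", periods=len(values), freq="MS")
ts = pd.Series(values, index=index)

# Once the data is time-indexed, date-based slicing works directly
print(ts["2015-02-01":"2015-04-01"])  # February through April
```

Attaching a `DatetimeIndex` is what unlocks everything the following videos rely on: date slicing, rolling statistics, and resampling.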
Slicing Time Series Data

You can filter the data in many different ways. The pandas library allows you to operate on time series data in any way that you want. Let's see how to operate on time series data.

  • Use the third and fourth columns in the text file
  • Convert the data into a pandas data frame
  • Plot the difference between the two columns 
Operating on Time Series Data

One of the main reasons that we want to analyze time series data is to extract interesting statistics from it. This provides a lot of information regarding the nature of the data. Let’s see how to extract these stats.

  • Load the third and fourth column of data series.
  • Create a pandas data structure and extract maximum and minimum values
  • Print the rolling mean and the correlation coefficients 
Extracting Statistics from Time Series

Hidden Markov Models are really powerful when it comes to sequential data analysis. They are used extensively in finance, speech analysis, weather forecasting, sequencing of words, and so on. We are often interested in uncovering hidden patterns that appear over time. Let’s see how we can use it.

  • Load the data_hmm.txt file into a NumPy array
  • Stack the data column-wise and train the HMM
  • Run the predictor and compute the mean and variance 
Building Hidden Markov Models for Sequential Data

Conditional Random Fields (CRFs) are probabilistic models used to analyze structured data and to label and segment sequential data. Let us see how we can use them on our input dataset!

  • Define a class to handle CRF-related processing
  • Train the CRF model and evaluate its performance
  • Take a random test vector and predict the output 
Building Conditional Random Fields for Sequential Text Data

This video will get you hands-on with analyzing stock market data and understanding the fluctuations in the stocks of different companies. So let’s see how to do this!

  • Compute the percentage change in closing value of each data type
  • Train the HMM using five components
  • Generate 500 samples using the trained HMM and plot this 
Analyzing Stock Market Data with Hidden Markov Models

OpenCV is the world's most popular computer vision library. It enables us to analyze images and perform a wide range of operations on them. Let’s see how to use it!

  • Extract the height and width of the input image
  • Crop the image using NumPy style slicing
  • Resize the image to 1.3 times its size by setting a scaling factor 
Operating on Images Using OpenCV-Python
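The course performs these operations with OpenCV (cv2). To keep this sketch dependency-free, a NumPy array stands in for the image loaded by cv2.imread; the crop uses NumPy-style slicing exactly as in the lecture, and the 1.3x resize is approximated by nearest-neighbour indexing (cv2.resize would normally handle it):

```python
import numpy as np

# A synthetic 60x80 three-channel "image" stands in for cv2.imread output
img = (np.arange(60 * 80 * 3) % 256).astype(np.uint8).reshape(60, 80, 3)

# Extract the height and width of the input image
h, w = img.shape[:2]

# Crop the central region using NumPy-style slicing
crop = img[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

# Nearest-neighbour resize by a scaling factor of 1.3
scale = 1.3
rows = (np.arange(int(h * scale)) / scale).astype(int)
cols = (np.arange(int(w * scale)) / scale).astype(int)
resized = img[rows[:, None], cols]
```

With real images, only the loading and resizing calls change; the slicing-based crop is identical because OpenCV images are NumPy arrays.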

When working with images, it is essential to detect the edges to process the image and perform different operations with it. Let’s see how to detect edges of the input image in Python.

  • Load the input image
  • Extract the height and width of the image
  • Use the Sobel filter, Laplacian and Canny edge detector 
Detecting Edges

The human eye likes contrast! This is the reason that almost all camera systems use histogram equalization to make images look nice. This video will walk you through the use of histogram equalization in Python.

  • Convert the image to grayscale and display it
  • Equalize the histogram of the grayscale image and display it
  • Convert the image from BGR to YUV and equalize Y-channel 
Histogram Equalization
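OpenCV does this with cv2.equalizeHist; the same transform can be sketched in pure NumPy to show what it actually does. The low-contrast synthetic image below is illustrative:

```python
import numpy as np

def equalize_histogram(gray):
    """Spread a grayscale image's intensities across the full 0-255 range."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map each intensity through the normalized cumulative distribution
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# A low-contrast synthetic image: intensities squeezed into [100, 120]
gray = (100 + np.arange(64 * 64) % 21).astype(np.uint8).reshape(64, 64)
equalized = equalize_histogram(gray)
```

For color images, the lecture's BGR-to-YUV conversion matters because only the luminance (Y) channel should be equalized, not the color channels.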

One of the essential steps in image analysis is to identify and extract salient features for computer vision tasks. This can be achieved with corner detection and SIFT feature points in Python. This video will enable you to achieve this goal!

  • Convert the image to grayscale and cast it to floating-point values
  • Dilate and threshold the image to display the important points
  • Initialize the SIFT detector object and extract the keypoints 
Detecting Corners and SIFT Feature Points

When we build object recognition systems, we may want to use a different feature detector before we extract features using SIFT; that will give us the flexibility to cascade different blocks to get the best possible performance. Let’s see how to do it with the Star feature detector.

  • Define a class to handle Star feature detection functions
  • Define a function to run the detector on the input image
  • Convert image to grayscale, and detect features using Star feature detector  
Building a Star Feature Detector

Have you ever wondered how you could build image signatures? If yes, this video will take you through creating features using a visual codebook, which will enable you to achieve this goal. So, let’s dive in and watch it!

  • Use Star detector to get keypoints and SIFT to extract descriptors
  • Set the number of dimensions to 128 and extract centroids
  • Build a histogram and normalize it  
Creating Features Using Visual Codebook and Vector Quantization

We can construct a bunch of decision trees that are based on our image signatures, and then train the forest to make the right decision. Extremely Random Forests (ERFs) are used extensively for this purpose. Let’s dive in and see how to do it!

  • Define the argument parser and a class to handle ERF training
  • Encode the labels and train the classifier
  • Load the feature map and extract the feature vectors 
Training an Image Classifier Using Extremely Random Forests

While dealing with images, we often need to identify the contents of unknown images. This video will enable you to build an object recognizer that can do exactly that. So, let’s see it!

  • Define the argument parser
  • Define a class to handle the image tag extraction functions
  • Define a function to predict output and scale the image  
Building an Object Recognizer

Webcams are widely used for real-time communications and for biometric data analysis. This video will walk you through capturing and processing video from your webcam.

  • Initialize the video capture object
  • Define the scaling factor for the frames captured
  • Start an infinite loop and set frame delay to 1 millisecond 
Capturing and Processing Video from a Webcam

Haar cascade extracts a large number of simple features from the image at multiple scales. The simple features are basically edge, line, and rectangle features that are very easy to compute. It is then trained by creating a cascade of simple classifiers. Let’s see how we can detect a face with it!

  • Load the face detector cascade file
  • Define the scaling factor and resize the frame
  • Convert the image to grayscale and run the face detector on it 
Building a Face Detector using Haar Cascades

The Haar cascades method can be extended to detect all types of objects. Let's see how to use it to detect the eyes and nose in the input video.

  • Load the face, eyes, and nose cascade files
  • Initialize the video capture object and define the scaling factor
  • Extract the face ROI. Run the eye detector and nose detector. 
Building Eye and Nose Detectors

Principal Components Analysis (PCA) is a dimensionality reduction technique that's used very frequently in computer vision and machine learning. It’s used to reduce the dimensionality of the data before we can train a system. This video will take you through the use of PCA.

  • Define five dimensions for the input data
  • Create a PCA object and fit a PCA model on the input data
  • Convert the five-dimensional set to a two-dimensional set. 
Performing Principal Component Analysis
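The three bullets above map almost one-to-one onto scikit-learn calls. A minimal sketch, with synthetic five-dimensional data whose variance really lives in two directions (all values illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Five-dimensional input where only two latent directions carry real variance
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = base @ mixing + 0.01 * rng.normal(size=(200, 5))

# Create a PCA object and fit a PCA model on the input data
pca = PCA(n_components=2)

# Convert the five-dimensional set to a two-dimensional set
X_2d = pca.fit_transform(X)
```

Because the data is essentially two-dimensional, the two retained components should explain nearly all of the variance, which pca.explained_variance_ratio_ confirms.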

What if you need to reduce the number of dimensions in unorganized data? PCA, which we used in the last video, is inefficient in such situations. Let us see how we can tackle this situation.

  • Generate data that is distributed in concentric circles
  • Perform PCA and Kernel PCA on this data
  • Plot the PCA-transformed data and Kernel PCA-transformed data. 
Performing Kernel Principal Component Analysis

When we work with data or signals, they are generally received in a raw form, often as a mixture of several underlying sources. It is essential to separate these sources before we can work with the signals. This video will enable you to achieve this goal.

  • Create an ICA object and reconstruct signals based on it
  • Extract the mixing matrix and perform PCA for comparison
  • Plot the ICA-separated and PCA-separated signals. 
Performing Blind Source Separation

We are now finally ready to build a face recognizer! Let’s see how to do it!

  • Define a method to extract images and labels from the input folder
  • Create Local Binary Patterns Histogram face recognizer objects
  • Test the face recognizer on unknown data. 
Building a Face Recognizer Using a Local Binary Patterns Histogram

Let’s start our neural network adventure with a perceptron, which is a single neuron that performs all the computations.

  • Define a perceptron with two inputs
  • Train the perceptron and set the value as 0.01 for the show parameter 
Building a Perceptron
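The course builds the perceptron with a neural-network library; the same idea fits in a few lines of NumPy using the classic perceptron learning rule. The AND-gate data, the learning rate, and the epoch count below are illustrative choices, not the course's exact settings:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Train a single perceptron with the perceptron learning rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Nudge the weights only when the prediction is wrong
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    return w, b

# A perceptron with two inputs, trained on the AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = (X @ w + b > 0).astype(int)
```

Since AND is linearly separable, the perceptron convergence theorem guarantees this loop finds a separating line in a finite number of updates.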

Now that we know how to create a perceptron, let's create a single-layer neural network which will consist of multiple neurons in a single layer.

  • Plot the input data and extract the minimum and maximum values
  • Train the neural network until 50 epochs
  • Test the neural network on new test data 
Building a Single-Layer Neural Network

Let’s build a deep neural network, which will have multiple layers. There will be some hidden layers between the input and output layers. So, let us explore it.

  • Reshape the arrays and plot the input data.
  • Set the training algorithm to gradient descent
  • Run the network on training data and plot the training error 
Building a Deep Neural Network

Let us see how we can use vector quantization in machine learning and computer vision.

  • Define a learning vector quantization neural network with two layers
  • Train the LVQ neural network and evaluate it
  • Define four classes in data and grids for these classes 
Creating a Vector Quantizer

This video will walk you through analyzing sequential and time series data and enable you to extend generic models for them.

  • Define a function to create a waveform
  • Create a recurrent neural network with two layers
  • Create waveform of random length and test network for prediction 
Building a Recurrent Neural Network for Sequential Data Analysis

Let us look at how to use neural networks to perform optical character recognition to identify handwritten characters in images.

  • Define the visualization parameters and keep looping through the file
  • Reshape the array into the required shape and resize it
Visualizing the Characters in an Optical Character Recognition Database

Let's build a neural-network-based optical character recognition system.

  • Take 20 data points and define the distinct characters
  • Use 90% of data for training and remaining for testing
  • Train the neural network until 10,000 epochs & predict output 
Building an Optical Character Recognizer Using Neural Networks

3D visualization is very important in data representation. So we need to learn the simple yet effective method of plotting 3D plots.

  • Create an empty figure.
  • Generate values.
  • Plot values. 
Plotting 3D Scatter plots

You are going to learn to plot bubble plots in this video.

  • Create new Python file and generate random values.
  • Define the area plot for each bubble point and assign colors.
  • Plot the values and run the script from the command line. 
Plotting Bubble Plots

When data is moving or transient, we need animation to visualize it properly.

  • Define the tracker function. Assign color and update values and sizes.
  • Create an empty figure. Define a scatter plot.
  • Start the animation and run the script from the command line. 
Animating Bubble Plots

When there are distribution tables and various labels, pie charts are handy to express data.

  • Define labels, values and colors.
  • Plot the pie chart.
  • Assign variables to highlight the label and plot the new chart again. 
Drawing Pie Charts

To keep a track of data with respect to time, we need to plot date-formatted time series data.

  • Define the function. Extract information at a given time.
  • Define the main function and X and Y axes.
  • Add the plot and run the code. 
Plotting Date-Formatted Time Series Data

When we need to compare data of two different entities, we need to plot histograms. This video is going to help you do that.

  • Create a Python file and define values for the products.
  • Create a figure and define parameters.
  • Plot the histogram using functions. 
Plotting Histograms

Heat maps are useful when data in two groups is associated point by point.

  • Define the groups. Generate a 2D matrix.
  • Create a figure and create the heat map.
  • Plot the figure and run the script from the command line. 
Visualizing Heat Maps

When we visualize real-time signals, it becomes imperative to animate the dynamic signals so that they are updated continuously. This video will help you do that.

  • Create the function for the signal.
  • Initialize and set parameters
  • Define the main function. Define the animator object and run the code. 
Animating Dynamic Signals
+ Troubleshooting Python Machine Learning
21 lectures 03:17:14

This video gives an overview of the entire course.

Preview 02:14

In this video, we will learn to split datasets into three parts to make sure models generalize.

  • Learn why your trained model does not perform well on real data
  • Learn how to simulate your model seeing real data
  • Learn to split your datasets into three
Splitting Your Datasets for Train, Test, and Validate
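A minimal sketch of the three-way split using scikit-learn's train_test_split applied twice; the 60/20/20 ratios and dummy data are illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)

# First carve out 20% as the held-out test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then take 25% of the remainder as validation, leaving a 60/20/20 split
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)
```

Tune hyperparameters against the validation set and touch the test set only once, at the very end, to simulate the model seeing real data.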

In this video, we will learn how to save your models so they can be used again later.

  • Learn why it is important to save models to disk
  • Learn how to use the pickle library to save models
  • Learn how to load the saved model
Persist Your Hard Earned Models by Saving Them to Disk
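A minimal sketch with the pickle library, using a tiny scikit-learn model as a stand-in for your trained model; the file name is illustrative:

```python
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# Save the trained model to disk...
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# ...and load it back later, ready to predict without retraining
with open('model.pkl', 'rb') as f:
    restored = pickle.load(f)
```

For large scikit-learn models, joblib.dump is a common alternative that handles NumPy arrays more efficiently.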

In this video, we will kickstart any NLP project with an efficient way to count words in a file.

  • Count words in a file
  • Explore Python's collections.Counter
  • Learn how to use Counter to count words in a file
Calculate Word Frequencies Efficiently in Good ol' Python
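A minimal sketch with collections.Counter; the sample sentence is illustrative, and a real script would read the file's text instead:

```python
import re
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox"

# Tokenize on word characters and count everything in one pass
counts = Counter(re.findall(r"\w+", text.lower()))
top_two = counts.most_common(2)
```

Counter handles missing keys gracefully (returning 0), which keeps word-frequency code free of defensive lookups.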

In this video, we will use categorical data and words as features in scikit-learn models.

  • Transform your variable length features into one-hot vectors
  • Learn about the scikit-learn function MultiLabelBinarizer
  • Use MultiLabelBinarizer to transform your features
Transform Your Variable Length Features into One-Hot Vectors

This video will help you understand your trained classifiers better by isolating important features.

  • Learn the feature importance in linear classifiers
  • Feature weights as a proxy to the classification decision
  • Use standard deviation to normalize model coefficients
Finding the Most Important Features in Your Classifier

In this video, we will see how to reuse your data to predict multiple targets.

  • Learn what multi-output regression is
  • Explore what data you need for multi-output regression
  • Use the multi-output wrapper in scikit-learn
Predicting Multiple Targets with the Same Dataset

In this video, we will see how to get the best parameters after a grid search.

  • Create dummy data for classification problems
  • Set up a grid search for a Random Forest classifier
  • Retrieve the best hyperparameters after a grid search optimization
Retrieving the Best Estimators after Grid Search

In this video, we will get a more detailed report on a linear regression over your pandas data.

  • Explore the difference between scikit-learn and statsmodels
  • Explore what scikit-learn's linear regression gives you
  • Interpret the output of the statsmodels ols function
Regress on Your Pandas Data Frame with Simple Statsmodels OLS

This video will help you understand what rules your decision trees are using.

  • Fit a simple decision tree to the iris dataset
  • Use an inner property of our decision tree model to extract rules
  • Construct a recursive visitor to the decision tree
Extracting Decision Tree Rules from scikit-learn

In this video, we will be explaining important features in your random forests to make them more interpretable.

  • Fit a random forest model to iris data
  • Print out feature importance in our random forests
  • Plot out feature importance in our random forests
Finding Out Which Features Are Important in a Random Forest Model

In this video, we will be classifying with SVMs when your data has unbalanced classes.

  • Explore the imbalanced class problem
  • Use upsampling to solve the imbalanced class problem
  • Use class weights to solve the imbalanced class problem
Classifying with SVMs When Your Data Has Unbalanced Classes

In this video, we will learn to compute true/false positives/negatives after classification in scikit-learn.

  • Create a simple confusion matrix after classification
  • Visualize the confusion matrix with matplotlib
  • Compute precision, recall, and specificity
Computing True/False Positives/Negatives after Classification in scikit-learn
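The steps above can be sketched in a few lines: for binary labels, confusion_matrix returns a 2x2 array, and ravel() unpacks it into the four counts. The label vectors are illustrative:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# ravel() flattens the 2x2 matrix into tn, fp, fn, tp (in that order)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)          # also known as sensitivity
specificity = tn / (tn + fp)
```

The tn/fp/fn/tp ordering follows scikit-learn's convention of sorting labels, so the negative class occupies the first row and column.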

This video will help you in understanding what the principal components mean after performing PCA.

  • Learn how to apply PCA on the iris dataset
  • Understand the output of PCA and what it does
  • Extract the factors that contribute to each principal component
Labelling Dimensions with Original Feature Names after PCA

In this video, we will be learning how to cluster text documents for classification using k-means.

  • Download the 20 newsgroups dataset that is used widely for NLP tasks
  • Use TF-IDF on the 20 newsgroups dataset
  • Cluster the documents using k-means
Clustering Text Documents with scikit-learn K-means

In this video, we will be listing word frequencies.

  • Get all the unique words of a corpus
  • Count the number of occurrences
  • Count the occurrence of each word per document
Listing Word Frequency in a Corpus Using Only scikit-learn

In this video, we will be performing polynomial regression on dummy data with higher order effects.

  • Preprocess your feature space using a polynomial kernel
  • Combine linear regression and polynomial features
  • Perform polynomial regression on dummy data with higher order effects
Polynomial Kernel Regression Using Pipelines

In this video, we will learn how to visualize a function over two dimensions.

  • Learn how to manually create a plottable surface over two dimensions
  • Use meshgrid to do the same thing
  • Use meshgrid to plot a two-dimensional function
Visualize Outputs Over Two-Dimensions Using NumPy's Meshgrid
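A minimal sketch: np.meshgrid expands two 1-D axes into coordinate matrices so a function can be evaluated over the whole grid at once. The grid bounds and the function below are illustrative:

```python
import numpy as np

# Build a 5x5 grid over [-2, 2] x [-2, 2]
x = np.linspace(-2, 2, 5)
y = np.linspace(-2, 2, 5)
X, Y = np.meshgrid(x, y)

# Evaluate a two-dimensional function over the whole grid at once
Z = X**2 + Y**2
```

The resulting Z matrix is exactly what matplotlib's contourf or plot_surface expects for visualizing a function over two dimensions.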

In this video, we will learn how to visualize a decision tree's learned rules.

  • Install GraphViz and pydotplus
  • Export a visualization of a trained decision tree classifier
  • Interpret the visualization
Drawing Out a Decision Tree Trained in scikit-learn

In this video, we will create more informative histogram visualizations.

  • Create a histogram for any dataset
  • Label each bin with how many items are there
  • Label the percentage of data within each bin
Clarify Your Histogram by Labelling Each Bin

In this video, we will share one colorbar between multiple subplots in matplotlib.

  • Adjust the dimensions of existing subplots
  • Create a new axis for the colorbar
  • Add a colorbar all on its own
Centralizing Your Color Legend When You Have Multiple Subplots
Requirements
  • Prior familiarity with Python programming is assumed.
  • Basic understanding of Machine Learning concepts would certainly be useful.

You are a data scientist. Every day, you stare at reams of data trying to apply the latest and brightest of models to uncover new insights, but there seems to be an endless supply of obstacles. Your colleagues depend on you to monetize your firm's data - and the clock is ticking. What do you do?

Troubleshooting Python Machine Learning is the answer.

Machine learning gives you powerful insights into data. Today, machine learning implementations are adopted throughout industry, and its concepts are pervasive in the modern data-driven world, used across many fields such as search engines, robotics, self-driving cars, and more.

The effective blend of machine learning with Python helps in implementing solutions to real-world problems as well as automating analytical models.

This 3-in-1 course is a comprehensive, practical tutorial that helps you get superb insights from your data in different scenarios and deploy machine learning models with ease. Explore the power of Python and create your own machine learning models with this project-based tutorial. Try and test solutions to common problems while implementing machine learning with Python.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Python Machine Learning Projects, covers Machine Learning with Python's insightful projects. This video is a unique blend of projects that teach you what Machine Learning is all about and how you can implement machine learning concepts in practice. Six different independent projects will help you master machine learning in Python. The video will cover concepts such as classification, regression, clustering, and more, all the while working with different kinds of databases. You’ll be able to implement your own machine learning models after taking this course.

The second course, Python Machine Learning Solutions, covers 100 videos that teach you how to perform various machine learning tasks in the real world. Explore a range of real-life scenarios where machine learning can be used, and look at various building blocks. Throughout the course, you’ll use a wide variety of machine learning algorithms to solve real-world problems and use Python to implement these algorithms. Discover how to deal with various types of data and explore the differences between machine learning paradigms such as supervised and unsupervised learning.

The third course, Troubleshooting Python Machine Learning, covers practical and unique solutions to common Machine Learning problems that you face. Debug your models and research pipelines, so you can focus on pitching new ideas and not fixing old bugs. By the end of the course, you’ll be up and running with insightful Machine Learning projects in Python, performing various Machine Learning tasks in the real world.

About the Authors

  • Alexander T. Combs is an experienced data scientist, strategist, and developer with a background in financial data extraction, natural language processing and generation, and quantitative and statistical modeling. He is currently a full-time lead instructor for a data science immersive program in New York City.
  • Prateek Joshi is an Artificial Intelligence researcher, the published author of five books, and a TEDx speaker. He is the founder of Pluto AI, a venture-funded Silicon Valley startup building an analytics platform for smart water management powered by deep learning. His work in this field has led to patents, tech demos, and research papers at major IEEE conferences. He has been an invited speaker at technology and entrepreneurship conferences including TEDx, AT&T Foundry, Silicon Valley Deep Learning, and Open Silicon Valley. Prateek has also been featured as a guest author in prominent tech magazines. His tech blog has received more than 1.2 million page views from over 200 countries and has over 6,600+ followers. He frequently writes on topics such as Artificial Intelligence, Python programming, and abstract mathematics. He is an avid coder and has won many hackathons utilizing a wide variety of technologies. He graduated from University of Southern California with a Master's degree, specializing in Artificial Intelligence. He has worked at companies such as Nvidia and Microsoft Research.
  • Colibri is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas like big data, data science, Machine Learning, and Cloud Computing. Over the past few years, they have worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping all of them to better make sense of their data, and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.
  • Rudy Lai is the founder of Quant Copy, a sales acceleration startup using AI to write sales emails to prospects. By taking in leads from your pipelines, Quant Copy researches them online and generates sales emails from that data. It also has a suite of email automation tools to schedule, send, and track email performance, key analytics that all feed back into its AI-generated content. Prior to founding Quant Copy, Rudy ran HighDimension.IO, a machine learning consultancy, where he experienced firsthand the frustrations of outbound sales and prospecting. As a founding partner, he helped startups and enterprises with HighDimension.IO's Machine-Learning-as-a-Service, allowing them to scale up data expertise in the blink of an eye. In the first part of his career, Rudy spent 5+ years in quantitative trading at leading investment banks such as Morgan Stanley. This valuable experience allowed him to witness the power of data, but also the pitfalls of automation using data science and machine learning. Quantitative trading was also a great platform from which to learn a lot about reinforcement learning and supervised learning topics in a commercial setting. Rudy holds a Computer Science degree from Imperial College London, where he was part of the Dean's List, and received awards such as the Deutsche Bank Artificial Intelligence prize.
Who this course is for:
  • Developers and data scientists who have basic machine learning knowledge and want to explore the various arenas of machine learning by creating insightful and interesting projects.
  • Python programmers who are looking to use machine-learning algorithms to create real-world applications.