
Let’s begin the course with the content coverage.
Before you start this course, we'll install Python 3.6, pip, scikit-learn, and the other libraries used throughout this course.
Let us begin with the first lesson and understand what we are going to cover in our learning journey.
We build models so that we can learn something about the data we are training on and about the relationships between the features of the dataset. This learning can inform us when we encounter new observations. However, we must realize that the observations we interact with in the real world and the format of data needed to train machine learning models are very different. Working with text data is a prime example of this. When we read text, we can understand each word and apply the context given to each word by the surrounding words, which is not a trivial task. However, machines are unable to interpret this contextual information. Unless it is specifically encoded, they have no idea how to convert text into something that can be used as numerical input. Therefore, we must represent the data appropriately, often by converting non-numerical data types, such as text, dates, and categorical variables, into numerical ones. Let us learn more about it with the following topics:
Tables of Data
Loading Data
In this video, we will be loading the bank marketing dataset from the UCI Machine Learning Repository. The goal of this video will be to load in the CSV data and identify a target variable to predict, along with the feature variables with which to model the target variable. Finally, we will separate the feature and target columns and save them to CSV files.
To fit models to the data, it must be represented in numerical format, since the mathematics used in all machine learning algorithms only works on matrices of numbers. One goal of this video is to learn how to encode all features into numerical representations.
It is important that we clean the data appropriately so that it can be used for training models. This often includes converting non-numerical datatypes into numerical datatypes. This will be the focus of this video – to convert all columns in the feature dataset into numerical columns.
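As a rough illustration of converting a non-numerical column into numerical columns, here is a minimal sketch of one-hot encoding in plain Python. The column values are made up for illustration and are not taken from the bank marketing dataset, and the helper name one_hot_encode is our own; in the course itself, a library would normally do this work.

```python
def one_hot_encode(values):
    """Map each distinct category to its own 0/1 column."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical column (not real data).
marital = ["married", "single", "married", "divorced"]
categories, encoded = one_hot_encode(marital)

print(categories)      # the new column names
for row in encoded:    # one numerical row per original value
    print(row)
```

Each original value becomes a row with exactly one 1, so no artificial ordering is imposed on the categories.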
In our bank marketing dataset, we have some columns that do not appropriately represent the data, which will have to be addressed if we want the models we build to learn useful relationships between the features and the target. One example of this is the pdays column.
In this video, we will cover the lifecycle of creating performant machine learning models from engineering features, to fitting models to training data, and evaluating our models using various metrics. Many of the steps to create models are highly transferable between all machine learning libraries – we'll start with scikit-learn, which has the advantage of being widely used, and as such there is a lot of documentation, tutorials, and learning to be found across the internet.
We will start by utilizing scikit-learn. This will help us establish the fundamentals of building a machine learning model using the Python programming language. Like scikit-learn, Keras makes it easy to create models in the Python programming language through an easy-to-use API. However, Keras is aimed at the creation and training of neural networks, rather than machine learning models in general. Artificial neural networks (ANNs) represent a large class of machine learning algorithms, and they are so called because their architecture resembles the neurons in the human brain. The Keras library has many general-purpose functions built in, such as optimizers, activation functions, and layer properties, so that users, like in scikit-learn, do not have to code these algorithms from scratch. Let us understand the following concepts:
Application of Keras and scikit-learn
scikit-learn
Estimators in scikit-learn
Keras is designed to be a high-level neural network API that is built on top of frameworks such as TensorFlow, CNTK, or Theano. One of the great benefits of using Keras as an introduction to deep learning for beginners is that it is very user friendly – advanced functions such as optimizers and layers are already built into the library and do not have to be written from scratch. Therefore, Keras is popular not only amongst beginners, but also seasoned experts. Also, the library allows rapid prototyping of neural networks, supports a wide variety of network architectures, and can be run on both CPU and GPU. Let us understand the following concepts:
Keras for Machine Learning
Advantages of Keras
Disadvantages of Keras
More than Building Models
In this video, we will begin fitting our model to the datasets that we have created. We will review the minimum steps required to create a machine learning model that can be applied to building models with any machine learning library, including scikit-learn and Keras. Let us understand the following concepts:
Classifiers and Regression Models
Classification Tasks
Regression Tasks
Training and Test Datasets
Model Evaluation Metrics
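To make the idea of training and test datasets concrete, the following is a minimal sketch in plain Python. The function names split_train_test and accuracy, the 20% test fraction, and the seed are all assumptions chosen for illustration; scikit-learn provides its own utilities for this, which the course uses.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rows = list(range(10))            # stand-in for ten observations
train, test = split_train_test(rows)
print(len(train), len(test))

# One of four made-up predictions is wrong.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

The model is fitted on the training rows only, and metrics such as accuracy are then reported on the held-out test rows.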
In this video, we will create a simple logistic regression model from the scikit-learn package. We will then create some model evaluation metrics and test the predictions against those model evaluation metrics.
In this topic, we will delve further into evaluating model performance and examine techniques of generalizing models to new data using regularization. Providing the context of a model's performance is extremely important. Our aim is to determine whether our model is performing well compared to trivial or obvious approaches. We do this by creating a baseline model against which machine learning models we train are compared. It is important to stress that all model evaluation metrics are evaluated and reported via the test dataset, since that will give us an understanding of how the model will perform on new data. Let us understand the following concepts:
Baseline Models
Determining a Baseline Model
We learned earlier about overfitting and what it looks like. The hallmark of overfitting is a model that performs extremely well on the training data yet performs poorly on test data. One reason for this could be that the model relies too heavily on certain features that lead to good performance on the training dataset but do not generalize well to new observations or to the test dataset. One technique for avoiding this is called regularization. Regularization constrains the values of the coefficients toward zero, which discourages a complex model. There are many different types of regularization techniques.
Summarize your learning from this lesson.
Let us begin with the second lesson and understand what we are going to cover in our learning journey.
In this video, we will continue learning how to build machine learning models and extend our knowledge to build an Artificial Neural Network (ANN) with the Keras package. Let us learn more about it with the following topics:
Advantages of ANN over Traditional Machine Learning Algorithms
Advantages of Traditional Machine Learning Algorithms over ANN
Hierarchical Data Representation
Companies using ANNs
In this video, we will introduce linear transformations. Linear transformations are the backbone of modeling with ANNs. In fact, all the processes of ANN modeling can be thought of as a series of linear transformations. The working components of linear transformations are scalars, vectors, matrices, and tensors. Operations such as additions, transpositions, and multiplications are performed on these components. Let us learn more about it with the following topics:
Scalars, Vectors, Matrices, and Tensors
Tensor Addition
Perform Various Operations with Vectors, Matrices, and Tensors
Reshaping
The transpose of a matrix is an operator that flips the matrix over its diagonal. When this occurs, the rows become the columns and vice versa. Let us learn more about it with the following topics:
Matrix Reshaping and Transposition
Matrix Multiplication
Tensor Multiplication
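The two operations named above can be sketched in a few lines of plain Python, with matrices stored as lists of rows. This is only an illustration with hypothetical helper names (transpose, matmul); numerical libraries provide optimized versions of both.

```python
def transpose(matrix):
    """Flip a matrix over its diagonal: rows become columns."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, column)) for column in b_columns]
            for row in a]


A = [[1, 2, 3],
     [4, 5, 6]]               # shape (2, 3)
A_T = transpose(A)            # shape (3, 2)
product = matmul(A, A_T)      # shape (2, 2)

print(A_T)
print(product)
```

Note that the multiplication is only defined when the number of columns of the first matrix equals the number of rows of the second, which is why transposing is often needed before multiplying.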
Building ANNs involves creating layers of nodes. Each node can be thought of as a tensor of weights that are learned in the training process. Once the ANN is fitted to the data, a prediction is made by multiplying the input data by the weight matrices layer by layer, applying any other transformation when needed, such as an activation function (which is typically non-linear), until the final output layer is reached. The size of each weight tensor is determined by the shape of the input nodes and the shape of the output nodes. Let us learn more about it with the following topics:
Single-Layer ANN
Keras Sequential Model
Keras Layer Types
Activation Functions
Model Compilation
Model Fitting
Model Evaluation
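To show what a single layer computes when making a prediction, here is a minimal sketch of a forward pass in plain Python: a weighted sum of the inputs plus a bias, followed by a sigmoid activation. The weights, bias, and input values are arbitrary numbers chosen for illustration, and dense_forward is a hypothetical helper, not a Keras function.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(inputs, weights, biases):
    """One fully connected layer.

    inputs:  list of n values
    weights: n x m matrix (list of n rows, each with m values)
    biases:  list of m values
    """
    outputs = []
    for j in range(len(biases)):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(sigmoid(z + biases[j]))
    return outputs


x = [0.5, -1.0, 2.0]          # three input features (made up)
W = [[0.2], [0.4], [0.1]]     # 3 inputs -> 1 output node
b = [0.1]

prediction = dense_forward(x, W, b)
print(prediction)             # a single value close to 0.5
```

The shape of W (3 x 1) follows directly from the number of input nodes and the number of output nodes, as described above.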
Summarize your learning from this lesson.
Let us begin with the third lesson and understand what we are going to cover in our learning journey.
In this video, you will first learn about the representations and concepts of deep learning, such as forward propagation, backpropagation, and gradient descent. We will not delve deeply into these concepts, as that isn't required for this course. However, this coverage will help anyone who wants to apply deep learning to a problem. Let us learn more about it with the following topics:
Deep Learning
Logistic Regression to Deep Neural Networks
Shallow vs Deep Neural Networks
Activation Functions
Forward Propagation for Making Predictions
Loss Function
Backpropagation for Computing Derivatives
In this video, we will learn how a deep learning model learns its optimal parameters. In other words, we are going to learn about how model parameters keep updating until the values for which the error rate or loss is minimized are found. This process is called learning parameters, and it is done using an optimization algorithm. One very common optimization algorithm used for learning parameters in machine learning is gradient descent. Let's see how gradient descent works.
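As a small illustration of the idea, the sketch below runs gradient descent on a one-parameter loss, loss(w) = (w - 3)^2, whose minimum is at w = 3. The loss function, learning rate, and number of steps are assumptions made for this example; a real network has many parameters and obtains its gradients through backpropagation.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    value = start
    for _ in range(steps):
        value -= learning_rate * gradient(value)
    return value


# Gradient of loss(w) = (w - 3)^2 is 2 * (w - 3).
minimum = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(minimum)  # very close to 3
```

Each update moves the parameter a small step in the direction that reduces the loss; the learning rate controls how large that step is.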
In this video, we will move on to multi-layer or deep neural networks while learning about techniques for assessing the performance of a model. As you may have already realized, there are many hyperparameter choices to be made when building a deep neural network. Some very important challenges of applied deep learning are how to find the right values for the number of hidden layers, the number of units in each hidden layer, the type of activation function to use for each layer, the type of optimizer and loss function for training the network, among others. Model evaluation is required for making these decisions. By performing model evaluation, you can say whether a specific deep architecture or a specific set of hyperparameters is working poorly or well on a dataset, and therefore decide whether to change them or not. Let us learn more about it with the following topics:
Evaluating a Trained Model with Keras
Splitting Data into Training and Test Sets
Underfitting and Overfitting
Early Stopping
Summarize your learning from this lesson.
Let us begin with the fourth lesson and understand what we are going to cover in our learning journey.
Cross-validation is one of the most important and most commonly used resampling methods. It provides a robust estimate of model performance on new, unseen examples when only a limited dataset is available. We will explore the basics of cross-validation, two of its variations, and a comparison between them. Let us learn more about it with the following topics:
Drawbacks of Splitting Dataset Only Once
K-Fold Cross Validation
Leave-One-Out Cross Validation
Comparing K-Fold and LOO Methods
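The way the two methods divide a dataset can be sketched in plain Python as follows. This is a simplified illustration with a hypothetical helper, k_fold_indices, that uses contiguous folds without shuffling; scikit-learn's own iterators offer more options.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(6, 3))       # 3-fold cross-validation
loo_folds = list(k_fold_indices(6, 6))   # leave-one-out is k = n_samples

for train, test in folds:
    print("train:", train, "test:", test)
```

Every example appears in a test fold exactly once. Leave-one-out is the special case where each test fold holds a single example, which means the model is trained as many times as there are examples.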
In this video, you will learn about using the Keras wrapper with scikit-learn, a very helpful tool that allows us to use Keras models as part of a scikit-learn workflow. As a result, scikit-learn methods and functions, such as the one for performing cross-validation, can be easily applied to Keras models. You will learn, step-by-step, how to implement what you learned about cross-validation in the previous section using scikit-learn. Furthermore, you will learn to use cross-validation in order to evaluate Keras deep learning models using the Keras wrapper with scikit-learn. Lastly, you will practice what you learned on a problem involving a real dataset. Let us learn more about it with the following topics:
Keras Wrapper with scikit-learn
Building the Keras Wrapper with scikit-learn for a Regression Problem
Cross-Validation with scikit-learn
Scikit-Learn Cross Validation Iterators
In this video, we will bring together all the concepts and methods that we learned in this topic about cross-validation. We will go through all the steps one more time, from defining a Keras deep learning model to transferring it to scikit-learn workflow and performing cross-validation in order to evaluate its performance.
Cross-validation provides us with a robust estimation of model performance on unseen examples. For this reason, it can be used to decide between two models for a problem or to decide which model parameters to use for a problem. In these cases, we would like to find out which model or which set of model parameters/hyperparameters results in the lowest test error rate. We will then select that model or that set of parameters/hyperparameters for our problem. In this video, you are going to practice using cross-validation for this purpose. You will learn how to define a set of hyperparameters for your deep learning model and then write user-defined functions in order to perform cross-validation on your model for each of the possible combinations of hyperparameters. You will then observe which combination of hyperparameters leads to the lowest test error rate, and that combination will be your choice for your final model.
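The loop over hyperparameter combinations can be sketched as follows. Note that cross_validated_error is a hypothetical stand-in that returns a made-up error value; in practice it would build the model with the given hyperparameters and return its cross-validated test error. The hyperparameter names and values in the grid are also assumptions for illustration.

```python
from itertools import product


def cross_validated_error(params):
    """Hypothetical stand-in for training and cross-validating a model.

    Returns a made-up error that happens to be lowest at 8 units and a
    learning rate of 0.01, so that the selection logic can be shown.
    """
    return abs(params["units"] - 8) / 8 + abs(params["learning_rate"] - 0.01)


grid = {
    "units": [4, 8, 16],
    "learning_rate": [0.1, 0.01, 0.001],
}

names = list(grid)
results = []
for combination in product(*(grid[name] for name in names)):
    params = dict(zip(names, combination))
    results.append((cross_validated_error(params), params))

# Keep the combination with the lowest error.
best_error, best_params = min(results, key=lambda item: item[0])
print(best_params, best_error)
```

Because every combination is evaluated, the number of models to train grows quickly with the number of hyperparameters and candidate values.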
In this video, you will learn how to use cross-validation for the purpose of model selection.
Summarize your learning from this lesson.
Let us begin with the fifth lesson and understand what we are going to cover in our learning journey.
Since deep neural networks are highly flexible models, overfitting is an issue that can often arise when training them. Therefore, one very important part of becoming a deep learning expert is knowing how to detect overfitting, and subsequently how to address the overfitting problem in your model. Regularization techniques are an important group of methods specifically aimed at reducing overfitting in machine learning models. Understanding regularization techniques thoroughly and being able to apply them to your deep neural networks is an essential step toward building deep neural networks in order to solve real-life problems. In this video, you will learn about the underlying concepts of regularization, providing you with the foundation required for the following sections, where you will learn how to implement various types of regularization methods using Keras. Let us learn more about it with the following topics:
The Need for Regularization
Reducing Overfitting with Regularization
The most common type of regularization for deep learning models is the one that keeps the weights of the network small. This type of regularization is called weight regularization and has two different variations: L2 regularization and L1 regularization. In this video, you will learn about these regularization methods in detail, along with how to implement them in Keras. Additionally, you will practice applying them to real-life problems and observe how they can improve the performance of a model. Let us learn more about it with the following topics:
L1 and L2 Regularization Formulation
L1 and L2 Regularization Implementation in Keras
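As a compact reference, the two penalties are commonly written as follows. The notation is our own assumption: Loss is the unregularized loss, the w_i are the network weights, and lambda is the regularization strength.

```latex
\begin{aligned}
\text{L2:}\quad \mathrm{Loss}_{\text{reg}} &= \mathrm{Loss} + \lambda \sum_{i} w_i^{2} \\
\text{L1:}\quad \mathrm{Loss}_{\text{reg}} &= \mathrm{Loss} + \lambda \sum_{i} \lvert w_i \rvert
\end{aligned}
```

In both cases, larger weights add more to the loss, so the optimizer is pushed toward smaller weights. In Keras, penalties of this kind are attached to a layer through arguments such as kernel_regularizer, using keras.regularizers.l1 or keras.regularizers.l2.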
In this video, you will learn about how dropout regularization works, how it helps with reducing overfitting, and how to implement it using Keras. Lastly, you will have the chance to practice what you have learned about dropout by completing an activity involving a real-life dataset. Let us learn more about it with the following topics:
Principles of Dropout Regularization
Reducing Overfitting with Dropout
Dropout Implementation in Keras
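The core mechanism can be sketched in plain Python as below. This shows one common formulation, often called inverted dropout, in which the surviving activations are scaled up so that their expected sum is unchanged; the function name, the activation values, and the seed are assumptions for illustration, and this is not the Keras implementation.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale
            for a in activations]


rng = random.Random(0)                 # seeded for reproducibility
hidden = [1.0, 2.0, 3.0, 4.0]          # made-up hidden-layer outputs
dropped = dropout(hidden, rate=0.5, rng=rng)
print(dropped)
```

Dropout is applied only while training. When making predictions, all nodes are kept, which is why the scaling is done during training.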
In this video, you will learn briefly about some other regularization techniques that are commonly used and have been shown to be effective in deep learning. It is important to keep in mind that regularization is a wide-ranging and active research field in machine learning. As a result, covering all available regularization methods in one lesson is not possible. Therefore, in this video, we will briefly cover three more regularization methods, called early stopping, data augmentation, and adding noise. You will learn briefly about their underlying ideas, and you'll gain a few tips and recommendations on how to use them. Let us learn more about it with the following topics:
Early Stopping Regularization
Implementing Early Stopping in Keras
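The stopping rule itself is simple and can be sketched in plain Python. The validation losses below are made up, and the helper early_stopping_epoch is hypothetical; in Keras the same idea is available through the EarlyStopping callback, which takes arguments such as monitor and patience.

```python
def early_stopping_epoch(validation_losses, patience):
    """Return the (0-based) epoch at which training would stop."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(validation_losses) - 1   # never triggered: ran to the end


# Validation loss improves, then starts rising as the model overfits.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
stop = early_stopping_epoch(losses, patience=2)
print(stop)
```

A larger patience tolerates more noisy epochs before stopping, at the cost of training for longer.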
Data augmentation is a regularization technique that tries to address overfitting by training the model on more training examples in an inexpensive way. In data augmentation, the available data is transformed in different ways and then fed to the model as new training data. This type of regularization has been shown to be effective, especially for some specific applications, such as object detection/recognition in computer vision and speech processing. For example, in computer vision applications, you can simply double or triple the size of your training dataset by adding mirrored versions and rotated versions of each image to the dataset. The new training examples generated by these transformations are not as good as the original training examples. However, they have been shown to reduce overfitting.
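The mirroring and rotation mentioned above can be illustrated on a tiny image stored as a list of pixel rows. This is only a sketch with hypothetical helper names and made-up pixel values; image libraries and Keras utilities provide these transformations for real images.

```python
def mirror(image):
    """Flip an image left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate an image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


image = [[1, 2, 3],
         [4, 5, 6]]           # a 2 x 3 grayscale image (made up)

# The original plus two transformed copies: three examples from one.
augmented = [image, mirror(image), rotate_90_clockwise(image)]
for example in augmented:
    print(example)
```

Each transformed copy keeps the same label as the original image, which is what makes this an inexpensive source of extra training data.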
Hyperparameter tuning is a very important technique for improving the performance of deep learning models. In Lesson 4, Evaluating your Model with Cross Validation with Keras Wrappers, you learned about using a Keras wrapper with scikit-learn, which allows for Keras models to be used in a scikit-learn workflow. Let us learn more about it with the following topics:
Grid Search with scikit-learn
Randomized Search with Scikit-Learn
Summarize your learning from this lesson.
Let us begin with the sixth lesson and understand what we are going to cover in our learning journey.
To understand accuracy properly, first let's explore model evaluation. Model evaluation is an integral part of the model development process. Once you build your model and execute it, the next step is to evaluate your model. A model is built on a training dataset, and evaluating a model's performance on the same training dataset is a bad practice in data science. Once a model is trained on a training dataset, it should be evaluated on a dataset that is completely different from the training dataset. This dataset is known as the test dataset. The objective should always be to build a model that generalizes, which means the model should produce similar (but not identical) results on any dataset. This can only be achieved if we evaluate the model on data that is unknown to it. Let us learn more about it with the following topics:
Accuracy
Null Accuracy
Calculating Null Accuracy on a Dummy Healthcare Dataset
Advantages and Limitations of Accuracy
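Null accuracy is the accuracy obtained by always predicting the most frequent class, and it can be computed in a couple of lines. The labels below are made up for illustration and are not the course's healthcare dataset.

```python
from collections import Counter


def null_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical labels: 8 negative cases and 2 positive cases.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(null_accuracy(labels))
```

A model is only useful if its accuracy is clearly higher than this value, which is one reason accuracy alone can be misleading.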
Imbalanced datasets are a distinct case for classification problems where the classes are not represented equally. In such datasets, one class is overwhelmingly dominant. In other words, the null accuracy of an imbalanced dataset is very high. Let us learn more about it with the following topics:
L1 and L2 Regularization Formulation
L1 and L2 Regularization Implementation in Keras
A confusion matrix describes the performance of a classification model by counting its correct and incorrect predictions for each class. In other words, a confusion matrix is a way to summarize classifier performance. Let us learn more about it with the following topics:
Type 1 and Type 2 Error
Metrics Computed from a Confusion Matrix
Sensitivity
Specificity
Precision
False Positive Rate (FPR)
Receiver Operating Characteristic (ROC)
Area Under Curve
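Several of the metrics listed above follow directly from the four counts in a binary confusion matrix, as the sketch below shows. The labels and predictions are made up for illustration, and 1 is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, with 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # type 1 error
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # type 2 error
    return tp, tn, fp, fn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)          # true positive rate (recall)
specificity = tn / (tn + fp)          # true negative rate
precision = tp / (tp + fp)
false_positive_rate = fp / (fp + tn)  # equals 1 - specificity

print(tp, tn, fp, fn)
print(sensitivity, specificity, precision, false_positive_rate)
```

The ROC curve plots sensitivity against the false positive rate as the classification threshold is varied, so these two quantities are its building blocks.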
In this video, you will learn how to compute accuracy and null accuracy with healthcare data.
In this video, we will learn how to plot the ROC curve and calculate the AUC.
Summarize your learning from this lesson.
Let us begin with the seventh lesson and understand what we are going to cover in our learning journey.
To understand computer vision, let's first understand what human vision is. Human vision is the ability of the human eye and brain to see and recognize objects. Computer vision is the process of giving a machine a similar, if not better, understanding of seeing and identifying objects in the real world. It is simple for a human eye to precisely identify whether an animal is a tiger or a lion. But it takes a lot of training for a computer system to understand such objects distinctly. Computer vision can also be defined as building mathematical models that can mimic the function of a human eye and brain. Basically, it is about training computers to understand and process images and videos. Let us learn more about it with the following topics:
Convolution Neural Network
Example of Convolution Neural Network
The main components of a CNN architecture are input image, convolutional layer, pooling layer, and flattening. Let us learn more about it with the following topics:
Input Image
Convolution Layer
Feature Detector
Feature Map
Multiple Feature Maps
Pooling Layer
Max Pooling
Flattening
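To see how these components fit together, here is a minimal sketch in plain Python that passes a tiny image through a feature detector, max pooling, and flattening. The image, the 2 x 2 feature detector, and the helper names are assumptions for illustration; the sliding-window operation shown is the stride-1, no-padding form, and the feature detector is assumed to be square.

```python
def convolve2d(image, kernel):
    """Slide a square feature detector over the image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


def flatten(feature_map):
    """Turn a 2D feature map into a single list of values."""
    return [value for row in feature_map for value in row]


image = [[1, 0, 0, 0, 1],
         [0, 1, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 1, 0, 1, 0],
         [1, 0, 0, 0, 1]]     # 5 x 5 input image (made up)
detector = [[1, 0],
            [0, 1]]           # responds to a top-left to bottom-right diagonal

feature_map = convolve2d(image, detector)   # 4 x 4
pooled = max_pool2d(feature_map)            # 2 x 2
flat = flatten(pooled)                      # 4 values

print(feature_map)
print(pooled)
print(flat)
```

In a real CNN, many feature detectors are applied to produce multiple feature maps, and their values are learned during training rather than set by hand.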
The word augmentation means the action or process of making or becoming greater in size or amount. Image or data augmentation works in a similar manner. Image/data augmentation creates many batches of our images. Then, it applies random transformations on random images inside the batches. Data transformation can be rotating images, shifting them, flipping them, and so on. By applying this transformation, we get more diverse images inside the batches, and we also have much more data than we had originally. Let us learn more about it with the following topics:
Advantages of Image Augmentation
Steps to Modify a Simple CNN
Build a CNN and Identify Images of Cats and Dogs
In this video, we will learn how to amend our model by reverting to the sigmoid activation function.
In this video, we will amend the model again by changing the optimizer to SGD.
In this exercise, we will try to classify a new image. The image is not exposed to the algorithm, so this will be the test of our algorithm. You can run any of the algorithms in this lesson, and then use the model to classify the image.
Summarize your learning from this lesson.
Let us begin with the eighth lesson and understand what we are going to cover in our learning journey.
In this video, we will learn about pre-trained sets and transfer learning. Let us learn more about it with the following topics:
Pre-Trained Sets
Transfer Learning
Feature Extraction
Freezing Convolutional layers
Fine-tuning means tweaking our neural network in such a way that it becomes more relevant to the task at hand. We can freeze some of the initial layers of the network so that we don't lose the information stored in those layers. The information stored there is generic and useful. However, if we freeze those layers while our classifier is learning and then unfreeze them, we can tweak them a little so that they fit the problem at hand even better. Suppose we have a pre-trained network that identifies animals. If we want to identify specific animals, such as dogs and cats, then we can tweak the layers a little bit so that they can learn how dogs and cats look. This is like using the whole pre-trained network and then adding a new layer that is trained on images of dogs and cats. Let us learn more about it with the following topics:
The ImageNet Dataset
Some Pre-Trained Networks in Keras
Identify an Image Using the VGG16 Network
In this video, we will learn how to classify images that are not present in the ImageNet Database.
In this video, we will freeze the network and remove the last layer of VGG16, which has 1,000 labels in it. After removing the last layer, we will build a new dog-cat classifier ANN, as done in Lesson 7, Computer Vision with Convolutional Neural Networks, and will connect this ANN to VGG16 instead of the original one with 1,000 labels. Essentially, what we will do is replace the last layer of VGG16 with a user-defined layer.
Finally, before closing this lesson, let's learn how to classify images with the ResNet50 network.
Summarize your learning from this lesson.
You are given an image of a cat. Use the VGG16 network to predict the image.
Before you start, ensure that you have downloaded the image (test_image_1) to your working directory. To complete the activity, follow these steps:
1. Import the required libraries along with the VGG16 network.
2. Initiate the pre-trained VGG16 model.
3. Load the image that is to be classified.
4. Preprocess the image by applying the transformations.
5. Create a predictor variable to predict the image.
6. Label the image and classify it.
Let us begin with the ninth lesson and understand what we are going to cover in our learning journey.
In this video, we will learn about sequential memory and modeling. Let us learn more about it with the following topics:
Recurrent Neural Network
The Vanishing Gradient Problem
A Brief on Exploding Gradient
Long short-term memory networks (LSTMs) are RNNs whose main objective is to overcome the shortcomings caused by the vanishing gradient and exploding gradient problems. The architecture is built such that they remember data and information for a long period of time.
We will examine the stock price of Apple over a period of 5 years; that is, from January 1, 2014 to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using RNNs.
We will examine the stock price of Apple over the last 5 years, from January 1, 2014 to December 31, 2018. In doing so, we will try to predict and forecast the company's future trend for January 2019 using RNNs. We have the actual values for January 2019, so we will compare our predictions with the actual values.
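Before a price series can be fed to an RNN, it is commonly cut into fixed-length windows, where each window of past values is paired with the value that follows it. The sketch below shows this preparation step in plain Python. The prices and the window size are made up, make_windows is a hypothetical helper, and this windowing scheme is an assumption about a typical workflow rather than the exact method used in the video.

```python
def make_windows(series, window_size):
    """Pair each run of `window_size` values with the value that follows it."""
    inputs, targets = [], []
    for i in range(len(series) - window_size):
        inputs.append(series[i:i + window_size])
        targets.append(series[i + window_size])
    return inputs, targets


prices = [10.0, 11.0, 12.0, 13.0, 14.0]   # made-up closing prices
X, y = make_windows(prices, window_size=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

The model then learns to predict each target from its window, and forecasts for a later period can be compared with the actual values for that period.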
Summarize your learning from this lesson.
Though designing neural networks is a sought-after skill, it is not easy to master. With Keras, you can apply complex machine learning algorithms with minimum code.
Applied Deep Learning with Keras starts by taking you through the basics of machine learning and Python all the way to gaining an in-depth understanding of applying Keras to develop efficient deep learning solutions. To help you grasp the difference between machine learning and deep learning, the course guides you on how to build a logistic regression model, first with scikit-learn and then with Keras. You will delve into Keras and its many models by creating prediction models for various real-world scenarios, such as disease prediction and customer churn. You’ll gain knowledge on how to evaluate, optimize, and improve your models to achieve maximum information. Next, you’ll learn to evaluate your model by cross-validating it using the Keras wrapper and scikit-learn. Following this, you’ll proceed to understand how to apply L1, L2, and dropout regularization techniques to improve the accuracy of your model. To help maintain accuracy, you’ll get to grips with techniques including null accuracy, precision, and the AUC-ROC score for fine-tuning your model.
By the end of this course, you will have the skills you need to use Keras when building high-level deep neural networks.
About the Authors
Ritesh Bhagwat has a master's degree in applied mathematics with a specialization in computer science. He has over 14 years of experience in data-driven technologies and has led and been a part of complex projects ranging from data warehousing and business intelligence to machine learning and artificial intelligence. He has worked with top-tier global consulting firms as well as large multinational financial institutions. Currently, he works as a data scientist. Besides work, he enjoys playing and watching cricket and loves to travel. He is also deeply interested in Bayesian statistics.
Mahla Abdolahnejad is a Ph.D. candidate in systems and computer engineering with Carleton University, Canada. She also holds a bachelor's degree and a master's degree in biomedical engineering, which first exposed her to the field of artificial intelligence and artificial neural networks, in particular. Her Ph.D. research is focused on deep unsupervised learning for computer vision applications. She is particularly interested in exploring the differences between a human's way of learning from the visual world and a machine's way of learning from the visual world, and how to push machine learning algorithms toward learning and thinking like humans.
Matthew Moocarme is a director and senior data scientist in Viacom’s Advertising Science team. As a data scientist at Viacom, he designs data-driven solutions to help Viacom gain insights, streamline workflows, and solve complex problems using data science and machine learning.
Matthew lives in New York City and outside of work enjoys combining deep learning with music theory. He is a classically-trained physicist, holding a Ph.D. in Physics from The Graduate Center of CUNY and is an active Artificial Intelligence developer, researcher, practitioner, and educator.