Do you want to build complex deep learning models in Keras? Do you want to use neural networks for classifying images, predicting prices, and classifying samples into several categories?
Keras is the most powerful library for building neural network models in Python. In this course we review the central techniques in Keras, with many real-life examples. We focus on the practical computational implementations, and we avoid using any math.
The student is required to be familiar with Python and machine learning. Some general knowledge of statistics and probability is recommended, but not strictly necessary.
Among the many examples presented here, we use neural networks to tag images of the River Thames or of the street; to classify edible and poisonous mushrooms; to predict the sales of several video games for multiple regions; to identify bolts and nuts in images; etc.
We run most of our examples on Windows, but we show how to set up an AWS machine and run our examples there. In terms of the course curriculum, we cover most of what Keras can actually do: the Sequential model, the Model API, convolutional neural nets, LSTM nets, etc. We also show how to bypass Keras entirely and build the models directly in Theano/Tensorflow syntax (although this is quite complex!).
After taking this course, you should feel comfortable building neural nets for time sequences, image classification, and pure classification and/or regression. All the lectures here can be downloaded and come with the corresponding material.
We explain how to install Keras and Theano and we explain the basics behind Keras. If you want to use Tensorflow instead of Theano, a very similar approach is used.
We show some basic symbolic code in Theano, which is useful for explaining what Keras will do when we build a model. In fact, Keras will use Theano/Tensorflow to do all the tensor operations necessary for the neural network that we build in Keras.
Running complex neural networks on our machines is sometimes not feasible due to either memory or speed requirements. AWS (Amazon Web Services) provides a cheap and scalable solution, especially because there are existing images that we can use (which contain all the necessary software: Python, Keras, CUDA), simplifying the installation process. We show how to create an instance on AWS, how to run code there, and how to upload and download files.
Keras provides two ways of constructing models: the Sequential approach and the Model API. We introduce the Sequential approach. It allows us to construct models by easily stacking layers together.
Every layer of a neural network works by multiplying the inputs by the layer's weights, and after each weighted sum is computed, an activation function is applied. In general, these activations are nonlinear, and we can choose among several in Keras: sigmoid, elu, relu, tanh, etc.
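As a rough illustration (not the course's actual code), the computation done by a single fully connected layer can be sketched in NumPy; the weight values here are arbitrary:

```python
import numpy as np

def sigmoid(z):
    # Squash each value into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# A layer with 3 inputs and 2 neurons: a weight matrix W (3x2) and biases b (2,)
W = np.array([[0.1, -0.2],
              [0.4,  0.3],
              [-0.5, 0.2]])
b = np.array([0.0, 0.1])

x = np.array([1.0, 2.0, 3.0])      # one input sample
activation = sigmoid(x @ W + b)    # weighted sum, then the nonlinearity
print(activation.shape)            # (2,) -- one value per neuron
```

Swapping `sigmoid` for `np.tanh` (or a relu such as `np.maximum(0, z)`) changes only the nonlinearity, which is exactly what choosing a different activation in Keras does.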
We explain the different layers that we have in Keras. The most fundamental one is the Dense() layer, which is a fully connected layer. But there are certainly other very important ones, and we review the most important of them.
We explain how to train a model in Keras.
Loss functions are used in Keras to compute the final loss for our models (how well is our model performing?). Keras minimizes these loss functions by using specialized optimization algorithms. Of course, the loss function depends on which specific problem we are trying to solve: we need certain loss functions for classification problems, and different ones for regression problems.
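As a sketch of what two common losses actually compute (illustrative only; in practice Keras computes these for us), with made-up predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: a typical choice for regression problems
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred):
    # A typical choice for two-class classification problems
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])   # true labels
y_pred = np.array([0.9, 0.1, 0.8])   # model's predicted probabilities

print(mse(y_true, y_pred))                  # 0.02
print(binary_crossentropy(y_true, y_pred))
```

Both losses are small when predictions are close to the truth and grow as they diverge; the optimizer simply pushes the weights in the direction that shrinks this number.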
Overfitting occurs when our model fits the training data too closely. The problem is that when this happens, the model will perform very badly in an out-of-sample scenario. Remember that we use our data to fit a model, and we then use that model to make predictions for real (out-of-sample) observations.
We use a real dataset containing information about several wines; in particular, we have different chemical measurements about them. We want to classify these wines into one of three categories, and we finally achieve an excellent accuracy using a neural network with several layers. This is a good example to introduce the categorical cross-entropy loss, which is designed to tackle multi-class classification problems.
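A minimal sketch of the categorical cross-entropy idea (the numbers are invented, and in the course Keras handles all of this internally): the network's raw outputs are turned into class probabilities with a softmax, and the loss penalizes a low probability on the true class:

```python
import numpy as np

def softmax(z):
    # Turn raw scores into probabilities that sum to 1
    e = np.exp(z - z.max())
    return e / e.sum()

def categorical_crossentropy(y_true, y_pred):
    # y_true is one-hot, so only the probability of the true class matters
    return -np.sum(y_true * np.log(y_pred))

logits = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
probs = softmax(logits)
y_true = np.array([1.0, 0.0, 0.0])   # this sample belongs to class 0
loss = categorical_crossentropy(y_true, probs)
```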
We use a real dataset containing information about different mushrooms. We want to predict whether they are edible or not. We use a neural network with a binary cross entropy loss, because we have just two categories. We achieve an excellent in-sample accuracy.
In this case, we want to predict house prices for a particular county in the US. This is our first example of neural networks used for a regression problem (when the variable that we want to predict is numeric). In this case, we naturally need to use a different loss function: we can choose among MSE, MAE, and several others.
We explain how SGD works. We discuss how the learning rate affects the results, and how the minimization algorithm that Keras uses works.
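The effect of the learning rate can be sketched with a toy one-dimensional example (purely illustrative, not the course's code): gradient descent on f(w) = (w - 3)², whose gradient is 2(w - 3), minimized at w = 3:

```python
# Gradient descent on f(w) = (w - 3)^2; the gradient is 2 * (w - 3)
def sgd(lr, steps=100, w=0.0):
    for _ in range(steps):
        w -= lr * 2 * (w - 3)   # step against the gradient
    return w

good = sgd(lr=0.1)    # converges very close to the minimum w = 3
tiny = sgd(lr=0.001)  # same number of steps, but barely moves toward 3
```

With a learning rate that is too small, training "works" but takes far too many steps; with one that is too large, the iterates can overshoot and diverge.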
We explain how backpropagation works. It is the fundamental technique used for training neural networks. And we review how the inner math works (how the chain rule is used). This is the most technical lecture of this course.
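The chain-rule idea can be checked on a tiny two-weight network (a sketch with made-up values, not the course's code): the analytic gradient, obtained by multiplying the local derivatives, matches a numerical finite-difference estimate:

```python
import math

# Tiny network: y = sigmoid(w2 * sigmoid(w1 * x)); loss = (y - t)^2
def forward(w1, w2, x):
    s = lambda z: 1 / (1 + math.exp(-z))
    h = s(w1 * x)          # hidden activation
    return h, s(w2 * h)    # output activation

def loss(w1, w2, x, t):
    _, y = forward(w1, w2, x)
    return (y - t) ** 2

def grad_w1(w1, w2, x, t):
    # Gradient of the loss w.r.t. w1 via the chain rule
    h, y = forward(w1, w2, x)
    dL_dy = 2 * (y - t)
    dy_dh = y * (1 - y) * w2        # derivative through the output sigmoid
    dh_dw1 = h * (1 - h) * x        # derivative through the hidden sigmoid
    return dL_dy * dy_dh * dh_dw1   # chain rule: multiply the pieces

w1, w2, x, t = 0.5, -0.3, 1.2, 1.0
analytic = grad_w1(w1, w2, x, t)
eps = 1e-6
numeric = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
```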
There are several optimizers that can be used in Keras. All of them are variations of stochastic gradient descent. We explain the two general parameters that can be used with all of them.
We explain the basics behind the different optimizers in Keras. And we show how to tweak the different parameters that each optimizer has.
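One widely used variation of plain SGD is momentum, which most of these optimizers build on in some form. A toy sketch (illustrative only, reusing the f(w) = (w - 3)² example; the hyperparameter values are arbitrary):

```python
# SGD with momentum on f(w) = (w - 3)^2; velocity accumulates past gradients
def sgd_momentum(lr=0.05, momentum=0.9, steps=200):
    w, v = 0.0, 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)
        v = momentum * v - lr * grad   # mix the old direction with the new gradient
        w += v
    return w

w_final = sgd_momentum()
```

The velocity term smooths the updates and can speed up progress along consistent gradient directions, which is the intuition behind the momentum parameter exposed by several Keras optimizers.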
We show how to pull the different weights from each layer. This is particularly useful when we want to understand what each layer has inside, which is very relevant when our model is not training properly.
Sometimes, due to the size of the data, it might not be possible to load everything into memory at once for a single Keras model. But what we can do is feed our Keras model with several batches of data. We use this for predicting car prices on eBay Germany.
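The batching idea can be sketched with a small generator (illustrative only; the array contents are made up): the data is served in chunks, so the full dataset never has to be in memory at once.

```python
import numpy as np

def batches(X, y, batch_size):
    # Yield the data in chunks of at most batch_size rows
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

X = np.arange(10).reshape(10, 1).astype(float)
y = 2 * X.ravel()

sizes = [len(xb) for xb, yb in batches(X, y, batch_size=4)]
print(sizes)  # [4, 4, 2]
```

Each yielded chunk would be passed to the model for one incremental training update, instead of fitting on the whole dataset at once.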
Keras' wonderful Model API allows us to define very complex architectures. In this case, we use it to merge several layers.
We use a multi-output model to predict the sales for video games for North America, Europe and Japan. We do this using the model API.
We show how to wrap Keras inside Scikit-learn to compare different models using cross validation. This is particularly relevant for neural networks, because they tend to overfit. So comparing different models is not feasible using the very same dataset that we used for training. Cross validation provides an elegant solution to this.
We show how to wrap Keras inside Scikit-learn to identify the best parameters via cross-validation. This is a robust way of identifying the appropriate epoch value, batch size, etc. Cross validation (and in particular k-fold cross validation) uses every observation for both training and testing, so it is a good idea to use, especially when your sample is rather small. In particular, we use GridSearchCV, which constructs a grid containing different parameter values: we even use it to identify the optimal number of neurons in a hidden layer!
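The k-fold splitting underneath all of this can be sketched in a few lines (illustrative only; in the course, Scikit-learn produces these splits for us):

```python
import numpy as np

def kfold_indices(n, k):
    # Split range(n) into k folds; each fold serves as the test set exactly once
    idx = np.arange(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

splits = list(kfold_indices(10, 5))
```

Every observation ends up in a test set exactly once, which is why k-fold cross validation makes good use of small samples.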
Images are used frequently in machine learning, both for deep neural networks and for traditional algorithms (SVM, random forests, etc). We review the basics behind image loading, and we present a class that can be used to read an entire directory and build the proper matrices needed for doing machine learning. This class converts images into black-and-white (0, 1) matrices, so it should only be used when reading images that are already in black-and-white format.
We present a similar class, but now it is designed to accommodate 3-channel image data (RGB images), which we typically need to treat as a 4-dim tensor (samples, channels, height, width). This class will be useful for doing convolutional neural nets in the next section.
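The core transformation can be sketched with a thresholding step (illustrative only; the pixel values below are made up, and the course's class does the directory reading for us):

```python
import numpy as np

# A tiny 4x4 "grayscale" image with pixel values in [0, 255]
img = np.array([[  0, 255,  10, 240],
                [250,   5, 245,   0],
                [  0, 250,   0, 255],
                [255,   0, 255,  10]])

bw = (img > 127).astype(int)   # threshold into a binary (0, 1) matrix
```

Stacking one such matrix per image file yields the kind of input matrix that both neural networks and traditional algorithms expect.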
We introduce the multilayer perceptron neural network. It is a feedforward network using non-linear activation functions. In its simplest case, with no hidden layer and a sigmoid activation, it reduces to a logistic regression model.
We actually code a multilayer perceptron in pure Theano (doing the appropriate Tensor operations). In fact this is what Keras is doing for us, when we code an MLP network. In general, for other network configurations, Keras does a very similar thing: it builds the appropriate code in Theano/Tensorflow. This lecture is rather technical, so it's only necessary if you want to understand the inner workings of Keras.
We continue with our previous lecture, coding a pure MLP neural network in Theano by doing the tensor operations.
We code a multilayer perceptron network in Keras. It builds exactly the same structure that we used in the two previous lectures. We use this network for classifying shapes in drawings: squares and triangles. We achieve an excellent (100%) accuracy.
Introduction to Convolutional Neural Networks
How do convolutions and max-pooling work? What are the necessary input dimensions for a 2D convolution, and what are the dimensions of its output?
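The dimension question can be sketched directly (illustrative only, not the course's code): a "valid" convolution with a k×k kernel shrinks each spatial dimension by k - 1, and 2×2 max-pooling halves it. Note that, as in most deep learning libraries, what is computed below is technically cross-correlation:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # "Valid" 2D convolution: slide the kernel over every position where it fits
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))   # output shrinks by kernel size - 1
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def maxpool2x2(x):
    # Keep the maximum of each non-overlapping 2x2 block
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0        # a simple averaging filter
feat = conv2d_valid(image, kernel)    # (6-3+1, 6-3+1) = (4, 4)
pooled = maxpool2x2(feat)             # (2, 2)
```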
Using convolutional neural networks to predict different hand gestures
We use a very similar framework to identify nuts and bolts using images containing these pieces on a wooden desk
We show how to use neural networks to classify images of the River Thames vs images taken from the streets. This is similar to how many automatic tagging technologies work (software that tells you if your image was taken at the beach, or in a park, or in the mountains). We achieve 100% accuracy in both in-sample and out-of-sample scenarios.
Brief introduction to recurrent neural networks
Backpropagation uses the chain rule from calculus to compute the partial derivatives of the loss function with respect to the weights. This has an undesired consequence: when we have multiple layers and we need to do many multiplications, it can well happen that the gradient fades to zero. The practical consequence is that training the initial layers' weights can take far too long, because the gradient is not properly propagated. This is particularly relevant for recurrent neural networks, as they reuse previous layers (from previous time periods).
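A quick numerical sketch of why the gradient fades (illustrative only): the sigmoid derivative is at most 0.25, and backpropagation multiplies one such factor per layer, so the product shrinks exponentially with depth.

```python
from math import exp

def sigmoid(z):
    return 1 / (1 + exp(-z))

def grad_product(n_layers, z=0.0, w=1.0):
    # Backprop multiplies one sigmoid'(z) * w factor per layer traversed
    s = sigmoid(z)
    factor = s * (1 - s) * w    # at z = 0 this is 0.25, the sigmoid's maximum slope
    return factor ** n_layers

shallow = grad_product(2)    # 0.0625
deep = grad_product(30)      # ~8.7e-19: effectively zero
```

A recurrent network unrolled over 30 time steps faces exactly this product, which is why plain RNNs struggle to learn long-range dependencies.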
We introduce the LSTM model, which solves the vanishing gradient problem by intelligently reformulating the neural network model. It uses gates for forgetting information, adding new information, and mixing new information with the information from previous periods. We use this model to predict house prices in London using AWS (Amazon Web Services).
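The gate mechanics can be sketched for a single time step (a simplified illustration with random weights and biases omitted; Keras' LSTM layer implements the full version for us):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, Wf, Wi, Wc, Wo):
    # One LSTM time step (biases omitted for brevity)
    z = np.concatenate([h_prev, x])   # previous hidden state + current input
    f = sigmoid(Wf @ z)               # forget gate: what to drop from memory
    i = sigmoid(Wi @ z)               # input gate: what new info to store
    c_tilde = np.tanh(Wc @ z)         # candidate memory content
    c = f * c_prev + i * c_tilde      # mix old memory with new information
    o = sigmoid(Wo @ z)               # output gate: what to expose
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W = [0.1 * rng.normal(size=(n_hidden, n_hidden + n_input)) for _ in range(4)]
h, c = lstm_step(rng.normal(size=n_input),
                 np.zeros(n_hidden), np.zeros(n_hidden), *W)
```

Because the cell state `c` is updated additively (rather than being repeatedly squashed through an activation), gradients can flow across many time steps without vanishing as quickly.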
I have worked for 7+ years as a statistical programmer in the industry. I am an expert in programming, statistics, data science, and statistical algorithms, with wide experience in many programming languages. I am a regular contributor to the R community, with 3 published packages, and an expert SAS programmer. I have contributed to scientific statistical journals; my latest publication appeared in the Journal of Statistical Software.