This course is designed to provide a complete introduction to Deep Learning. It is aimed at beginners and intermediate programmers and data scientists who are familiar with Python and want to understand and apply Deep Learning techniques to a variety of problems.
We start with a review of Deep Learning applications and a recap of Machine Learning tools and techniques. Then we introduce Artificial Neural Networks and explain how they are trained to solve Regression and Classification problems.
Over the rest of the course we introduce and explain several architectures including Fully Connected, Convolutional and Recurrent Neural Networks, and for each of these we explain both the theory and give plenty of example applications.
This course strikes a good balance between theory and practice. We don't shy away from explaining mathematical details, and at the same time we provide exercises and sample code so you can apply what you've just learned.
The goal is to provide students with a strong foundation, not just theory, not just scripting, but both. At the end of the course you'll be able to recognize which problems can be solved with Deep Learning, you'll be able to design and train a variety of Neural Network models and you'll be able to use cloud computing to speed up training and improve your model's performance.
Welcome to the course!
This is a hands-on course where you learn to train deep learning models. Deep learning models are used in real world applications to power technologies such as language translation and object recognition.
Let's get our development environment ready. We'll install Anaconda Python and the additional Python packages you will need in order to follow the course.
Let's get the source code that we will use during the course.
Running your first model will help us check that you have installed all the material correctly.
First of all, let's establish a common vocabulary and introduce some terms that will be used throughout the course.
Descriptive statistics and a few simple checks can be very useful to formulate an initial intuition about the data.
Plotting is a powerful way to explore the data and different kinds of plots are useful in different situations.
Let's show an example of plotting with Matplotlib!
More often than not, data is not just tabular. Deep learning can handle text documents, images, sound, and even binary data.
Deep Learning often uses image or audio data; let's see how we can work with it in the Jupyter environment!
Feature engineering is the process through which we transform an unstructured datapoint into a structured, tabular record.
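As a minimal sketch of this idea (the function and vocabulary below are illustrative, not from the course notebooks), here is how an unstructured text document can be turned into a fixed-length numeric record of word counts:

```python
def text_features(document, vocabulary):
    """Turn an unstructured text document into a fixed-length word-count vector."""
    words = document.lower().split()
    return [words.count(term) for term in vocabulary]

# Each document becomes one tabular row, one column per vocabulary term.
print(text_features("Deep learning is deep", ["deep", "learning", "shallow"]))  # → [2, 1, 0]
```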
In this exercise you will load and plot a dataset, exploring it visually to gather some insights and also to familiarize yourself with Python's plotting library: Matplotlib.
Let's continue working through and explaining the solutions!
Let's continue working through and explaining the solutions!
Let's continue working through and explaining the solutions!
Let's continue working through and explaining the solutions!
There are several types of machine learning, including supervised learning, unsupervised learning, and reinforcement learning. This course focuses primarily on Supervised Learning.
Supervised learning allows computers to learn patterns from examples. It is used in several domains and applications and here you learn to identify problems that can be solved using it.
The simplest example of supervised learning is Linear Regression, which looks for a functional relation between input and output variables.
In order to find the best possible linear model to describe our data, we need to define a criterion to evaluate the "goodness" of a particular model. This is the role of the cost function.
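A common choice of cost function for linear regression is the mean squared error. As a minimal sketch (names are illustrative, not taken from the course notebooks), for a linear model y = w*x + b it can be computed like this:

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error of the linear model y = w*x + b on data (xs, ys)."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

data_x = [0.0, 1.0, 2.0]
data_y = [1.0, 3.0, 5.0]          # generated by y = 2x + 1
print(mse_cost(2.0, 1.0, data_x, data_y))  # → 0.0 (perfect fit)
print(mse_cost(1.0, 0.0, data_x, data_y))  # larger cost for a worse model
```

A lower cost means a better fit, which gives us the criterion we need to compare candidate models.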
Let's begin to work through the notebook example for the cost function!
Now that we have both a hypothesis (linear model) and a cost function (mean squared error), we need to find the combination of parameters that minimizes that cost.
Let's play with Keras to create a Linear Regression Model!
How can we know if the model we just trained is good? Since the purpose of our model is to learn to generalize from examples let's test how the model performs on a new set of data not used for training.
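The usual way to do this is a train/test split. A minimal pure-Python sketch of the idea (libraries like scikit-learn provide a ready-made version):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data, then hold out a fraction of it for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # train, test

train, test = train_test_split(list(range(10)), test_fraction=0.2)
print(len(train), len(test))  # → 8 2
```

The model is fit on the training set only, and its score on the held-out test set estimates how well it generalizes.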
Let's code through an example of evaluating model performance!
Classification is the technique to use when the target variable is discrete instead of continuous. Here we discuss the similarities to and differences from regression.
Let's code through a classification example!
In some cases our model may seem to be performing really well on the training data, but poorly on the test data. This is called overfitting.
A more accurate way to assess the ability of our model to generalize to unseen datapoints is to repeat the train/test split procedure multiple times and then average the results. This is called cross-validation.
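A minimal k-fold sketch of this procedure (pure Python and illustrative names; scikit-learn offers a production-ready version):

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k roughly equal, non-overlapping folds."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(data, k, score_fn):
    """Average score_fn(train, validation) over k different splits."""
    scores = []
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        val = [data[i] for i in fold]
        train = [data[i] for i in range(len(data)) if i not in held_out]
        scores.append(score_fn(train, val))
    return sum(scores) / k
```

Each datapoint is used for validation exactly once, so the averaged score is less sensitive to one lucky (or unlucky) split.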
Let's code through some cross validation!
In binary classification we can define several types of errors and choose which one to reduce.
Sometimes we need to preprocess the features, for example if we have categorical data or if the features are on very different scales.
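Two of the most common preprocessing steps, sketched in pure Python (helper names are illustrative; scikit-learn and pandas provide equivalents):

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max_scale(values):
    """Rescale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(one_hot("green", ["red", "green", "blue"]))  # → [0.0, 1.0, 0.0]
print(min_max_scale([10.0, 20.0, 30.0]))           # → [0.0, 0.5, 1.0]
```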
Let's code through an example solution of the pre-processing problems!
Let's code through an example solution of the pre-processing problems!
Deep learning is successfully applied to many different domains. Here we review a few of them.
The perceptron is the simplest neural network, and here we learn all about Nodes, Edges, Biases, and Weights, as well as the need for an Activation function.
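A perceptron's forward pass fits in a few lines. As a minimal sketch (using a sigmoid activation; the course covers other choices too):

```python
import math

def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Weighted sum: 1.0*0.5 + 2.0*(-0.5) + 0.5 = 0.0, and sigmoid(0) = 0.5
print(perceptron([1.0, 2.0], [0.5, -0.5], 0.5))  # → 0.5
```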
We can connect the output of one perceptron to the input of another, stacking them into layers. A fully connected architecture is just a series of such layers. Forward propagation still applies.
Let's code through a NN example!
Let's learn how to work with multiple outputs!
Let's code through an example of multi-class classification!
The activation function is what makes neural networks so powerful. In this lecture we review several types of activation functions and understand why one is necessary.
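Three of the most common activation functions, sketched in pure Python for comparison (without a nonlinearity like these, stacked layers would collapse into a single linear map):

```python
import math

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def sigmoid(z):
    """Squashes inputs into (0, 1); useful for probabilities."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes inputs into (-1, 1), centered at zero."""
    return math.tanh(z)

for f in (relu, sigmoid, tanh):
    print(f.__name__, f(-1.0), f(0.0), f(1.0))
```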
A neural network formulates a prediction using "forward propagation". Here you will learn what it is.
Let's work through our Deep Learning Introduction exercises!
Let's work through our Deep Learning Introduction exercises!
Let's work through our Deep Learning Introduction exercises!
The TensorFlow Playground is a nice web app that allows you to play around with simple neural network parameters to get a feel for what they do.
What is the gradient and why is it important? In this lecture we introduce the gradient in 1 dimension and then extend it to many dimensions.
The gradient is important because it allows us to know how to adjust the parameters of our model in order to find the best model. Here I will give you some intuition about it.
Let's quickly cover the Chain Rule that you'll need to understand!
How does backpropagation work when we have a more complex neural network? The chain rule of differentiation is the answer. As we shall see, this reduces to a lot of matrix multiplications.
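As a minimal numeric check of the chain rule (the functions here are illustrative, not a full backpropagation implementation): for f(u) = u² and g(x) = sin(x), the rule says d/dx f(g(x)) = 2·sin(x)·cos(x), which we can verify against a finite-difference derivative.

```python
import math

def numeric_derivative(fn, x, h=1e-6):
    """Central-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

# Composite function f(g(x)) = sin(x)**2
composite = lambda x: math.sin(x) ** 2

x = 0.7
chain_rule_value = 2 * math.sin(x) * math.cos(x)  # f'(g(x)) * g'(x)
print(abs(numeric_derivative(composite, x) - chain_rule_value) < 1e-6)  # → True
```

Backpropagation applies exactly this rule, layer by layer, from the loss back to each weight.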
The learning rate is the external parameter that we can control to decide the size of our updates to the weights.
How do we feed the data to our model in order to adjust the weights by gradient descent? The answer is in batches. In this lecture you will learn all about epochs, batches and mini-batches.
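A minimal sketch of the batching idea (illustrative names, pure Python):

```python
def mini_batches(data, batch_size):
    """Split a dataset into consecutive mini-batches (the last may be smaller)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

batches = mini_batches(list(range(10)), 4)
print([len(b) for b in batches])  # → [4, 4, 2]
# One epoch = one full pass over all the batches;
# the weights are updated once per batch rather than once per epoch.
```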
Let's briefly go over working with NumPy arrays!
The learning rate is an important parameter of your model, let's go over it!
Let's see how models are affected by the learning rate!
Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.
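A minimal sketch of gradient descent in one dimension (function names are illustrative):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)**2, whose gradient is 2*(x - 3); the minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # → 3.0
```

Training a neural network does the same thing, just in as many dimensions as the network has weights.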
Let's code through an example of Gradient Descent!
Exponentially Weighted Moving Average is one of the most common algorithms used for smoothing!
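A minimal sketch of the EWMA filter (pure Python, illustrative names): each smoothed value is a blend of the previous smoothed value and the new observation, s_t = beta*s_{t-1} + (1-beta)*x_t.

```python
def ewma(values, beta=0.9):
    """Exponentially weighted moving average of a sequence."""
    smoothed, s = [], 0.0
    for x in values:
        s = beta * s + (1 - beta) * x
        smoothed.append(s)
    return smoothed

# With beta = 0.5 the running average approaches the constant signal 1.0:
print(ewma([1.0, 1.0, 1.0, 1.0], beta=0.5))  # → [0.5, 0.75, 0.875, 0.9375]
```

A larger beta gives a smoother (but more lagged) average, which is exactly the knob optimizers built on this filter expose.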
Many improved optimization algorithms use the EWMA filter. Here we review a few improvements to the naive backpropagation algorithm.
Let's code through some optimization algorithms that use the EWMA filter!
Let's code through some initialization, assigning weights to the initial values of our model.
Let's visualize the inner layers of our network!
Let's work through the solutions for exercise 1!
Let's work through the solutions for exercise 2!
Let's work through the solutions for exercise 3!
Let's work through the solutions for exercise 4!
TensorFlow comes equipped with a small visualization server that allows us to display things like training metrics and the computation graph.
Images can be viewed as a sequence of pixels or we can extract ad hoc features from them. Both approaches offer advantages and limitations.
Let's work through this classic dataset to identify and classify hand written digits!
Nearby pixels are correlated and this can be exploited to build a more intelligent model.
In this lecture we introduce tensors as extensions of matrices and see how they are added and multiplied.
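As a minimal sketch of element-wise tensor addition (representing tensors as nested Python lists purely for illustration; in practice NumPy arrays are used):

```python
def tensor_add(a, b):
    """Element-wise addition of two same-shaped tensors stored as nested lists."""
    if isinstance(a, list):
        return [tensor_add(x, y) for x, y in zip(a, b)]
    return a + b

# A 2x2 tensor (a matrix) added to another 2x2 tensor:
print(tensor_add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # → [[11, 22], [33, 44]]
```

The same recursion works for any rank: vectors, matrices, or the 3-D tensors that represent color images.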
Let's work through some of the mathematics related to Tensors!
Let's explore 1 dimensional convolution!
Let's code through an example 1 dimensional convolution!
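A minimal pure-Python sketch of a "valid" 1-D convolution (illustrative names; NumPy's `convolve` does the same job):

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution: slide the flipped kernel across the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[k - 1 - j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A [0.5, 0.5] kernel averages neighbouring samples (a simple smoothing filter).
print(conv1d([1.0, 3.0, 5.0, 7.0], [0.5, 0.5]))  # → [2.0, 4.0, 6.0]
```

In a convolutional layer the kernel values are not fixed like this; they are learned weights.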
Let's explore 2 dimensional convolution!
What is the effect of convolving an image with a gaussian filter? Here we find out.
How are layers connected in a CNN? Here we look at weights, channels and feature maps.
Let's code through some convolutional layers examples
Max pooling and Average pooling layers are useful to reduce the size of our model, forcing it to focus on the most important features.
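A minimal 1-D sketch of both pooling operations (pure Python, illustrative names; Keras provides 2-D layer versions):

```python
def max_pool_1d(values, pool_size=2):
    """Down-sample by keeping the maximum of each non-overlapping window."""
    return [max(values[i:i + pool_size]) for i in range(0, len(values), pool_size)]

def avg_pool_1d(values, pool_size=2):
    """Down-sample by averaging each non-overlapping window."""
    return [sum(values[i:i + pool_size]) / pool_size
            for i in range(0, len(values), pool_size)]

print(max_pool_1d([1.0, 4.0, 2.0, 8.0]))  # → [4.0, 8.0]
print(avg_pool_1d([1.0, 4.0, 2.0, 8.0]))  # → [2.5, 5.0]
```

Either way the output is half the size of the input, which is what shrinks the model.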
Let's code through an example of pooling layers!
Combine several pooling and convolutional layers and finally connect them to a final fully connected prediction layer.
Let's code through a CNN example!
Compare the parameter count and the performance of convolutional and fully connected architectures.
CNNs are not just useful when dealing with images. We can use them to classify other data such as sound and text. However, convolutional architectures are not useful when there is no correlation between nearby rows and columns, for example with tabular data.
Set up a classifier to classify images (hot or not, cat or dog etc.), realize training is too slow and a GPU is needed.
Let's work through another exercise solution!
Let's work through an example of setting up our notebook on Floydhub!
If you have never dealt with time-series, this lecture reviews a few concepts like rolling windows, feature extraction and validation on time series.
We introduce several sequence-specific problems including one to one, one to many and many to many and show practical cases of where they are encountered.
Here we introduce the simplest recurrent neural network and explain how to expand the time dependence.
Relatively recent additions to the toolbox, GRUs mitigate the vanishing gradient problem and allow for an effective implementation of recurrent neural networks.
Learning curves are a useful tool to answer the question: do we need more data or a better algorithm? The performance of a large neural network keeps improving the more data we throw at it.
One technique to speed up training is batch normalization.
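The core of batch normalization is simple: rescale each batch of activations to zero mean and unit variance. A minimal sketch (illustrative names; the real layer also has learnable scale and shift parameters):

```python
def batch_normalize(batch, eps=1e-5):
    """Normalize a batch of activations to roughly zero mean and unit variance."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]

normalized = batch_normalize([1.0, 2.0, 3.0, 4.0])
print(round(sum(normalized), 6))  # → 0.0 (the batch mean is centered at zero)
```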
Another technique to improve convergence of a network is to make it more robust to internal failure.
Let's code through a dropout example!
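The essence of dropout fits in a few lines. A minimal sketch of "inverted" dropout (illustrative names; Keras applies this for you via its Dropout layer, and only during training):

```python
import random

def dropout(activations, rate=0.5, seed=0):
    """Randomly zero a fraction of activations and rescale the survivors,
    so the expected total activation is unchanged."""
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Roughly half the activations are dropped; the rest are scaled up to 2.0.
print(dropout([1.0] * 6, rate=0.5))
```

Because the network can never rely on any single unit, it is forced to learn redundant, more robust representations.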
In some cases, more data can be obtained by slightly modifying the existing training data. For example, applying noise to sound or distortions to an image.
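Two simple augmentations sketched in pure Python, with an image represented as a list of pixel rows (illustrative names; Keras ships generator utilities that do this at scale):

```python
import random

def horizontal_flip(image):
    """Mirror a 2-D image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def add_noise(image, amount=0.1, seed=0):
    """Perturb each pixel with small uniform noise."""
    rng = random.Random(seed)
    return [[p + rng.uniform(-amount, amount) for p in row] for row in image]

image = [[1, 2, 3],
         [4, 5, 6]]
print(horizontal_flip(image))  # → [[3, 2, 1], [6, 5, 4]]
```

Each transformed copy is a "new" training example the model has never seen, at essentially zero collection cost.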
In some cases we can continuously generate new data to feed to a deep learning model.
Let's create an image generator!
Let's show how we can search for an optimal network architecture!
Sometimes we can represent data in a better way before feeding it to a model.
Let's work through an image recognition system!
Let's work through the second exercise solution!
Data Weekends™ are accelerated data science workshops for programmers where you can quickly learn to apply predictive analytics to real-world data. We offer courses in Data Analytics, Machine Learning, Deep Learning and Reinforcement Learning.
Through our parent company Catalit LLC we also offer corporate training and consulting on Data Science, Machine Learning and Deep Learning.
Data Weekends' founder and lead instructor is Francesco Mosconi, PhD.
Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University and years of experience as a professional instructor and trainer for Data Science and programming. He has publications and patents in various fields such as microfluidics, materials science, and data science technologies. Over the course of his career he has developed a skill set in analyzing data, and he hopes to use his experience in teaching and data science to help other people learn the power of programming, the ability to analyze data, and how to present the data in clear and beautiful visualizations. Currently he works as the Head of Data Science for Pierian Data Inc. and provides in-person data science and Python programming training courses to employees working at top companies, including General Electric, Cigna, The New York Times, Credit Suisse, and many more. Feel free to contact him on LinkedIn for more information on in-person training sessions.
Francesco is a Data Science consultant and trainer. With Catalit LLC he helps companies acquire skills and knowledge in data science and harness the power of machine learning and deep learning to reach their goals.
Before Data Weekends, Francesco served as lead instructor in Data Science at General Assembly and The Data Incubator and he was Chief Data Officer and co-founder at Spire, a YCombinator-backed startup company that invented the first consumer wearable device capable of continuously tracking respiration and activity.
He earned a joint PhD in biophysics at University of Padua and Université de Paris VI and is also a graduate of Singularity University summer program of 2011.