Zero to Deep Learning™ with Python and Keras
4.5 (352 ratings)
4,235 students enrolled

Understand and build Deep Learning models for images, text, sound and more using Python and Keras
Best Seller
Last updated 5/2017
English [Auto-generated]
  • 9.5 hours on-demand video
  • 7 Articles
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • To describe what Deep Learning is in a simple yet accurate way
  • To explain how deep learning can be used to build predictive models
  • To distinguish which practical applications can benefit from deep learning
  • To install and use Python and Keras to build deep learning models
  • To apply deep learning to solve supervised and unsupervised learning problems involving images, text, sound, time series and tabular data.
  • To build, train and use fully connected, convolutional and recurrent neural networks
  • To look at the internals of a deep learning model without intimidation and with the ability to tweak its parameters
  • To train and run models in the cloud using a GPU
  • To estimate training costs for large models
  • To re-use pre-trained models to shortcut training time and cost (transfer learning)
Requirements
  • Knowledge of Python, familiarity with control flow (if/else, for loops) and pythonic constructs (functions, classes, iterables, generators)
  • Use of bash shell (or equivalent command prompt) and basic commands to copy and move files
  • Basic knowledge of linear algebra (what is a vector, what is a matrix, how to calculate dot product)
  • Use of ssh to connect to a cloud computer
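To gauge the linear algebra prerequisite: if the following sketch of a dot product and a matrix-vector product (plain Python, no NumPy) looks familiar, you are ready.

```python
# Dot product of two vectors, written out by hand.
def dot(u, v):
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))

# Matrix-vector product: one dot product per row of the matrix.
def matvec(M, v):
    return [dot(row, v) for row in M]

print(dot([1, 2, 3], [4, 5, 6]))         # 1*4 + 2*5 + 3*6 = 32
print(matvec([[1, 0], [0, 2]], [3, 4]))  # [3, 8]
```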

This course is designed to provide a complete introduction to Deep Learning. It is aimed at beginners and intermediate programmers and data scientists who are familiar with Python and want to understand and apply Deep Learning techniques to a variety of problems.

We start with a review of Deep Learning applications and a recap of Machine Learning tools and techniques. Then we introduce Artificial Neural Networks and explain how they are trained to solve Regression and Classification problems.

Over the rest of the course we introduce and explain several architectures including Fully Connected, Convolutional and Recurrent Neural Networks, and for each of these we explain both the theory and give plenty of example applications.

This course is a good balance between theory and practice. We don't shy away from explaining mathematical details and at the same time we provide exercises and sample code to apply what you've just learned.

The goal is to provide students with a strong foundation, not just theory, not just scripting, but both. At the end of the course you'll be able to recognize which problems can be solved with Deep Learning, you'll be able to design and train a variety of Neural Network models and you'll be able to use cloud computing to speed up training and improve your model's performance.

Who is the target audience?
  • Software engineers who are curious about data science and about the Deep Learning buzz and want to get a better understanding of it
  • Data scientists who are familiar with Machine Learning and want to develop a strong foundational knowledge of deep learning
Curriculum For This Course
133 Lectures
Welcome to the course!
7 Lectures 46:13

Welcome to the course!


This is a hands-on course where you learn to train deep learning models. Deep learning models are used in real world applications to power technologies such as language translation and object recognition.

Preview 09:29

Let's get our development environment ready. We'll install the Anaconda Python distribution and the additional Python packages you will need to follow the course.

Preview 03:06

Installation Video Guide

Let's get the source code that we will use during the course.

Obtain the code for the course

Course Folder Walkthrough

Running your first model will help us check that you have installed all the material correctly.

Your first deep learning model
17 Lectures 01:03:58

First of all, let's establish a common vocabulary and introduce some terms that will be used throughout the course.

Tabular data

Descriptive statistics and a few simple checks can be very useful to formulate an initial intuition about the data.

Data exploration with Pandas code along

Plotting is a powerful way to explore the data and different kinds of plots are useful in different situations.

Visual data Exploration

Let's show an example of plotting with Matplotlib!

Plotting with Matplotlib

More often than not, data is not just tabular. Deep learning can handle text documents, images, sound, and even binary data.

Unstructured Data

Deep learning often uses image or audio data; let's see how we can work with it in the Jupyter environment!

Images and Sound in Jupyter

Feature engineering is the process through which we transform an unstructured datapoint into a structured, tabular record.

Feature Engineering
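A minimal sketch of the idea: a tiny bag-of-words featurizer that turns free text into a fixed-length count vector. The vocabulary here is made up for illustration, not taken from the course material.

```python
# Hypothetical three-word vocabulary; real pipelines learn this from data.
VOCAB = ["deep", "learning", "python"]

def featurize(text):
    """Turn an unstructured string into a tabular record of word counts."""
    words = text.lower().split()
    return [words.count(term) for term in VOCAB]

print(featurize("Deep learning with Python is deep fun"))  # [2, 1, 1]
```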

Exercise 1 Presentation

In this exercise you will load and plot a dataset, exploring it visually to gather some insights and also to familiarize with python's plotting library: Matplotlib.

Exercise 1 Solution

Exercise 2 Presentation

Let's continue working through and explaining the solutions!

Exercise 2 Solution

Exercise 3 Presentation

Let's continue working through and explaining the solutions!

Exercise 3 Solution

Exercise 4 Presentation

Let's continue working through and explaining the solutions!

Exercise 4 Solution

Exercise 5 Presentation

Let's continue working through and explaining the solutions!

Exercise 5 Solution
Machine Learning
21 Lectures 02:02:26

There are several types of machine learning, including supervised learning, unsupervised learning and reinforcement learning. This course focuses primarily on supervised learning.

Machine Learning Problems

Supervised learning allows computers to learn patterns from examples. It is used in several domains and applications and here you learn to identify problems that can be solved using it.

Supervised Learning

The simplest example of supervised learning is linear regression, which looks for a functional relation between input and output variables.

Linear Regression
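As a sketch of the idea (not the course's Keras code): an ordinary least-squares fit of a line y = w*x + b in one dimension, in plain Python.

```python
# Fit y = w*x + b by ordinary least squares in one dimension.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y over variance of x.
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1
w, b = fit_line(xs, ys)
print(w, b)  # 2.0 1.0
```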

In order to find the best possible linear model to describe our data, we need to define a criterion to evaluate the "goodness" of a particular model. This is the role of the cost function.

Cost Function
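The cost function used throughout this part of the course is the mean squared error; as a sketch, it is just the average squared distance between predictions and targets.

```python
# Mean squared error: average squared distance between predictions and targets.
def mse(y_pred, y_true):
    return sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

print(mse([2.5, 0.0], [3.0, 1.0]))  # (0.25 + 1.0) / 2 = 0.625
```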

Let's begin to work through the notebook example for the cost function!

Cost Function code along

Now that we have both a hypothesis (linear model) and a cost function (mean squared error), we need to find the combination of parameters that minimizes such cost.

Finding the best model

Let's play with Keras to create a Linear Regression Model!

Linear Regression code along

How can we know if the model we just trained is good? Since the purpose of our model is to learn to generalize from examples let's test how the model performs on a new set of data not used for training.

Evaluating Performance
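The train/test split described above can be sketched in a few lines; the fixed seed is an illustrative choice to keep the split reproducible.

```python
import random

# Shuffle the data, then hold out a fraction of it for testing.
def train_test_split(data, test_fraction=0.2, seed=0):
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)), test_fraction=0.3)
print(len(train), len(test))  # 7 3
```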

Let's code through an example of evaluating model performance!

Evaluating Performance code along

Classification is the technique to use when the target variable is discrete rather than continuous. Here we introduce the similarities and differences with regression.

Classification

Let's code through a classification example!

Classification code along

In some cases our model may seem to be performing really well on the training data, but poorly on the test data. This is called overfitting.

Overfitting

A more accurate way to assess the ability of our model to generalize to unseen datapoints is to repeat the train/test split procedure multiple times and then average the results. This is called cross-validation.

Cross Validation
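The repeated split-and-average procedure can be sketched as k-fold cross-validation: split the indices into k folds and let each fold serve once as the test set.

```python
# Split n indices into k folds; each fold serves once as the test set.
def kfold(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

test_sizes = []
for train_idx, test_idx in kfold(6, 3):
    # In practice: fit on train_idx, score on test_idx, then average scores.
    test_sizes.append(len(test_idx))
print(test_sizes)  # [2, 2, 2]
```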

Let's code through some cross validation!

Cross Validation code along

Confusion matrix

In binary classification we can define several types of error and choose which one to reduce.

Confusion Matrix code along
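The four error types above are the entries of the confusion matrix; a plain-Python sketch, with precision and recall derived from them:

```python
# Count the four outcomes of a binary classifier.
def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

tp, fp, fn, tn = confusion([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
precision = tp / (tp + fp)   # how many flagged positives were real
recall = tp / (tp + fn)      # how many real positives were caught
print(tp, fp, fn, tn)        # 2 1 1 1
```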

Sometimes we need to preprocess the features, for example if we have categorical data or if the scale is too big or too small.

Feature Preprocessing code along
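Two of the preprocessing steps mentioned above, sketched in plain Python: min-max scaling for a numeric column and one-hot encoding for a categorical one.

```python
# Min-max scaling maps a numeric column onto [0, 1].
def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

# One-hot encoding turns a category into a 0/1 indicator vector.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

print(min_max([10, 20, 40]))                     # [0.0, 0.333..., 1.0]
print(one_hot("red", ["red", "green", "blue"]))  # [1, 0, 0]
```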

Exercise 1 Presentation

Let's code through an example solution of the pre-processing problems!

Exercise 1 solution

Exercise 2 Presentation

Let's code through an example solution of the pre-processing problems!

Exercise 2 solution
Deep Learning Intro
16 Lectures 01:15:16

Deep learning is successfully applied to many different domains. Here we review a few of them.

Deep Learning successes

The perceptron is the simplest neural network, and here we learn all about nodes, edges, biases and weights, as well as the need for an activation function.

Neural Networks
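A minimal sketch of the perceptron described above: a weighted sum of the inputs plus a bias, passed through a step activation. The weights here are hand-picked for illustration so that the unit computes logical AND.

```python
# Step activation: fire if the weighted sum is positive.
def step(z):
    return 1 if z > 0 else 0

# A single perceptron: weighted sum of inputs plus bias, then activation.
def perceptron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)

# With these hand-picked weights the perceptron computes logical AND.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, weights=[1, 1], bias=-1.5))
```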

We can feed the output of one perceptron into the input of another, stacking them into layers. A fully connected architecture is just a series of such layers, and forward propagation still applies.

Deeper Networks

Let's code through a NN example!

Neural Networks code along

Let's learn how to work with multiple outputs!

Multiple Outputs

Let's code through an example of multi-class classification!

Multiclass classification code along

The activation function is what makes neural networks so powerful. In this lecture we review several types of activation functions and see why they are necessary.

Activation Functions
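Three of the standard activation functions, sketched in plain Python; the nonlinearity is what lets stacked layers represent more than a single linear map.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through, zero out the rest."""
    return max(0.0, z)

def tanh(z):
    """Squash z into (-1, 1)."""
    return math.tanh(z)

print(sigmoid(0.0))           # 0.5
print(relu(-2.0), relu(3.0))  # 0.0 3.0
```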

A neural network formulates a prediction using "forward propagation". Here you will learn what it is.

Feed forward

Exercise 1 Presentation

Let's work through our Deep Learning Introduction exercises!

Exercise 1 Solution

Exercise 2 Presentation

Let's work through our Deep Learning Introduction exercises!

Exercise 2 Solution

Exercise 3 Presentation

Let's work through our Deep Learning Introduction exercises!

Exercise 3 Solution

Exercise 4 Presentation

The TensorFlow Playground is a nice web app that allows you to play around with simple neural network parameters to get a feel for what they do.

Exercise 4 Solution
Gradient Descent
25 Lectures 01:43:52

What is the gradient and why is it important? In this lecture we introduce the gradient in 1 dimension and then extend it to many dimensions.

Derivatives and Gradient

The gradient is important because it allows us to know how to adjust the parameters of our model in order to find the best model. Here I will give you some intuition about it.

Backpropagation intuition

Let's quickly cover the Chain Rule that you'll need to understand!

Chain Rule

How does backpropagation work when we have a more complex neural network? The chain rule of differentiation is the answer. As we shall see, this reduces to a lot of matrix multiplications.

Derivative Calculation

The learning rate is the external parameter that we can control to decide the size of our updates to the weights.

Fully Connected Backpropagation

How do we feed the data to our model in order to adjust the weights by gradient descent? The answer is in batches. In this lecture you will learn all about epochs, batches and mini-batches.

Matrix Notation
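The batching idea described above can be sketched in one line: slice the dataset into mini-batches, where one pass over all the batches is an epoch.

```python
# Slice a dataset into mini-batches; one pass over all batches is an epoch.
def minibatches(data, batch_size):
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

batches = minibatches(list(range(10)), batch_size=4)
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```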

Let's briefly go over working with NumPy arrays!

Numpy Arrays code along

The learning rate is an important parameter of your model, let's go over it!

Learning Rate

Let's see how models are affected by the learning rate!

Learning Rate code along

Gradient descent is a first-order iterative optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point.

Gradient Descent
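The definition above, sketched on a function simple enough to check by hand: minimizing f(w) = (w - 3)^2 by repeatedly stepping against its gradient f'(w) = 2(w - 3).

```python
# Gradient of f(w) = (w - 3)^2, which has its minimum at w = 3.
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # step proportional to the negative gradient
print(round(w, 4))  # converges toward 3.0
```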

Let's code through an example of Gradient Descent!

Gradient Descent code along

The exponentially weighted moving average (EWMA) is one of the most common algorithms used for smoothing!
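A minimal sketch of the filter: each output is a blend of the new value and the running average, controlled by a decay parameter beta (no bias correction here).

```python
# Exponentially weighted moving average with decay parameter beta.
def ewma(values, beta=0.9):
    avg, out = 0.0, []
    for v in values:
        avg = beta * avg + (1 - beta) * v  # blend new value into the average
        out.append(avg)
    return out

print(ewma([1.0, 1.0, 1.0], beta=0.5))  # [0.5, 0.75, 0.875]
```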


Many improved optimization algorithms use the EWMA filter. Here we review a few improvements to the naive backpropagation algorithm.


Let's code through some optimization algorithms that use EWMA.

Optimizers code along

Let's code through some initialization, assigning weights to the initial values of our model.

Initialization code along

Let's visualize the inner layers of our network!

Inner Layers Visualization code along

Exercise 1 Presentation

Let's work through the solutions for exercise 1!

Exercise 1 Solution

Exercise 2 Presentation

Let's work through the solutions for exercise 2!

Exercise 2 Solution

Exercise 3 Presentation

Let's work through the solutions for exercise 3!

Exercise 3 Solution

Exercise 4 Presentation

Let's work through the solutions for exercise 4!

Exercise 4 Solution

TensorFlow comes equipped with a small visualization server, TensorBoard, that allows us to display a number of things.

Convolutional Neural Networks
20 Lectures 01:14:01

Images can be viewed as a sequence of pixels or we can extract ad hoc features from them. Both approaches offer advantages and limitations.

Features from Pixels

MNIST Classification

Let's work through this classic dataset to identify and classify hand written digits!

MNIST Classification code along

Nearby pixels are correlated and this can be exploited to build a more intelligent model.

Beyond Pixels

In this lecture we introduce tensors as extensions of matrices and see how they are added and multiplied.

Images as Tensors

Let's work through some of the mathematics related to Tensors!

Tensor Math code along

Let's explore 1 dimensional convolution!

Convolution in 1 D
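A plain-Python sketch of "valid" one-dimensional convolution as used in deep learning (strictly speaking, cross-correlation: the kernel is not flipped): slide the kernel along the signal and take a dot product at every position.

```python
# "Valid" 1-D convolution: dot product of the kernel with every window.
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A [1, -1] kernel acts as an edge detector on a step signal.
print(conv1d([0, 0, 1, 1, 0], [1, -1]))  # [0, -1, 0, 1]
```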

Let's code through an example 1 dimensional convolution!

Convolution in 1 D code along

Let's explore 2 dimensional convolution!

Convolution in 2 D

What is the effect of convolving an image with a Gaussian filter? Here we find out.

Image Filters code along

How are layers connected in a CNN? Here we look at weights, channels and feature maps.

Convolutional Layers

Let's code through some convolutional layers examples

Convolutional Layers code along

Max pooling and Average pooling layers are useful to reduce the size of our model, forcing it to focus on the most important features.

Pooling Layers
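A one-dimensional sketch of the max pooling described above: keep only the strongest response in each non-overlapping window, shrinking the representation.

```python
# Max pooling over non-overlapping windows of size pool_size.
def max_pool1d(xs, pool_size=2):
    return [max(xs[i:i + pool_size]) for i in range(0, len(xs), pool_size)]

print(max_pool1d([1, 3, 2, 5, 0, 4]))  # [3, 5, 4]
```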

Let's code through an example of pooling layers!

Pooling Layers code along

Combine several pooling and convolutional layers and finally connect them to a fully connected prediction layer.

Convolutional Neural Networks

Let's code through a CNN example!

Convolutional Neural Networks code along

Compare the parameter count and the performance of convolutional and fully connected architectures.

Weights in CNNs

CNNs are not just useful when dealing with images. We can use them to classify other data such as sound and text. Convolutional architectures are not useful when there is no correlation between nearby rows and columns, for example with tabular data.

Beyond Images

Set up a classifier to classify images (hot or not, cat or dog etc.), realize training is too slow and a GPU is needed.

Exercise 1 Solution

Let's work through another exercise solution!

Exercise 2 Solution
Cloud GPUs
1 Lecture 00:45

Let's work through an example of setting up our notebook on Floydhub!

Floyd GPU notebook setup
Recurrent Neural Networks
11 Lectures 41:35

If you have never dealt with time-series, this lecture reviews a few concepts like rolling windows, feature extraction and validation on time series.

Time Series

We introduce several sequence-specific problems including one to one, one to many and many to many and show practical cases of where they are encountered.

Sequence problems

Here we introduce the simplest recurrent neural network and explain how to expand the time dependence.

Vanishing Gradients

Vanilla RNN

Recently introduced, GRUs solve the vanishing gradient problem and allow for an effective implementation of recurrent neural networks.


Time Series Forecasting code along

Time Series Forecasting with LSTM code along

Rolling Windows

Rolling Windows code along

Exercise 1 Solution

Exercise 2 Solution
Improving performance
15 Lectures 56:28

Learning curves are a useful tool to answer the question: do we need more data or a better algorithm? The performance of a large neural network keeps improving the more data we throw at it.

Learning curves

Learning curves code along

One technique to speed up training is batch normalization.

Batch Normalization
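A sketch of the core normalization step inside a batch-norm layer: standardize a batch of activations to zero mean and unit variance. The learned scale and shift parameters of the full layer are omitted here.

```python
import math

# Standardize a batch to zero mean and unit variance; eps avoids
# division by zero when the batch has no spread.
def batch_norm(xs, eps=1e-5):
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

out = batch_norm([1.0, 2.0, 3.0])
print([round(v, 3) for v in out])  # roughly [-1.225, 0.0, 1.225]
```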

Batch Normalization code along

Another technique to improve the convergence of a network is to make it more robust to internal failure. This is the idea behind dropout.

Dropout and Regularization

Let's code through a dropout example!

Dropout and Regularization code along

In some cases, more data can be obtained by slightly modifying the existing training data. For example, applying noise to sound or distortions to an image.

Data Augmentation

In some cases we can continuously generate new data to feed to a deep learning model.

Continuous Learning

Let's create an image generator!

Image Generator code along

Let's see how we can search for an optimal network architecture!

Hyperparameter search

Sometimes we can represent data in a better way before feeding it to a model. Embeddings are one such representation.

Embeddings

Embeddings code along

Movies Reviews Sentiment Analysis code along

Let's work through an image recognition system!

Exercise 1 Solution

Let's work through the second exercise solution!

Exercise 2 Solution
About the Instructor
Data Weekends
4.5 Average rating
347 Reviews
4,235 Students
1 Course
Learn the essentials of Data Science in just one weekend

Data Weekends™ are accelerated data science workshops for programmers where you can quickly learn to apply predictive analytics to real-world data. We offer courses in Data Analytics, Machine Learning, Deep Learning and Reinforcement Learning.

Through our parent company Catalit LLC we also offer corporate training and consulting on Data Science, Machine Learning and Deep Learning.

Data Weekends' founder and lead instructor is Francesco Mosconi, PhD.

Jose Portilla
4.5 Average rating
54,035 Reviews
258,390 Students
13 Courses
Data Scientist

Jose Marcial Portilla has a BS and MS in Mechanical Engineering from Santa Clara University and years of experience as a professional instructor and trainer for Data Science and programming. He has publications and patents in various fields such as microfluidics, materials science, and data science technologies. Over the course of his career he has developed a skill set in analyzing data, and he hopes to use his experience in teaching and data science to help other people learn the power of programming, the ability to analyze data, and how to present data in clear and beautiful visualizations. Currently he works as the Head of Data Science for Pierian Data Inc. and provides in-person data science and Python programming training courses to employees working at top companies, including General Electric, Cigna, The New York Times, Credit Suisse, and many more. Feel free to contact him on LinkedIn for more information on in-person training sessions.

Francesco Mosconi
4.5 Average rating
347 Reviews
4,235 Students
1 Course

Francesco is a Data Science consultant and trainer. With Catalit LLC he helps companies acquire skills and knowledge in data science and harness the power of machine learning and deep learning to reach their goals.

Before Data Weekends, Francesco served as lead instructor in Data Science at General Assembly and The Data Incubator, and he was Chief Data Officer and co-founder at Spire, a Y Combinator-backed startup company that invented the first consumer wearable device capable of continuously tracking respiration and activity.

He earned a joint PhD in biophysics from the University of Padua and Université de Paris VI and is also a graduate of the Singularity University summer program of 2011.