Advanced AI: Deep Reinforcement Learning in Python

Rating: 4.5 (6517 reviews)

The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks

Created by Lazy Programmer Team, Lazy Programmer Inc.

Last updated 2/2026

Language: English

Captions: English [Auto], Italian [Auto]

What you'll learn

Build various deep learning agents (including DQN and A3C)
Apply a variety of advanced reinforcement learning algorithms to any problem
Q-Learning with Deep Neural Networks
Policy Gradient Methods with Neural Networks
Reinforcement Learning with RBF Networks
Use Convolutional Neural Networks with Deep Q-Learning
Understand important foundations of OpenAI's ChatGPT and GPT-4

Course content

13 sections • 80 lectures • 10h 39m total length

Introduction and Outline (7:23)
Explore deep reinforcement learning in Python with OpenAI Gym environments, from CartPole to Atari games, and learn to apply policy gradients and deep networks to reinforcement learning.
Where to get the Code (12:03)
Learn where to download the course code from the GitHub repository, using git clone or the download button, and follow best practices to update and navigate the files.
How to Succeed in this Course (3:04)
Learn how to succeed in this course by using the Q&A to ask questions, meeting the prerequisites, and getting your hands dirty with handwritten notes and coding exercises.
Tensorflow or Theano - Your Choice! (4:09)
See how the course follows a theory-to-code approach in which each concept is implemented in both Theano and TensorFlow; the parallel code lectures introduce no new content beyond the theory.

Reinforcement Learning Section Introduction (6:34)
Explore the theory of reinforcement learning, contrasting it with supervised learning and emphasizing time, goals, and planning for the future. The lecture uses self-driving cars and mazes to show why labeling data is hard in this setting.
Elements of a Reinforcement Learning Problem (20:18)
Define the agent and environment in reinforcement learning, and illustrate episodes, states, actions, rewards, and state and action spaces with tic-tac-toe, Breakout, and maze examples.
States, Actions, Rewards, Policies (9:24)
Represent states as discrete indices or continuous vectors, encode actions accordingly, and implement policies as probabilistic mappings with epsilon-greedy exploration to maximize rewards.
Markov Decision Processes (MDPs) (10:07)
Define states, actions, and rewards within the agent–environment framework to form solvable Markov decision processes, and study the Markov assumption, state-transition probabilities, and Q-learning.
The Return (4:56)
Maximize the sum of future rewards to enable long-term planning, using gamma discounting and the recursive return definition to evaluate future sequences.
Value Functions and the Bellman Equation (9:53)
Derive the Bellman equation from expected values to define the value function V_pi(s) as the gamma-discounted expected return. When the policy and environment dynamics are known, solve for V using linear algebra.
What does it mean to “learn”? (7:18)
Define the control problem in reinforcement learning and contrast it with the prediction problem. Relate state value V(s) and action value Q(s, a) to the optimal policy and its evaluation.
Solving the Bellman Equation with Reinforcement Learning (pt 1) (9:49)
Explore real reinforcement learning algorithms using Monte Carlo sampling to estimate state values under a policy, discuss prediction versus control problems, and derive returns from episodes.
Solving the Bellman Equation with Reinforcement Learning (pt 2) (12:04)
Explore policy evaluation and improvement in reinforcement learning, using Monte Carlo methods to update state and action values (V and Q) through generalized policy iteration and exponentially decaying averages.
Epsilon-Greedy (6:09)
Implement epsilon-greedy to address the explore-exploit dilemma in reinforcement learning by balancing exploration with greedy action selection based on Q-values.
Q-Learning (14:15)
Explore the Q-learning algorithm and its temporal difference updates, contrasting Monte Carlo methods with bootstrapped, off-policy learning that uses an epsilon-greedy behavior policy to update a Q-table.
How to Learn Reinforcement Learning (5:56)
Learn why reinforcement learning differs from supervised learning, why you must implement the algorithms yourself, and how the course moves from tabular methods (dynamic programming, Monte Carlo, and temporal difference learning) to deep learning.
Suggestion Box (3:10)
Submit feedback through a simple online suggestion box to help tailor this deep reinforcement learning course; share your background, the course difficulty, missing explanations, and future topic requests.
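The Epsilon-Greedy and Q-Learning lectures in this section describe the tabular Q-learning update. As a rough, self-contained sketch (not the course's own code), the Python below applies that update with epsilon-greedy exploration to a hypothetical five-state corridor. The environment, the function names, and the hyperparameters (alpha=0.1, gamma=0.9, epsilon=0.1) are illustrative assumptions.

```python
import random

N_STATES = 5          # hypothetical toy env: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic corridor: reward 1 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(Q, state, eps, rng):
    """Explore with probability eps, otherwise act greedily (ties broken at random)."""
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[state])
    return rng.choice([a for a in ACTIONS if Q[state][a] == best])

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]   # the Q-table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = epsilon_greedy(Q, s, eps, rng)
            s2, r, done = step(s, a)
            # TD target bootstraps from max_a' Q(s', a'); a terminal state has no future value
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because the target uses the max over next actions rather than the action the epsilon-greedy policy actually takes next, this is off-policy learning, which is the contrast with Monte Carlo methods that the Q-Learning lecture draws.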

OpenAI Gym Tutorial (5:43)
Learn OpenAI Gym basics by connecting to an environment, resetting to the start state, and running episodes with random actions to inspect observations, actions, rewards, and done signals.
Random Search (5:48)
Explore random search for optimizing a linear policy in a reinforcement learning task, evaluating 100 random weight vectors by their average episode length and keeping the best-performing parameters.
Saving a Video (2:18)
Learn how to save a video of an agent playing an episode by wrapping the environment with a monitor and specifying a save directory, then view the rendered video.
CartPole with Bins (Theory) (3:51)
Explore how discretizing the CartPole state into finite bins enables tabular Q-learning, including handling states outside the box, choosing bin sizes, and shaping rewards to guide learning.
CartPole with Bins (Code) (6:25)
Apply Q-learning to discretized CartPole states to build a Q-table, including a feature transformer, state binning, epsilon-greedy action selection, and a running average of performance over episodes.
RBF Neural Networks (10:26)
Examine radial basis function (RBF) networks for function approximation in reinforcement learning, using Gaussian kernels centered on exemplars followed by a linear model (equivalently, a network with a single hidden layer).
RBF Networks with Mountain Car (Code) (5:28)
Implement Q-learning with an RBF network to solve the Mountain Car problem, using a feature transformer and multiple regressors. Apply epsilon-greedy exploration, partial_fit, and optimistic initial values for stable learning.
RBF Networks with CartPole (Theory) (1:54)
Modify the RBF network to work on CartPole instead of Mountain Car. Implement partial_fit and predict to mirror a gradient-descent model, and design plausible state ranges for better RBF exemplars.
RBF Networks with CartPole (Code) (3:11)
Implement Q-learning with an RBF network to solve CartPole, using a feature transformer, a standard scaler, epsilon-greedy training, and reward tracking with running averages.
Theano Warmup (3:04)
Re-implement a simple neural network training loop in Theano, highlighting graph inputs, shared parameters, the cost function, updates, and training and prediction routines.
Tensorflow Warmup (2:25)
Practice a TensorFlow warmup for deep reinforcement learning by building a simple model with variables, placeholders, a squared-error cost, and an interactive session to train and predict.
Plugging in a Neural Network (3:39)
Plug a neural network into the existing reinforcement learning script, explore non-linear models, and discuss transfer learning, catastrophic forgetting, and dropout regularization with various architectures.
OpenAI Gym Section Summary (3:28)
Review the OpenAI Gym environments and the reinforcement learning agents built in Python, from random search to Q-learning with RBF networks and linear function approximators, across the Mountain Car and CartPole tasks.
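The RBF lectures in this section pair Gaussian-kernel features with a linear model that is trained incrementally via partial_fit. The sketch below illustrates that idea in plain Python, assuming fixed exemplars on an evenly spaced grid, a kernel width gamma=2.0, a learning rate of 0.1, and a toy 1-D target (sin x) standing in for Q-values. The RBFRegressor class is hypothetical and is not the course's implementation.

```python
import math

class RBFRegressor:
    """Hypothetical RBF network: Gaussian features around fixed exemplars + a linear output."""

    def __init__(self, centers, gamma=2.0, lr=0.1):
        self.centers = centers
        self.gamma = gamma
        self.lr = lr
        self.w = [0.0] * len(centers)

    def features(self, x):
        # phi_j(x) = exp(-gamma * (x - c_j)^2): one Gaussian bump per exemplar
        return [math.exp(-self.gamma * (x - c) ** 2) for c in self.centers]

    def predict(self, x):
        return sum(w * f for w, f in zip(self.w, self.features(x)))

    def partial_fit(self, x, y):
        # one SGD step on the squared error 0.5 * (y - y_hat)^2
        phi = self.features(x)
        error = y - sum(w * f for w, f in zip(self.w, phi))
        self.w = [w + self.lr * error * f for w, f in zip(self.w, phi)]

def mse(model, xs, ys):
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy regression target standing in for a value function over a 1-D state
xs = [2 * math.pi * i / 49 for i in range(50)]
ys = [math.sin(x) for x in xs]
centers = [2 * math.pi * j / 9 for j in range(10)]   # 10 evenly spaced exemplars

model = RBFRegressor(centers)
mse_before = mse(model, xs, ys)
for epoch in range(300):
    for x, y in zip(xs, ys):
        model.partial_fit(x, y)
mse_after = mse(model, xs, ys)
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

The non-linearity lives entirely in the fixed features, so each partial_fit call is just linear regression by SGD; that is what makes this model easy to plug into Q-learning one transition at a time.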

N-Step Methods (3:14)
Apply n-step TD methods to balance one-step updates and full Monte Carlo returns, using n-step returns to update V and Q for prediction and for control with greedy or epsilon-greedy policies.
N-Step in Code (3:40)
Implement n-step Q-learning for Mountain Car, tracking recent states and rewards, applying gamma-based multipliers, and handling Gym's termination at 200 steps.
TD Lambda (7:36)
Explore TD lambda as a generalization of the n-step method through the lambda return, using eligibility traces to blend n-step and Monte Carlo updates and controlling the tradeoff via lambda.
TD Lambda in Code (3:00)
Learn to implement TD lambda in code for deep reinforcement learning, using eligibility traces, gamma and lambda updates, and a streamlined update path with a class-based model and gradient features.
TD Lambda Summary (2:21)
Learn how TD lambda generalizes one-step returns to n-step and Monte Carlo methods with eligibility traces, producing lambda returns. Relate this online update to momentum-like ideas from deep learning.
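The TD Lambda lectures describe the lambda return as a weighted blend of n-step returns. Here is a minimal forward-view sketch, assuming a finished episode stored as plain lists (rewards[k] is the reward received after leaving state k, values[k] is the current estimate V(s_k), and the state after the last reward is terminal). The function names are hypothetical; the course's eligibility-trace code is the online, backward-view counterpart of this computation and is not reproduced here.

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n): up to n discounted rewards, then bootstrap from V(s_{t+n}) if the episode continues."""
    T = len(rewards)
    G = 0.0
    for k in range(t, min(t + n, T)):
        G += gamma ** (k - t) * rewards[k]
    if t + n < T:                      # the terminal state contributes no bootstrap value
        G += gamma ** n * values[t + n]
    return G

def lambda_return(rewards, values, t, gamma, lam):
    """Forward-view lambda return: (1 - lam) * sum_n lam^(n-1) * G^(n), leftover weight on the full return."""
    T = len(rewards)
    G_lam = 0.0
    for n in range(1, T - t):
        G_lam += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    # the remaining weight lam^(T-t-1) goes to the complete (Monte Carlo) return
    G_lam += lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)
    return G_lam

rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]
for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_return(rewards, values, t=0, gamma=0.9, lam=lam))
```

Setting lam=0 recovers the one-step TD target and lam=1 recovers the Monte Carlo return, which is the tradeoff that lambda controls.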

Policy Gradient Methods (11:38)
Explore policy gradient methods in deep reinforcement learning by parameterizing the policy with softmax, optimizing the policy objective via gradient ascent, and using baselines and actor-critic updates.
Policy Gradient in TensorFlow for CartPole (7:19)
Implement a policy gradient solution in TensorFlow for CartPole using a linear policy and a neural network value function; test architectures, calculate returns and advantages, and explore optimizer options.
Policy Gradient in Theano for CartPole (4:14)
Explore policy gradient methods in Python using Theano to solve CartPole, reusing neural network architectures with advanced optimizers, and implementing policy and value models with gradient updates.
Continuous Action Spaces (4:16)
Extend policy gradient to continuous action spaces using a Gaussian whose mean and variance are linear functions of the features, keeping the variance positive via an exponential or softplus, and applying the standard updates.
Mountain Car Continuous Specifics (4:12)
Apply a parameterized policy to the continuous Mountain Car, understand its reward structure, and use hill climbing to optimize the parameters of neural network or radial basis function models.
Mountain Car Continuous Theano (7:31)
Implement a parameterized policy for the continuous Mountain Car using Theano, detailing mean and variance models, Gaussian sampling, and hill climbing with perturbations and random search to maximize average reward.
Mountain Car Continuous Tensorflow (8:07)
Implement a parameterized policy for the continuous Mountain Car using hill climbing and random search, with a Gaussian policy over learned mean and variance, TensorFlow utilities, and action clipping between -1 and 1.
Mountain Car Continuous Tensorflow (v2) (6:11)
Apply policy gradient with gradient ascent to solve the continuous Mountain Car using a Gaussian action distribution in TensorFlow, incorporating value-based advantages and entropy regularization.
Mountain Car Continuous Theano (v2) (7:31)
Apply policy gradient methods to the continuous Mountain Car using a Gaussian policy, training both the policy and value models with gradient descent and entropy regularization.
Policy Gradient Section Summary (1:36)
Learn policy gradient methods in deep reinforcement learning by modeling a probabilistic policy with learnable parameters and, for continuous action spaces, using a Gaussian with mean and variance.
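The Continuous Action Spaces lecture models the policy as a Gaussian whose mean and variance are linear in the features, with positivity enforced by an exponential or softplus. A minimal sketch of that parameterization and of a single REINFORCE-style gradient ascent step follows. It assumes plain Python lists for features and parameters and uses the exponential to parameterize the standard deviation; all function names, parameter values, and the learning rate are hypothetical rather than taken from the course code.

```python
import math
import random

def gaussian_policy(theta_mu, theta_s, x):
    """Mean is linear in the features; std = exp(linear) keeps it strictly positive."""
    mu = sum(w * xi for w, xi in zip(theta_mu, x))
    sigma = math.exp(sum(w * xi for w, xi in zip(theta_s, x)))
    return mu, sigma

def log_prob(theta_mu, theta_s, x, a):
    mu, sigma = gaussian_policy(theta_mu, theta_s, x)
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (a - mu) ** 2 / (2 * sigma ** 2)

def grad_log_prob(theta_mu, theta_s, x, a):
    """Analytic gradient of log N(a | mu, sigma^2) with respect to both parameter vectors."""
    mu, sigma = gaussian_policy(theta_mu, theta_s, x)
    z2 = (a - mu) ** 2 / sigma ** 2
    g_mu = [(a - mu) / sigma ** 2 * xi for xi in x]
    g_s = [(z2 - 1.0) * xi for xi in x]        # chain rule through sigma = exp(theta_s . x)
    return g_mu, g_s

def policy_gradient_step(theta_mu, theta_s, x, a, advantage, lr=0.01):
    """Gradient ascent on advantage * log pi(a | x)."""
    g_mu, g_s = grad_log_prob(theta_mu, theta_s, x, a)
    new_mu = [w + lr * advantage * g for w, g in zip(theta_mu, g_mu)]
    new_s = [w + lr * advantage * g for w, g in zip(theta_s, g_s)]
    return new_mu, new_s

rng = random.Random(0)
theta_mu, theta_s = [0.1, -0.2], [0.0, 0.1]
x = [1.0, 0.5]                                 # hypothetical feature vector for one state
mu, sigma = gaussian_policy(theta_mu, theta_s, x)
a = rng.gauss(mu, sigma)
env_action = max(-1.0, min(1.0, a))            # clip only what is sent to the env, e.g. [-1, 1]
theta_mu, theta_s = policy_gradient_step(theta_mu, theta_s, x, a, advantage=1.0)
```

The gradient is taken with respect to the sampled action a, not the clipped one, so clipping affects only what the environment sees. Swapping the exponential for a softplus would change only the sigma line and the chain-rule factor in g_s.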

Deep Q-Learning Intro (3:52)
Explore deep reinforcement learning with deep Q-learning for complex games like Atari, highlighting how large state spaces, long training times, and costs shape practical experiments, and favoring CPU-based testing.
Deep Q-Learning Techniques (9:13)
Explore deep Q-learning with a deep Q-network, experience replay, and a target network, using grayscale four-frame inputs through convolutional neural networks to stabilize training.
Deep Q-Learning in Tensorflow for CartPole (5:09)
Implement deep Q-learning in TensorFlow for CartPole, building a DQN with a target network, experience replay, and parameter copying to observe learning in action.
Deep Q-Learning in Theano for CartPole (4:48)
Implement deep Q-learning in Theano for CartPole, using experience replay, a target network, and gamma-based Q targets. Tune hyperparameters and experiment with optimizers and batch sampling.
Additional Implementation Details for Atari (5:36)
Explore practical Atari deep reinforcement learning details for Breakout. Crop and grayscale frames, downsample with nearest-neighbor interpolation, and use TensorFlow built-in layers and scopes for efficient networks and epsilon scheduling.
Pseudocode and Replay Memory (6:15)
Learn practical pseudocode for deep Q-learning on Atari and implement an efficient experience replay memory with main and target networks, epsilon-greedy action selection, and regular weight updates.
Deep Q-Learning in Tensorflow for Breakout (23:47)
Explore deep Q-learning in TensorFlow applied to Breakout, detailing replay memory, target network updates, image preprocessing (grayscale, crop, resize), state construction from four frames, and epsilon-greedy training.
Deep Q-Learning in Theano for Breakout (23:54)
Explore deep Q-learning in Theano for Breakout. Implement replay memory, grayscale downsampled 84x84 frames, four-frame state construction, convolutional layers, target network updates, and epsilon-greedy training to learn game policies.
Partially Observable MDPs (4:52)
Explore partially observable MDPs and learn how integrating sequences of observations with recurrent neural networks helps infer the true state and guide learning from time-based data.
Deep Q-Learning Section Summary (4:45)
Recap deep Q-learning concepts, highlighting experience replay, target networks, and the RBF network, and see CNN-based feature learning for Breakout, Pong, Space Invaders, Seaquest, and Beam Rider.
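The experience replay memory and target network described in this section can be sketched in plain Python as follows. This is a simplified illustration, assuming whole transitions are stored as tuples in a bounded deque (the course describes a more memory-efficient replay memory for Atari frames) and that target_q is any callable returning the target network's Q-values for a state. The class and function names are hypothetical.

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size experience replay buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling breaks the correlation between consecutive transitions
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def dqn_targets(batch, target_q, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'); terminal transitions do not bootstrap."""
    targets = []
    for state, action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets

# Usage with a stand-in "target network" that maps a state to two Q-values
memory = ReplayMemory(capacity=1000, seed=0)
for i in range(50):
    memory.add(i, i % 2, 1.0, i + 1, i == 49)
batch = memory.sample(32)
ys = dqn_targets(batch, target_q=lambda s: [0.1 * s, 0.2 * s])
```

In a full agent, ys would be the regression targets for the main network's Q(s, a) on the sampled actions, and the target network's weights would be copied from the main network only every so many steps.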

A3C - Theory and Outline (16:30)
Explore asynchronous advantage actor-critic (A3C) with multiple parallel workers updating global policy and value networks, using n-step returns and entropy regularization to stabilize learning.
A3C - Code pt 1 (Warmup) (6:28)
Explore Python threading with a simple A3C-style example where multiple workers increment a shared global counter to 20. Learn how thread-safe counters, worker IDs, and join synchronize completion.
A3C - Code pt 2 (6:27)
Explore implementing A3C in Python by wiring value and policy networks, multiple workers, and threading with a TensorFlow coordinator, for Breakout with 5 million steps and smoothed return plots.
A3C - Code pt 3 (7:35)
Explore building a feature extractor shared by the policy and value networks in A3C, using a lean convolutional neural network, dividing pixel inputs by 255, and handling four grayscale frames.
A3C - Code pt 4 (18:02)
Explore the A3C worker, where each thread runs local networks, copies the global weights, and uses its local gradients to update the global networks, plus image preprocessing and four-frame state construction.
A3C - Section Summary (2:05)
Learn how A3C achieves training stability through parallel workers and n-step returns, in contrast to DQN's replay buffer and target networks, and why hyperparameter search is challenging.
Course Summary (4:57)
Summarize how deep learning integrates with reinforcement learning, covering MDPs, dynamic programming, Monte Carlo, and TD methods, policy gradients, and deep Q-learning with experience replay and target networks.
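The A3C warmup lecture uses threads that increment a shared global counter to 20. A minimal sketch of that pattern with Python's standard threading module is shown here; the GlobalCounter class, the worker function, and the choice of four workers are hypothetical stand-ins, and each "step" is a placeholder for one environment step plus a local update in a real A3C worker.

```python
import threading

class GlobalCounter:
    """Shared step counter; the lock makes the check-and-increment atomic across threads."""

    def __init__(self, limit):
        self.value = 0
        self.limit = limit
        self.lock = threading.Lock()

    def increment(self):
        """Return True if this call claimed a step, False once the limit has been reached."""
        with self.lock:
            if self.value >= self.limit:
                return False
            self.value += 1
            return True

def worker(worker_id, counter, steps_taken):
    while counter.increment():
        steps_taken[worker_id] += 1   # each worker writes only its own slot, so no race here

def run(num_workers=4, limit=20):
    counter = GlobalCounter(limit)
    steps_taken = [0] * num_workers
    threads = [threading.Thread(target=worker, args=(i, counter, steps_taken))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait for every worker before reading the results
    return counter.value, steps_taken

total, per_worker = run()
print(total, per_worker)
```

How the 20 steps are split among workers depends on thread scheduling, but the lock guarantees the total never overshoots the limit.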

(Review) Theano Basics (7:47)
Explore Theano basics by defining scalar, vector, and tensor variables, creating shared variables, and building symbolic functions to compute gradients and train models.
(Review) Theano Neural Network in Code (9:17)
Implement a Theano neural network in Python with a softmax output and a regularized cost function, then train and predict using data from previous examples.
(Review) Tensorflow Basics (7:27)
Master TensorFlow basics by defining variables and placeholders, running sessions, performing matrix multiplication, and applying a gradient descent optimizer to minimize a simple cost function.
(Review) Tensorflow Neural Network in Code (9:43)
Build a TensorFlow neural network by adding a second hidden layer, configure placeholders and variables, apply softmax cross-entropy loss, train with an optimizer, and analyze error rates for overfitting.

Pre-Installation Check (4:12)
This pre-installation check clarifies that installation lectures are guidelines and emphasizes learning Python principles over syntax. It covers using pip to install libraries and choosing course-relevant tools like OpenAI Gym.
Anaconda Environment Setup (20:20)
Install and manage data science libraries on Windows with Anaconda, isolating environments and setting up Python, the NumPy stack, and frameworks like TensorFlow, PyTorch, Keras, and OpenAI Gym.
How to install Numpy, Scipy, Matplotlib, Pandas, IPython, Theano, and TensorFlow (17:32)
Learn to set up a cross-platform deep learning environment across Windows, Linux, and Mac, and install NumPy, SciPy, Matplotlib, Pandas, IPython, Theano, and TensorFlow using virtual machines, pip, and conda.

Requirements

Know reinforcement learning basics, MDPs, Dynamic Programming, Monte Carlo, TD Learning
College-level math is helpful
Experience building machine learning models in Python and Numpy
Know how to build ANNs and CNNs using Theano or Tensorflow

Description

Ever wondered how AI technologies like OpenAI ChatGPT and GPT-4 really work? In this course, you will learn the foundations of these groundbreaking applications.

This course is all about the application of deep learning and neural networks to reinforcement learning.

If you’ve taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI.

Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level.

Reinforcement learning has been around since the 1970s, but none of this was possible until recently.

The world is changing at a very fast pace. The state of California is changing its regulations so that self-driving car companies can test their cars without a human in the car to supervise.

We’ve seen that reinforcement learning is an entirely different kind of machine learning than supervised and unsupervised learning.

Supervised and unsupervised machine learning algorithms are for analyzing and making predictions about data, whereas reinforcement learning is about training an agent to interact with an environment and maximize its reward.

Unlike supervised and unsupervised learning algorithms, reinforcement learning agents have an impetus - they want to reach a goal.

This is such a fascinating perspective that it can even make supervised / unsupervised machine learning and "data science" seem boring in hindsight. Why train a neural network to learn about the data in a database, when you can train a neural network to interact with the real world?

While deep reinforcement learning and AI have a lot of potential, they also carry huge risks.

Bill Gates and Elon Musk have made public statements about some of the risks that AI poses to economic stability and even our existence.

As we learned in my first reinforcement learning course, one of the main lessons of training reinforcement learning agents is that training an AI can have unintended consequences.

AIs don’t think like humans, and so they come up with novel and non-intuitive solutions to reach their goals, often in ways that surprise domain experts - humans who are the best at what they do.

OpenAI was founded as a non-profit by Elon Musk, Sam Altman (Y Combinator), and others, in order to ensure that AI progresses in a way that is beneficial, rather than harmful.

Part of the motivation behind OpenAI is the existential risk that AI poses to humans. They believe that open collaboration is one of the keys to mitigating that risk.

One of the great things about OpenAI is that they have a platform called the OpenAI Gym, which we’ll be making heavy use of in this course.

It allows anyone, anywhere in the world, to train their reinforcement learning agents in standard environments.

In this course, we’ll build upon what we did in the last course by working with more complex environments, specifically, those provided by the OpenAI Gym:

CartPole
Mountain Car
Atari games

To train effective learning agents, we’ll need new techniques.

We’ll extend our knowledge of temporal difference learning by looking at the TD Lambda algorithm, we’ll look at a special type of neural network called the RBF network, we’ll look at the policy gradient method, and we’ll end the course by looking at Deep Q-Learning (DQN) and A3C (Asynchronous Advantage Actor-Critic).

Thanks for reading, and I’ll see you in class!

"If you can't implement it, you don't understand it"

Or as the great physicist Richard Feynman said: "What I cannot create, I do not understand".
My courses are the ONLY courses where you will learn how to implement machine learning algorithms from scratch.
Other courses will teach you how to plug your data into a library, but do you really need help with 3 lines of code?
After doing the same thing with 10 datasets, you realize you didn't learn 10 things. You learned 1 thing, and just repeated the same 3 lines of code 10 times...

Suggested Prerequisites:

College-level math is helpful (calculus, probability)
Object-oriented programming
Python coding: if/else, loops, lists, dicts, sets
Numpy coding: matrix and vector operations
Linear regression
Gradient descent
Know how to build ANNs and CNNs in Theano or TensorFlow
Markov Decision Processes (MDPs)
Know how to implement Dynamic Programming, Monte Carlo, and Temporal Difference Learning to solve MDPs

WHAT ORDER SHOULD I TAKE YOUR COURSES IN?:

Check out the lecture "Machine Learning and AI Prerequisite Roadmap" (available in the FAQ of any of my courses, including the free Numpy course)

UNIQUE FEATURES

Every line of code explained in detail - email me any time if you disagree
No wasted time "typing" on the keyboard like other courses - let's be honest, nobody can really write code worth learning about in just 20 minutes from scratch
Not afraid of university-level math - get important details about algorithms that other courses leave out

Who this course is for:

Professionals and students with strong technical backgrounds who wish to learn state-of-the-art AI techniques

Course content

Introduction and Logistics: 4 lectures • 27min

The Basics of Reinforcement Learning: 13 lectures • 2hr

OpenAI Gym and Basic Reinforcement Learning Techniques: 13 lectures • 58min

TD Lambda: 5 lectures • 20min

Policy Gradients: 10 lectures • 1hr 3min

Deep Q-Learning: 10 lectures • 1hr 32min

A3C: 7 lectures • 1hr 2min

Theano and Tensorflow Basics Review: 4 lectures • 34min

Appendix / FAQ Intro: 1 lecture • 4min

Setting Up Your Environment (FAQ by Student Request): 3 lectures • 42min
