Modern Reinforcement Learning: Deep Q Agents (PyTorch & TF2)

Rating: 4.3 (1114 reviews)

How to Turn Deep Reinforcement Learning Research Papers Into Agents That Beat Classic Atari Games

Created by Phil Tabor

Last updated 8/2023

English

What you'll learn

How to read and implement deep reinforcement learning papers
How to code Deep Q learning agents
How to code Double Deep Q learning agents
How to code Dueling Deep Q and Dueling Double Deep Q learning agents
How to write modular and extensible deep reinforcement learning software
How to automate hyperparameter tuning with command line arguments

Course content

11 sections • 50 lectures • 6h 44m total length

What You Will Learn In This Course (4:07)
Required Background, Software, and Hardware (3:46)
How to Succeed in this Course (4:45)

Agents, Environments, and Actions (10:04)
Markov Decision Processes (11:30)
Explore how reinforcement learning uses an agent interacting with an environment to maximize discounted rewards, formalized as a Markov decision process with states, actions, rewards, and a policy.
Value Functions, Action Value Functions, and the Bellman Equation (8:10)
Model Free vs. Model Based Learning (3:34)
Compare model-based and model-free learning in reinforcement learning by examining the Bellman equation, the value function, and how dynamic programming and Q-learning handle state transitions.
The Explore-Exploit Dilemma (5:27)
Temporal Difference Learning (22:01)

Dealing with Continuous State Spaces with Deep Neural Networks (18:53)
Explore how deep neural networks overcome the limitations of tabular Q-learning in continuous state spaces by approximating the action-value function with a deep Q-learning approach and PyTorch.
Naive Deep Q Learning in Code: Step 1 - Coding the Deep Q Network (7:55)
Naive Deep Q Learning in Code: Step 2 - Coding the Agent Class (10:10)
Naive Deep Q Learning in Code: Step 3 - Coding the Main Loop and Learning (9:21)
Naive Deep Q Learning in Code: Step 4 - Verifying the Functionality of Our Code (2:13)
Naive Deep Q Learning in Code: Step 5 - Analyzing Our Agent's Performance (2:42)
Dealing with Screen Images with Convolutional Neural Networks (3:52)

How to Read Deep Learning Papers (7:15)
Analyzing the Paper (20:33)
How to Modify the OpenAI Gym Atari Environments (14:29)
Learn to preprocess and stack OpenAI Gym Atari frames for deep Q-learning: grayscale and 84x84 resizing, max of two frames, and four-frame stacking with action repetition.
How to Preprocess the OpenAI Gym Atari Screen Images (2:55)
Preprocess OpenAI Gym Atari screen observations by converting to grayscale, resizing with cv2, reshaping and scaling pixels, and returning a new observation wrapper in PyTorch to prepare for stacked frames.
How to Stack the Preprocessed Atari Screen Images (3:26)
How to Combine All the Changes (1:30)
Learn to build a gym environment by composing make, repeat action, max frame, preprocess frame, and stack frames, shaping inputs to 84x84x1 with four-frame repeats, and plan memory for the agent.
How to Add Reward Clipping, Fire First, and No Ops (4:49)
Add reward clipping, fire first, and no ops to the environment by passing booleans, clipping rewards in step, and performing no ops then a fire action in reset.
How to Code the Agent's Memory (10:55)
How to Code the Deep Q Network (11:44)
Coding the Deep Q Agent: Step 1 - Coding the Constructor (7:48)
Code the DQN agent constructor by wiring the online and target networks, replay memory, epsilon-greedy action selection, weight copying from online to target, and model checkpointing with descriptive network naming.
Coding the Deep Q Agent: Step 2 - Epsilon-Greedy Action Selection (2:22)
Coding the Deep Q Agent: Step 3 - Memory, Model Saving and Network Copying (4:24)
Coding the Deep Q Agent: Step 4 - The Agent's Learn Function (7:54)
Coding the Deep Q Agent: Step 5 - The Main Loop and Analyzing the Performance (14:14)

Analyzing the Paper (14:01)
Explore the dueling network architecture for model-free reinforcement learning, with two streams for state value and action advantage sharing a convolutional backbone, boosting policy evaluation and Atari 2600 performance.
Coding the Dueling Deep Q Network (3:21)
Coding the Dueling Deep Q Learning Agent and Analyzing Performance (10:10)
Coding the Dueling Double Deep Q Learning Agent and Analyzing Performance (5:36)
Implement a dueling double deep Q-learning agent by combining value and advantage streams and applying the double Q-learning update in the learn function, then evaluate on Pong.
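
To make the two ideas in this section concrete, here is a minimal, framework-free sketch. It is not the course's code: the function names are hypothetical, plain Python lists stand in for network outputs, and the mean-subtracted aggregation is assumed from the dueling networks paper rather than confirmed by the lecture notes above.

```python
def dueling_q_values(value, advantages):
    """Combine a scalar state value with per-action advantages.

    Assumed aggregation: Q(s, a) = V(s) + (A(s, a) - mean(A(s, .))).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def dqn_target(reward, done, gamma, q_next_target):
    """Standard DQN target: the target network both picks and scores the action."""
    return reward if done else reward + gamma * max(q_next_target)


def double_q_target(reward, done, gamma, q_next_online, q_next_target):
    """Double Q-learning target: the online network picks the next action,
    the target network supplies the value of that action."""
    if done:
        return reward
    best_action = argmax(q_next_online)
    return reward + gamma * q_next_target[best_action]


# Hypothetical numbers standing in for the two networks' outputs on the next state.
q_online = dueling_q_values(1.0, [0.5, -0.5, 0.0])
q_target = [0.2, 2.0, 0.4]
y_double = double_q_target(1.0, False, 0.9, q_online, q_target)
y_dqn = dqn_target(1.0, False, 0.9, q_target)
```

In this toy case the double Q target is smaller than the plain DQN target, because the action is chosen by one set of estimates and valued by the other instead of taking the maximum of a single set.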

Differences Between Tensorflow 2 and PyTorch (4:11)
Compare Tensorflow 2 and PyTorch for deep Q agents, covering Keras models, call function, channels last vs first, model compile, save/load, and gradient tape training.
Coding the Deep Q Network Class in Tensorflow 2 (6:27)
Coding the Deep Q Learning Agent in Tensorflow 2 (11:08)
Testing the Tensorflow 2 Deep Q Learning Agent (5:54)
Coding the Tensorflow 2 Double Q Learning Agent (2:35)
Coding the Dueling Network and Agent in Tensorflow 2 (6:32)
Coding the Dueling Double DQN Agent in Tensorflow 2 (4:31)

Requirements

Some College Calculus
Exposure To Deep Learning
Comfortable with Python

Description

In this complete deep reinforcement learning course you will learn a repeatable framework for reading and implementing deep reinforcement learning research papers. You will read the original papers that introduced the Deep Q learning, Double Deep Q learning, and Dueling Deep Q learning algorithms. You will then learn how to implement these in concise, Pythonic PyTorch and Tensorflow 2 code that can be extended to include any future deep Q learning algorithms. These algorithms will be used to solve a variety of environments from the OpenAI Gym's Atari library, including Pong, Breakout, and Bank Heist.

You will learn the key to making these Deep Q Learning algorithms work: modifying the OpenAI Gym's Atari library to meet the specifications of the original Deep Q Learning papers. You will learn how to:

Repeat actions to reduce computational overhead
Rescale the Atari screen images to increase efficiency
Stack frames to give the Deep Q agent a sense of motion
Evaluate the Deep Q agent's performance with random no-ops to deal with model overfitting
Clip rewards to enable the Deep Q learning agent to generalize across Atari games with different score scales
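
As a rough illustration of the modifications listed above, the sketch below implements action repetition, the max over the last two frames, reward clipping, and frame stacking in plain Python. It is an assumption-laden toy, not the course's wrappers: frames are nested lists rather than images, `step_fn` is a hypothetical stand-in for an environment's step call, and clipping the summed reward to [-1, 1] is one possible reading of "reward clipping".

```python
from collections import deque


def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a raw score change into [low, high] so games share one reward scale."""
    return max(low, min(high, reward))


def max_frame(frame_a, frame_b):
    """Element-wise max of two frames, each given as a list of pixel rows."""
    return [[max(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def repeat_action(step_fn, action, repeats=4):
    """Apply one action up to `repeats` times.

    `step_fn(action)` is a hypothetical callable returning (frame, reward, done).
    Returns the max of the last two frames, the clipped summed reward, and done.
    """
    total, done = 0.0, False
    last_two = deque(maxlen=2)
    for _ in range(repeats):
        frame, reward, done = step_fn(action)
        last_two.append(frame)
        total += reward
        if done:
            break
    return max_frame(last_two[0], last_two[-1]), clip_reward(total), done


class FrameStack:
    """Keep the most recent `size` frames so the agent can infer motion."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        # Start an episode with the stack filled by the first frame.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        # Appending drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)
```

A real pipeline would also convert frames to grayscale and resize them before stacking; those steps depend on an image library and are left out here.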

If you do not have prior experience in reinforcement or deep reinforcement learning, that's no problem. Included in the course is a complete and concise course on the fundamentals of reinforcement learning. The introductory course in reinforcement learning will be taught in the context of solving the Frozen Lake environment from the OpenAI Gym.

We will cover:

Markov decision processes
Temporal difference learning
The original Q learning algorithm
How to solve the Bellman equation
Value functions and action value functions
Model free vs. model based reinforcement learning
Solutions to the explore-exploit dilemma, including optimistic initial values and epsilon-greedy action selection
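
Several of the topics above (temporal difference learning, the Q learning update, and epsilon-greedy action selection) fit into a few lines of code. The sketch below is a toy example under stated assumptions: the five-state chain environment, the hyperparameters, and all function names are invented for illustration and are not the Frozen Lake setup used in the course.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic chain: reward 1 only on entering the goal state."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice([LEFT, RIGHT])
    return max((LEFT, RIGHT), key=lambda a: q[state][a])


def train(episodes=500, alpha=0.5, gamma=0.9, eps_min=0.1, eps_decay=0.99, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    epsilon = 1.0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the best next action; terminal value is 0.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
        # Shift gradually from exploration toward exploitation.
        epsilon = max(eps_min, epsilon * eps_decay)
    return q


q_table = train()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every non-terminal state, which is the shortest path to the reward in this chain.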

Also included is a mini course in deep learning using the PyTorch framework. This is geared toward students who are familiar with the basic concepts of deep learning, but not the specifics, or those who are comfortable with deep learning in another framework, such as Tensorflow or Keras. You will learn how to code a deep neural network in PyTorch as well as how convolutional neural networks function. This will be put to use in implementing a naive Deep Q learning agent to solve the CartPole problem from the OpenAI Gym.

Who this course is for:

Python developers eager to learn about cutting-edge deep reinforcement learning

Course content

Introduction: 3 lectures • 13min

Fundamentals of Reinforcement Learning: 6 lectures • 1hr 1min

Deep Learning Crash Course: 7 lectures • 55min

Human Level Control Through Deep Reinforcement Learning: From Paper to Code: 14 lectures • 1hr 54min

Deep Reinforcement Learning with Double Q Learning: 2 lectures • 25min

Dueling Network Architectures for Deep Reinforcement Learning: 4 lectures • 33min

Improving On Our Solutions: 3 lectures • 36min

Conclusion: 1 lecture • 5min

Bonus Lecture: 1 lecture • 1min

Tensorflow 2 Implementations: 7 lectures • 41min
