Modern Reinforcement Learning: Actor-Critic Agents

Rating: 4.3 (531 reviews)

Implement Cutting Edge Artificial Intelligence Research Papers in the OpenAI Gym Using PyTorch and TensorFlow 2

Created by Phil Tabor

Last updated 8/2023

English

What you'll learn

How to code policy gradient methods in PyTorch
How to code Deep Deterministic Policy Gradients (DDPG) in PyTorch
How to code Twin Delayed Deep Deterministic Policy Gradients (TD3) in PyTorch
How to code actor critic algorithms in PyTorch
How to implement cutting edge artificial intelligence research papers in Python

Course content

8 sections • 74 lectures • 10h 21m total length

What You Will Learn in this Course • 3:41
Required Background, Software, and Hardware • 3:17
How to Succeed in this Course • 3:51

What's so Great About Policy Gradient Methods? • 7:38
Combining Neural Networks with Monte Carlo: REINFORCE Policy Gradient Algorithm • 5:02
Introducing the Lunar Lander Environment • 3:54
Coding the Agent's Brain: The Policy Gradient Network • 5:29
Coding the Policy Gradient Agent's Basic Functionality • 5:50
Coding the Agent's Learn Function • 6:04
Coding the Policy Gradient Main Loop and Watching our Agent Land on the Moon • 9:27
Actor Critic Learning: Combining Policy Gradients & Temporal Difference Learning • 4:12
Coding the Actor Critic Networks • 3:23
Coding the Actor Critic Agent • 8:20
Coding the Actor Critic Main Loop and Watching Our Agent Land on the Moon • 9:22

Getting up to Speed With Deep Q Learning • 4:44
How to Read and Understand Cutting Edge Research Papers • 6:11
Analyzing the DDPG Paper Abstract and Introduction • 7:00
Analyzing the Background Material • 5:55
What Algorithm Are We Going to Implement? • 8:03
What Results Should We Expect? • 9:37
What Other Solutions are Out There? • 4:31
What Model Architecture and Hyperparameters Do We Need? • 3:12
Handling the Explore-Exploit Dilemma: Coding the OU Action Noise Class • 3:37
Giving our Agent a Memory: Coding the Replay Memory Buffer Class • 7:04
Deep Q Learning for Actor Critic Methods: Coding the Critic Network Class • 15:49
Coding the Actor Network Class • 10:10
Giving our DDPG Agent Simple Autonomy: Coding the Basic Functions of Our Agent • 12:11
Giving our DDPG Agent a Brain: Coding the Agent's Learn Function • 9:43
Coding the Network Parameter Update Functionality • 8:16
Coding the Main Loop and Watching Our DDPG Agent Land on the Moon • 13:11

Some Tips on Reading this Paper • 1:39
Analyzing the TD3 Paper Abstract and Introduction • 9:32
What Other Solutions Have People Tried? • 3:36
Reviewing the Fundamental Concepts • 2:53
Is Overestimation Bias Even a Problem in Actor-Critic Methods? • 13:16
Why is Variance a Problem for Actor-Critic Methods? • 6:56
What Results Can We Expect? • 6:06
Coding the Brains of the TD3 Agent - The Actor and Critic Network Classes • 13:34
Giving our TD3 Agent Simple Autonomy - Coding the Basic Agent Functionality • 10:57
Giving our TD3 Agent a Brain - Coding the Learn Function • 10:31
Coding the Network Parameter Update Functionality • 11:32
Coding the Main Loop And Watching our Agent Learn to Walk • 9:44

A Quick Word on the Paper • 1:00
Getting Acquainted With a New Framework • 5:45
Checking Out What Has Been Done Before • 4:44
Inspecting the Foundation of this New Framework • 3:37
Digging Into the Mathematics of Soft Actor Critic • 11:00
Seeing How the New Algorithm Measures Up • 7:50
Coding the Neural Networks • 23:25
Coding the Soft Actor Critic Basic Functionality • 10:59
Coding the Soft Actor Critic Algorithm • 12:34
Coding the Main Loop and Evaluating Our Agent • 12:34

Coding the Policy Gradient Network in Tensorflow 2 • 3:54
Coding the REINFORCE Agent in Tensorflow 2 • 10:46
Coding the REINFORCE Main Loop and Evaluating our Agent • 8:34
Coding the Actor Critic Network in Tensorflow 2 • 1:45
Coding the Actor Critic Agent in Tensorflow 2 • 5:50
Coding the Actor Critic Main Program and Evaluating our Agent • 3:40
Coding the DDPG Networks in Tensorflow 2 • 3:55
Coding the DDPG Agent in Tensorflow 2 • 15:47
Coding the DDPG Main Program and Evaluating our Agent • 4:17
Coding the TD3 Agent in Tensorflow 2 • 15:58
Coding the TD3 Main Program and Evaluating our Agent • 1:54
Coding the SAC Networks in Tensorflow 2 • 4:32
Coding the SAC Agent in Tensorflow 2 • 19:15
Coding the SAC Main Function and Evaluating our Agent • 2:44

Requirements

Understanding of college level calculus
Prior courses in reinforcement learning
Able to code deep neural networks independently

Description

In this advanced course on deep reinforcement learning, you will learn how to implement policy gradient, actor critic, deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and soft actor critic (SAC) algorithms in a variety of challenging environments from the OpenAI Gym. There will be a strong focus on environments with continuous action spaces, which are of particular interest to those looking to do research into robotic control with deep reinforcement learning.

Rather than spoon-feeding the student, this course teaches you to read deep reinforcement learning research papers on your own and implement them from scratch. You will learn a repeatable framework for quickly implementing the algorithms in advanced research papers. Mastering the content in this course will be a quantum leap in your capabilities as an artificial intelligence engineer, and will put you in a league of your own among students who rely on others to break down complex ideas for them.

Fear not: if it's been a while since your last reinforcement learning course, we will begin with a briskly paced review of core topics.

The course begins with a practical review of the fundamentals of reinforcement learning, including topics such as:

The Bellman Equation
Markov Decision Processes
Monte Carlo Prediction
Monte Carlo Control
Temporal Difference Prediction TD(0)
Temporal Difference Control with Q Learning
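
As a rough illustration of the last item in this list, here is a minimal sketch of a single tabular Q learning update. The function name `q_learning_update`, the dictionary-based table, and the default `alpha` and `gamma` values are assumptions made for this example, not code from the course.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, done,
                      n_actions, alpha=0.1, gamma=0.99):
    """Apply one tabular Q learning update in place and return the TD error."""
    # Value of the greedy action in the next state (zero if the episode ended).
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    # Temporal difference target and error for the visited state-action pair.
    td_target = reward + gamma * best_next
    td_error = td_target - q[(state, action)]
    # Move the estimate a fraction alpha of the way toward the target.
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical usage: a table that starts at zero and one observed transition.
q_table = defaultdict(float)
err = q_learning_update(q_table, state=0, action=1, reward=1.0,
                        next_state=1, done=False, n_actions=2)
```

Because the target uses the maximum over next actions rather than the action actually taken, this update is off-policy.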

From there, the course moves straight into coding up our first agent: a blackjack-playing artificial intelligence. We then progress to teaching an agent to balance the cart pole using Q learning.

After mastering the fundamentals, the pace quickens, and we move straight into an introduction to policy gradient methods. We cover the REINFORCE algorithm and use it to teach an artificial intelligence to land on the moon in the lunar lander environment from the OpenAI Gym. Next, we progress to coding up the one-step actor critic algorithm to beat the lunar lander again.
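
REINFORCE is a Monte Carlo method, so it weights each action's log probability by the discounted return that followed it. The sketch below computes those returns for one finished episode; the function name `discounted_returns` and the sample rewards are assumptions for illustration only.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of an episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed for the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical three-step episode.
episode_returns = discounted_returns([1.0, 0.0, 2.0], gamma=0.5)
print(episode_returns)  # [1.5, 1.0, 2.0]
```

In a full agent, the loss would be the negative sum of each return multiplied by the log probability of the action taken at that step.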

With the fundamentals out of the way, we move on to our harder projects: implementing deep reinforcement learning research papers. We will start with Deep Deterministic Policy Gradients (DDPG), an algorithm for teaching robots to excel at a variety of continuous control tasks. DDPG combines many of the advances of Deep Q Learning with traditional actor critic methods to achieve state-of-the-art results in environments with continuous action spaces.
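
One of the ideas DDPG borrows from Deep Q Learning is the target network, which DDPG updates slowly by blending in the online network's weights. The following is a minimal sketch of that soft update on plain dictionaries of floats; the name `soft_update`, the dictionary representation, and the `tau` values are assumptions for this example, and a real implementation would operate on network tensors.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Blend online parameters into the target: tau * online + (1 - tau) * target."""
    return {
        name: tau * online_params[name] + (1.0 - tau) * target_params[name]
        for name in target_params
    }


# Hypothetical parameters; tau=0.5 is exaggerated to make the effect visible.
target = {"w": 0.0, "b": 1.0}
online = {"w": 1.0, "b": 1.0}
target = soft_update(target, online, tau=0.5)
print(target)  # {'w': 0.5, 'b': 1.0}
```

A small `tau` keeps the learning targets nearly stationary between updates, which tends to stabilize training.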

Next, we implement a state-of-the-art artificial intelligence algorithm: Twin Delayed Deep Deterministic Policy Gradients (TD3). This algorithm sets a new benchmark for performance in continuous robotic control tasks, and we will demonstrate world-class performance in the Bipedal Walker environment from the OpenAI Gym. TD3 is based on the DDPG algorithm, but addresses a number of approximation issues that result in poor performance in DDPG and other actor critic algorithms.
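
Two of the fixes TD3 applies to DDPG show up in how the critic's learning target is built: noise is added to the target action, and the smaller of two critic estimates is used. Here is a minimal scalar sketch of that target. The function name `td3_target`, the callables standing in for the target critics, and all default values are assumptions for illustration, not the course's code.

```python
import random


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute a TD3-style critic target for one transition."""
    # Target policy smoothing: perturb the target actor's action with clipped noise.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    smoothed = max(action_low, min(action_high, next_action + noise))
    # Clipped double Q learning: take the more pessimistic critic estimate.
    q_min = min(q1_target(smoothed), q2_target(smoothed))
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Hypothetical constant critics so the result does not depend on the noise.
y = td3_target(reward=1.0, done=False, next_action=0.3,
               q1_target=lambda a: 2.0, q2_target=lambda a: 3.0, gamma=0.9)
```

The third change, delaying actor updates relative to critic updates, lives in the training loop rather than in the target and is not shown here.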

Finally, we will implement the soft actor critic algorithm (SAC). SAC approaches deep reinforcement learning from a totally different angle: it treats maximizing the policy's entropy, alongside the score, as part of the objective rather than maximizing the score alone. This results in increased exploration by our agent, and world-class performance in a number of important OpenAI Gym environments.
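
To make the role of entropy concrete, the sketch below shows one common way an entropy bonus enters the critic target: the log probability of the next action is subtracted from the pessimistic Q estimate, scaled by a temperature. The function name `sac_target`, the temperature `alpha`, and the sample numbers are assumptions for this example; the formulation covered in the course may differ in detail, for instance by using a separate value network.

```python
def sac_target(reward, done, q1_next, q2_next, log_prob_next,
               alpha=0.2, gamma=0.99):
    """Compute an entropy-regularized critic target for one transition."""
    # Soft value: pessimistic Q estimate plus an entropy bonus (-alpha * log pi).
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (0.0 if done else 1.0) * soft_value


# Hypothetical values: an unlikely action (low log probability) raises the target.
y_soft = sac_target(reward=1.0, done=False, q1_next=2.0, q2_next=3.0,
                    log_prob_next=-1.0, alpha=0.5, gamma=0.9)
```

Setting `alpha` to zero removes the bonus and recovers an ordinary clipped double Q target.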

By the end of the course, you will know the answers to the following fundamental questions in Actor-Critic methods:

Why should we bother with actor critic methods when deep Q learning is so successful?
Can the advances in deep Q learning be used in other fields of reinforcement learning?
How can we solve the explore-exploit dilemma with a deterministic policy?
How does overestimation bias arise in actor-critic methods, and how do we deal with it?
How do we deal with the inherent approximation errors in deep neural networks?

This course is for the highly motivated and advanced student. To succeed, you must have prior coursework in all of the following topics:

College level calculus
Reinforcement learning
Deep learning

The pace of the course is brisk and the topics are at the cutting edge of deep reinforcement learning research, but the payoff is that you will come out knowing how to read research papers and turn them into functional code as quickly as possible. You'll never have to rely on dodgy Medium blog posts again.

Who this course is for:

Advanced students of artificial intelligence who want to implement state-of-the-art academic research papers

Course content

Introduction • 3 lectures • 11min

Fundamentals of Reinforcement Learning • 6 lectures • 1hr 28min

Landing on the Moon with Policy Gradients & Actor Critic Methods • 11 lectures • 1hr 9min

Deep Deterministic Policy Gradients (DDPG): Actor Critic with Continuous Actions • 16 lectures • 2hr 9min

Twin Delayed Deep Deterministic Policy Gradients (TD3) • 12 lectures • 1hr 40min

Soft Actor Critic • 10 lectures • 1hr 33min

Tensorflow 2 Implementation • 14 lectures • 1hr 43min

Appendix • 2 lectures • 29min
