Modern Reinforcement Learning: Actor-Critic Methods

How to Implement Cutting Edge Artificial Intelligence Research Papers in the Open AI Gym Using the PyTorch Framework
4.3 (31 ratings)
256 students enrolled
Created by Phil Tabor
Last updated 7/2020
English
English [Auto]
This course includes
  • 6.5 hours on-demand video
  • 2 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • How to code policy gradient methods in PyTorch
  • How to code Deep Deterministic Policy Gradients (DDPG) in PyTorch
  • How to code Twin Delayed Deep Deterministic Policy Gradients (TD3) in PyTorch
  • How to code actor critic algorithms in PyTorch
  • How to implement cutting edge artificial intelligence research papers in Python
Course content
48 lectures • 06:37:01
+ Fundamentals of Reinforcement Learning
6 lectures 01:28:01

Calculating State Transition Probabilities
1 question
This is a simple two-state system with two actions. Use this information to calculate the state transition probabilities. See the handout from lecture 4 for related material.
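As a hedged illustration of the kind of calculation the quiz asks for (the counts below are invented for the example, not the quiz's actual data): transition probabilities can be obtained by normalizing transition counts for each state-action pair.

```python
# Hypothetical two-state (0, 1), two-action (0, 1) system.
# counts[(state, action)][next_state] = number of observed transitions.
# These numbers are made up for illustration; they are not the quiz's data.
counts = {
    (0, 0): {0: 3, 1: 1},
    (0, 1): {0: 1, 1: 3},
    (1, 0): {0: 2, 1: 2},
    (1, 1): {0: 0, 1: 4},
}

def transition_probs(counts):
    """P(s' | s, a) = count(s, a, s') / sum over s' of count(s, a, s')."""
    probs = {}
    for (s, a), nxt in counts.items():
        total = sum(nxt.values())
        probs[(s, a)] = {s2: n / total for s2, n in nxt.items()}
    return probs

P = transition_probs(counts)
print(P[(0, 0)])  # {0: 0.75, 1: 0.25}
```

Each row of the resulting table sums to one, which is a quick sanity check on any hand-calculated answer.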
Teaching an AI about Black Jack with Monte Carlo Prediction
20:00
Teaching an AI How to Play Black Jack with Monte Carlo Control
19:41
Review of Temporal Difference Learning Methods
03:50
Teaching an AI about Balance with TD(0) Prediction
09:42
+ Landing on the Moon with Policy Gradients & Actor Critic Methods
11 lectures 01:08:41
What's so Great About Policy Gradient Methods?
07:38
Combining Neural Networks with Monte Carlo: REINFORCE Policy Gradient Algorithm
05:02
Introducing the Lunar Lander Environment
03:54
Coding the Agent's Brain: The Policy Gradient Network
05:29
Coding the Policy Gradient Agent's Basic Functionality
05:50
Coding the Agent's Learn Function
06:04
Coding the Policy Gradient Main Loop and Watching our Agent Land on the Moon
09:27
Actor Critic Learning: Combining Policy Gradients & Temporal Difference Learning
04:12
Coding the Actor Critic Networks
03:23
Coding the Actor Critic Agent
08:20
Coding the Actor Critic Main Loop and Watching Our Agent Land on the Moon
09:22
+ Deep Deterministic Policy Gradients (DDPG): Actor Critic with Continuous Actions
16 lectures 02:09:14
Getting up to Speed With Deep Q Learning
04:44
How to Read and Understand Cutting Edge Research Papers
06:11
Analyzing the DDPG Paper Abstract and Introduction
07:00
Analyzing the Background Material
05:55
What Algorithm Are We Going to Implement?
08:03
What Results Should We Expect?
09:37
What Other Solutions are Out There?
04:31
What Model Architecture and Hyperparameters Do We Need?
03:12
Handling the Explore-Exploit Dilemma: Coding the OU Action Noise Class
03:37
Giving our Agent a Memory: Coding the Replay Memory Buffer Class
07:04
Deep Q Learning for Actor Critic Methods: Coding the Critic Network Class
15:49
Coding the Actor Network Class
10:10
Giving our DDPG Agent Simple Autonomy: Coding the Basic Functions of Our Agent
12:11
Giving our DDPG Agent a Brain: Coding the Agent's Learn Function
09:43
Coding the Network Parameter Update Functionality
08:16
Coding the Main Loop and Watching Our DDPG Agent Land on the Moon
13:11
+ Twin Delayed Deep Deterministic Policy Gradients (TD3)
12 lectures 01:40:16
Some Tips on Reading this Paper
01:39
Analyzing the TD3 Paper Abstract and Introduction
09:32
What Other Solutions Have People Tried?
03:36
Reviewing the Fundamental Concepts
02:53
Is Overestimation Bias Even a Problem in Actor-Critic Methods?
13:16
Why is Variance a Problem for Actor-Critic Methods?
06:56
What Results Can We Expect?
06:06
Coding the Brains of the TD3 Agent - The Actor and Critic Network Classes
13:34
Giving our TD3 Agent Simple Autonomy - Coding the Basic Agent Functionality
10:57
Giving our TD3 Agent a Brain - Coding the Learn Function
10:31
Coding the Network Parameter Update Functionality
11:32
Coding the Main Loop And Watching our Agent Learn to Walk
09:44
Requirements
  • Understanding of college level calculus
  • Prior courses in reinforcement learning
  • Able to code deep neural networks independently
Description

In this advanced course on deep reinforcement learning, you will learn how to implement policy gradient, actor-critic, deep deterministic policy gradient (DDPG), and twin delayed deep deterministic policy gradient (TD3) algorithms in a variety of challenging environments from the OpenAI Gym.

The course begins with a practical review of the fundamentals of reinforcement learning, including topics such as:

  • The Bellman Equation

  • Markov Decision Processes

  • Monte Carlo Prediction

  • Monte Carlo Control

  • Temporal Difference Prediction TD(0)

  • Temporal Difference Control with Q Learning

From there, the course moves straight into coding our first agent: a blackjack-playing artificial intelligence. Next, we teach an agent to balance the cart pole using Q learning.
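The tabular updates reviewed in this first section can be sketched in a few lines. This is a minimal illustration, with made-up step size, discount factor, and sample transition (the course builds the full versions in its own code):

```python
# Minimal sketch of the tabular updates from the fundamentals section.
# alpha, gamma, and the sample transition below are illustrative values.
alpha, gamma = 0.1, 0.99

def td0_update(V, s, r, s_next):
    """TD(0) prediction: V(s) <- V(s) + alpha * (r + gamma*V(s') - V(s))."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

def q_learning_update(Q, s, a, r, s_next):
    """Q-learning control: bootstrap off the greedy action in s'."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

V = {0: 0.0, 1: 1.0}
V = td0_update(V, s=0, r=1.0, s_next=1)
print(V[0])  # ~0.199, i.e. 0.1 * (1.0 + 0.99 * 1.0 - 0.0)
```

The only difference between the prediction and control updates is the target: TD(0) evaluates a fixed policy, while Q learning bootstraps off the best available action.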

After mastering the fundamentals, the pace quickens, and we move straight into an introduction to policy gradient methods. We cover the REINFORCE algorithm and use it to teach an artificial intelligence to land on the moon in the Lunar Lander environment from the OpenAI Gym. Next, we code the one-step actor-critic algorithm to beat the lunar lander again.
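The core of REINFORCE is the update theta &lt;- theta + lr * G * grad log pi(a), where G is the Monte Carlo return. Below is a NumPy sketch of one such update for a softmax policy over discrete actions; the course implements the neural-network version in PyTorch, and all names and values here are illustrative, not the course's code.

```python
import numpy as np

# NumPy sketch of a single REINFORCE update for a softmax policy.
# The gradient of log pi(a) w.r.t. the logits is (one_hot(a) - pi).
def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_step(theta, action, G, lr=0.01):
    """theta <- theta + lr * G * grad log pi(action)."""
    pi = softmax(theta)
    grad_log_pi = -pi            # derivative of log pi(a) for every logit...
    grad_log_pi[action] += 1.0   # ...plus 1 at the chosen action
    return theta + lr * G * grad_log_pi

theta = np.zeros(4)  # four actions, as in the Lunar Lander environment
theta = reinforce_step(theta, action=2, G=10.0)
pi = softmax(theta)
print(pi.argmax())  # 2: the positively rewarded action became more probable
```

A positive return raises the probability of the action taken; a negative return lowers it, which is the whole intuition behind policy gradients.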

With the fundamentals out of the way, we move on to our harder projects: implementing deep reinforcement learning research papers. We will start with Deep Deterministic Policy Gradients, which is an algorithm for teaching robots to excel at a variety of continuous control tasks.
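One detail DDPG adds to the deep Q learning toolkit is the soft (Polyak) target-network update. A sketch with plain Python floats standing in for network parameters (tau = 0.001 is the value commonly quoted from the DDPG paper; treat the rest as illustrative):

```python
# Soft (Polyak) target-network update used by DDPG:
#   theta_target <- tau * theta + (1 - tau) * theta_target
# Plain floats stand in for network parameters here; in the course this is
# applied to PyTorch parameter tensors instead.
def soft_update(theta, theta_target, tau=0.001):
    return [tau * p + (1.0 - tau) * pt for p, pt in zip(theta, theta_target)]

online = [1.0, 2.0]
target = [0.0, 0.0]
target = soft_update(online, target)
print(target)  # [0.001, 0.002]
```

Because tau is small, the target networks drift slowly toward the online networks, which stabilizes the bootstrapped critic targets.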

Finally, we implement a state of the art artificial intelligence algorithm: Twin Delayed Deep Deterministic Policy Gradients. This algorithm sets a new benchmark for performance in robotic control tasks, and we will demonstrate world-class performance in the Bipedal Walker environment from the OpenAI Gym.
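TD3's central fix for overestimation bias is the clipped double-Q target: bootstrap off the minimum of two critic estimates. A scalar sketch (the reward, discount, and critic values below are illustrative, not from the paper):

```python
# TD3's clipped double-Q target: take the *minimum* of two critics'
# estimates so a single optimistic critic cannot inflate the target.
# Scalars stand in for critic network outputs; values are illustrative.
gamma = 0.99

def td3_target(reward, done, q1_next, q2_next):
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)

y = td3_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0)
print(y)  # 1.0 + 0.99 * 4.0 = 4.96
```

The "delayed" part of TD3 is separate: the actor and target networks are updated less frequently than the critics, but the min over two critics is what directly attacks overestimation.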

By the end of the course, you will know the answers to the following fundamental questions in Actor-Critic methods:

  • Why should we bother with actor critic methods when deep Q learning is so successful?

  • Can the advances in deep Q learning be used in other fields of reinforcement learning?

  • How can we solve the explore-exploit dilemma with a deterministic policy?

  • How does overestimation bias arise in actor-critic methods?

  • How do we deal with the inherent errors in deep neural networks?
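One standard answer to the explore-exploit question above, covered in the DDPG section, is to add temporally correlated Ornstein-Uhlenbeck noise to the deterministic policy's action. A NumPy sketch, with illustrative parameters (theta, sigma, dt are not prescribed by this course's code):

```python
import numpy as np

# Ornstein-Uhlenbeck action noise: mean-reverting, temporally correlated
# noise added to a deterministic policy's action for exploration.
# The parameter values here are illustrative defaults, not the course's.
class OUNoise:
    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu  # current noise state

    def sample(self, rng):
        dx = self.theta * (self.mu - self.x) * self.dt \
             + self.sigma * np.sqrt(self.dt) * rng.standard_normal()
        self.x += dx
        return self.x

rng = np.random.default_rng(0)
noise = OUNoise()
samples = [noise.sample(rng) for _ in range(5)]
```

Each noisy action is `mu(s) + noise.sample(rng)`, clipped to the environment's action bounds; the mean-reversion term pulls the noise back toward mu, so exploration stays smooth rather than jittery.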

This course is for the highly motivated and advanced student. To succeed, you must have prior course work in all the following topics:

  • College level calculus

  • Reinforcement learning

  • Deep learning

The pace of the course is brisk, but the payoff is that you will come out knowing how to read cutting edge research papers and turn them into functional code as quickly as possible.

Who this course is for:
  • Advanced students of artificial intelligence who want to implement state of the art academic research papers