Reinforcement Learning (English): Master the Art of RL

Rating: 4.5 (23 reviews)

Reinforcement Learning

Created by Coursat.ai, Dr. Ahmad ElSallab

Last updated 5/2023

English

What you'll learn

Define what Reinforcement Learning is
Apply everything learned using state-of-the-art libraries like OpenAI Gym, Stable Baselines, Keras-RL and TensorFlow Agents
Identify the application domains and success stories of RL
Explain the differences between Reinforcement Learning and Supervised Learning
Define the main components of an RL problem setup
Define the main ingredients of an RL agent and their taxonomy
Define the Markov Reward Process (MRP) and the Markov Decision Process (MDP)
Define the solution space of RL using the MDP framework
Solve RL problems using planning with Dynamic Programming algorithms, like Policy Evaluation, Policy Iteration and Value Iteration
Solve RL problems using model-free algorithms like Monte-Carlo, TD learning, Q-learning and SARSA
Differentiate between on-policy and off-policy algorithms
Master Deep Reinforcement Learning algorithms like Deep Q-Networks (DQN), and apply them to Large Scale RL
Master Policy Gradient algorithms and Actor-Critic (AC, A2C, A3C)
Master advanced DRL algorithms like DDPG, TRPO and PPO
Define model-based RL, differentiate it from planning, and identify its main algorithms and applications

Course content

9 sections • 66 lectures • 9h 12m total length

Course introduction (1:17)
Course overview (4:51)

Module intro and roadmap (3:03)
Explore the fundamentals of reinforcement learning, contrast it with supervised learning, and introduce the action, reward, environment, and agent (AREA) framework, including states, Gym environments, and the agent's policy, value function, and model.
What is RL? (7:29)
What RL can do? (22:36)
The RL problem setup (AREA) (6:51)
Reward (8:03)
RL vs. Supervised Learning (10:46)
Contrast reinforcement learning with supervised learning by highlighting online learning signals and delayed, sparse rewards. Show how RL uses dynamic, action-dependent data and credit assignment, unlike the static IID data of supervised learning.
State (34:21)
AREA examples and quizzes (25:19)
Gym Environments (20:12)
Inside RL agent - RL agent ingredients (2:44)
Policy (3:39)
Value (7:12)
Model (6:53)
Explore the environment model, including the state transition model and the reward model, and how stochastic dynamics and expectation shape the value function and cumulative reward.
RL agents taxonomy (15:39)
Prediction vs Control (3:36)
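
As a rough illustration of the AREA loop covered in this module, the sketch below wires a random agent to a tiny hand-written environment. CorridorEnv, random_policy and run_episode are hypothetical names invented for this example; the reset/step interface only imitates the Gym style in simplified form (the real library returns extra values from step) and is not the course's own code.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor with a simplified Gym-style interface."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        # Start every episode at the left end; the state is the position.
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left (clipped at 0).
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.length - 1)
        done = self.pos == self.length - 1
        # Sparse, delayed reward: only reaching the right end pays off.
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def random_policy(state, rng):
    # A policy maps a state to an action; this one ignores the state.
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=1000):
    # Agent-environment loop: observe state, act, receive reward, repeat.
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps
```

Calling run_episode(CorridorEnv(5), random_policy, random.Random(0)) plays one episode; swapping in a policy that always returns 1 reaches the goal in four steps.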

Module intro and roadmap (7:32)
Planning with Dynamic Programming (DP) (27:31)
Explore planning with dynamic programming in reinforcement learning, from exhaustive tree search and look-ahead planning to caching solutions and Bellman equations.
Prediction with DP - Policy Evaluation (5:32)
Control with DP - Policy Iteration and Value Iteration (8:18)
Value Iteration example (6:55)
Prediction with Monte-Carlo - MC Policy Evaluation (10:50)
Monte Carlo sampling estimates value functions in model-free prediction by averaging returns from trajectories, using first-visit or every-visit counts across states. It removes the need for an explicit environment model.
Prediction with Temporal-Difference (TD) (20:09)
TD Lambda (4:20)
Explore how TD lambda unifies one-step TD and Monte Carlo by weighting n-step returns with lambda, giving a spectrum between TD(0) and full Monte Carlo.
Control with Monte-Carlo - MC Policy Iteration (10:14)
Control with TD - SARSA (4:56)
Apply model-free TD methods to control using one-step returns and Q-value updates. Combine SARSA with an epsilon-greedy policy and policy iteration, working with Q(s,a) and TD targets.
On-policy vs. Off-policy (2:57)
Explore on-policy versus off-policy learning in model-free reinforcement learning, where a behavior policy acts while a separate target policy learns, enabling learning from human actions and safer exploration.
Q-learning (6:00)
Q-learning is the off-policy counterpart of SARSA: it uses an epsilon-greedy behavior policy and a greedy target policy derived from the Q-function, updating with the max over next actions.
MDP solutions summary (6:33)
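
The difference between SARSA and Q-learning described in this module comes down to the TD target. The sketch below shows the two tabular updates side by side; the function names, the dictionary-based Q-table and the parameter names alpha, gamma and epsilon are assumptions made for illustration, not the course's own implementation.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    # With probability epsilon explore, otherwise take the greedy action.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [Q[(state, a)] for a in range(n_actions)]
    return values.index(max(values))


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha, gamma):
    # On-policy: bootstrap from the action the behavior policy actually took.
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, n_actions, done, alpha, gamma):
    # Off-policy: bootstrap from the greedy (max) action in the next state.
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    target = r if done else r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Q-table with a default value of 0.0 for unseen (state, action) pairs.
Q = defaultdict(float)
```

Both updates can be driven by the same epsilon-greedy behavior policy; only the target differs, which is what makes Q-learning off-policy.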

Module intro and roadmap (1:24)
Large Scale Reinforcement Learning (10:23)
DNN as function approximator (14:50)
Transform high-dimensional unstructured inputs into low-dimensional feature embeddings with deep neural networks, and use an encoder-decoder design and gradient-based optimization to close the semantic gap and produce accurate outputs.
Value Function Approximation (6:12)
DNN policies (6:59)
Value function approximation with DL encoder-decoder pattern (13:43)
Train a value function approximation with a neural network, using gradient descent and mean squared error, from a simple linear Q function to TD and Monte Carlo targets for prediction.
Deep Q-Networks (DQN) (5:18)
DQN Atari Example with Keras-RL and TF-Agents (7:31)
Explore Deep Q-Networks (DQN) on Atari games with high-dimensional color frames and discrete actions. Use Keras-RL and TF-Agents with experience replay and fixed Q targets.
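
To make the value function approximation idea in this module concrete, here is a minimal sketch of one gradient step on the squared error against a TD target, using a linear approximator in place of a neural network. The function names and the feature-vector representation are hypothetical; a DQN would replace linear_value with a deep network and add experience replay and fixed targets, which this sketch leaves out.

```python
def linear_value(weights, features):
    # v(s; w) = w . x(s), the simplest function approximator.
    return sum(w * x for w, x in zip(weights, features))


def td0_update(weights, features, reward, next_features, done,
               alpha=0.1, gamma=0.99):
    # TD(0) target: reward plus the discounted value estimate of the next state.
    if done:
        target = reward
    else:
        target = reward + gamma * linear_value(weights, next_features)
    error = target - linear_value(weights, features)
    # Semi-gradient step: the target is treated as a constant, so the
    # gradient of v(s; w) with respect to w is just the feature vector.
    return [w + alpha * error * x for w, x in zip(weights, features)]
```

Replacing the TD target with the full episode return turns the same update into Monte Carlo prediction.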

Module intro and roadmap (1:19)
Value-based vs Policy based vs Actor-Critic (2:15)
Policy Gradients (PG) (10:23)
REINFORCE - Monte-Carlo PG (3:45)
AC - Actor-Critic (7:19)
Explore actor-critic methods in deep reinforcement learning, combining policy gradients with a critic that estimates Q-values via TD learning, updating the theta and w parameters for improved rewards.
A2C - Advantage Actor-Critic (5:53)
Explore the A2C algorithm, which introduces an advantage term based on a state-value baseline and applies temporal-difference methods to update the policy gradient.
A3C - Asynchronous Advantage Actor-Critic (1:53)
TRPO - Trust Region Policy Optimization (1:53)
Explore Trust Region Policy Optimization, which uses KL divergence to constrain policy updates and stabilize policy gradient methods, including actor-critic algorithms, with Proximal Policy Optimization next.
PPO - Proximal Policy Optimization (2:20)
DDPG - Deep Deterministic Policy Gradient (8:04)
StableBaselines library overview (11:06)
Atari example with stable-baselines (1:26)
Mario example with stable-baselines (3:22)
StreetFighter example with stable-baselines (5:36)
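
Two quantities from this module are easy to compute by hand: the one-step TD advantage used by A2C, and the KL divergence that TRPO uses to limit how far a policy update may move. The sketch below is illustrative only. The function names and the max_kl threshold are assumptions, policies are given as plain probability lists over discrete actions, and the real TRPO algorithm solves a constrained optimization over many states rather than running a single check like this.

```python
import math


def td_advantage(reward, value_s, value_next, done, gamma=0.99):
    # One-step advantage estimate: A = r + gamma * V(s') - V(s),
    # with no bootstrapping from V(s') at the end of an episode.
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def kl_divergence(p_old, p_new):
    # KL(p_old || p_new) for two discrete action distributions.
    # Assumes p_new is non-zero wherever p_old is non-zero.
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new) if p > 0.0)


def within_trust_region(p_old, p_new, max_kl=0.01):
    # Accept a policy change only if it stays close to the old policy.
    return kl_divergence(p_old, p_new) <= max_kl
```

A positive advantage means the action did better than the critic's baseline expected, so the actor raises its probability, while the KL check keeps that change small.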

Requirements

Machine Learning basics
Deep Learning basics
Probability
Programming and Problem solving basics
Python programming

Description

Hello and welcome to our course: Reinforcement Learning.

Reinforcement Learning is a very exciting and important field of Machine Learning and AI. Some call it the crown jewel of AI.

In this course, we will cover all the aspects related to Reinforcement Learning or RL. We will start by defining the RL problem, and compare it to the Supervised Learning problem, and discover the areas of applications where RL can excel. This includes the problem formulation, starting from the very basics to the advanced usage of Deep Learning, leading to the era of Deep Reinforcement Learning.

In our journey, we will cover, as usual, both the theoretical and practical aspects: we will learn how to implement the RL algorithms and apply them to well-known problems using libraries like OpenAI Gym, Keras-RL, TensorFlow Agents (TF-Agents) and Stable Baselines.

The course is divided into 6 main sections:

1- We start with an introduction to the RL problem definition, mainly comparing it to the Supervised learning problem, and discovering the application domains and the main constituents of an RL problem. We describe here the famous OpenAI Gym environments, which will be our playground when it comes to practical implementation of the algorithms that we learn about.

2- In the second part, we discuss the main formulation of an RL problem as a Markov Decision Process or MDP, with simple solutions to the most basic problems using Dynamic Programming.

3- Armed with an understanding of MDPs, we move on to explore the solution space of the MDP problem and the different solutions beyond DP, which include model-based and model-free solutions. We will focus in this part on model-free solutions, and defer model-based solutions to the last part. In this part, we describe the sampling-based Monte-Carlo and Temporal-Difference methods, including the famous and important Q-learning algorithm, and SARSA. We will describe the practical usage and implementation of Q-learning and SARSA on tabular maze control problems from OpenAI Gym environments.

4- To move beyond simple tabular problems, we will need to learn about function approximation in RL, which leads to the mainstream RL methods today using Deep Learning, or Deep Reinforcement Learning (DRL). We will describe here Deep Q-Networks (DQN), the breakthrough algorithm from DeepMind, the lab also known for AlphaGo, that learned to play Atari games directly from pixels. We also discuss how we can solve Atari game problems with DQN in practice using Keras-RL and TF-Agents.

5- In the fifth part, we move to advanced DRL algorithms, mainly under a family called policy-based methods. We discuss here Policy Gradients, DDPG, Actor-Critic, A2C, A3C, TRPO and PPO methods. We also discuss the important Stable Baselines library, which we use to implement all those algorithms on different environments in OpenAI Gym, like Atari and others.

6- Finally, we explore the model-based family of RL methods, differentiate model-based RL from planning, and survey the whole spectrum of RL methods.

We hope you enjoy this course and find it useful.

Who this course is for:

Machine Learning Researchers
Machine Learning Engineers
Data Scientists

Course content

Introduction: 2 lectures • 6min
Introduction to Reinforcement Learning: 15 lectures • 2hr 58min
Markov Decision Process (MDP): 7 lectures • 1hr 25min
MDP solutions spaces: 13 lectures • 2hr 2min
Deep Reinforcement Learning (DRL): 8 lectures • 1hr 6min
Advanced DRL: 14 lectures • 1hr 7min
Model-based Reinforcement Learning: 5 lectures • 25min
Conclusion: 1 lecture • 3min
Material: 1 lecture • 1min
