Python Reinforcement Learning, Deep Q-Learning and TRFL

Rating: 3.0 (15 reviews)

Leverage the power of Reinforcement Learning techniques to develop intelligent systems using Python

Created by Packt Publishing

Last updated 5/2019

English

What you'll learn

Implement state-of-the-art Reinforcement Learning algorithms from the basics
Discover various techniques of Reinforcement Learning such as MDP, Q Learning, and more
Dive into Temporal Difference Learning, an algorithm that combines Monte Carlo methods and dynamic programming
Create a virtual Self Driving Car application with Deep Q-Learning
Teach a Reinforcement Learning model to play a game using TensorFlow and the OpenAI gym
Build projects with TRFL and TensorFlow and integrate essential RL building blocks into existing code
Discover improvements to RL algorithms such as DQN and DDPG with TRFL blocks—for example, advanced target network updating, Double Q Learning, and Distributional Q Learning
Modify RL agents to include multistep reward techniques such as TD lambda
Create TRFL-based RL agents with classic RL methods such as TD Learning, Q Learning, and SARSA

Course content

4 sections • 77 lectures • 5h 13m total length

The Course Overview (3:32)
This video gives you an overview of the course.
Install RStudio (2:40)
The aim of this video is to install RStudio.
Download and install Base R
Download and install RStudio
Launch an RStudio session
Install Python (1:47)
The aim of this video is to learn to install Python.
Check the current OS version on your system
Download Python version 3
Launch a Python session
Launch Jupyter Notebook (3:38)
The aim of this video is to learn to work with Jupyter Notebook.
Install Python 3 and upgrade to pip3
Install IRkernel
Launch Jupyter Notebook
Learning Type Distinctions (2:25)
The aim of this video is to study the distinctions between learning types.
What is supervised learning?
What is unsupervised learning?
Understand reinforcement learning
Get Started with Reinforcement Learning (2:42)
The aim of this video is to study reinforcement learning.
Interpret artificial neural networks
Understand deep learning
Interpret perceptrons
Real-world Reinforcement Learning Examples (2:13)
The aim of this video is to study real-world reinforcement learning examples.
Study a high-level example
Learn through a gaming example
Key Terms in Reinforcement Learning (4:11)
The aim of this video is to learn the key terms in reinforcement learning.
Briefly study the environment, agent, and state
Get to know policy, reward, sensor, and value
OpenAI Gym (3:53)
The aim of this video is to discuss OpenAI Gym.
What is OpenAI Gym?
Various environments in OpenAI Gym
Learn to interface with OpenAI Gym
Monte Carlo Method (5:55)
The aim of this video is to briefly discuss the Monte Carlo method.
Study the Bandit problem
What is the Bandit problem pseudocode?
Memory concerns with Reinforcement Learning
Monte Carlo Method in Python (2:18)
The aim of this video is to discuss the Monte Carlo method in Python.
Learn the goal of the mountain car example
Perform the Monte Carlo method example in Python
Monte Carlo Method in R (3:08)
The aim of this video is to study the Monte Carlo method in R.
Run the mountain car example using the Monte Carlo method in R
Interpret the result
Practical Reinforcement Learning in OpenAI Gym (1:58)
The aim of this video is to study practical reinforcement learning in OpenAI Gym.
Discuss Value Iteration in R
Study Policy Iteration in R
Get to know the Bellman Equation in R
Markov Decision Process Concepts (7:44)
The aim of this video is to study the different MDP concepts.
Study the Markov Decision Process and Dynamic Programming
What are the Bellman Equations?
Study the Value and Policy Functions
Python MDP Toolbox (6:41)
The aim of this video is to study the Python library MDP Toolbox.
Get a brief introduction to the MDP Toolbox
Work on the MDP Toolbox with the help of an example
Value and Policy Iteration in Python (3:32)
The aim of this video is to discuss value and policy iteration in Python.
What is the Python MDP Toolbox?
Work on the Python MDP Toolbox with the help of an example
MDP Toolbox in R (2:49)
The aim of this video is to study the MDP Toolbox in R.
Get a brief introduction to the MDP Toolbox in R
Work on the MDP Toolbox in R with the help of an example
Value Iteration and Policy Iteration in R (3:10)
The aim of this video is to discuss value and policy iteration in R.
Study Value Iteration in R
Get to know Policy Iteration in R
Learn about the Bellman Equation in R
Temporal Difference Learning (8:23)
The aim of this video is to study temporal difference learning.
What is Temporal Difference Learning?
Get to know the tabular TD(0) pseudocode
Learn about SARSA and Q-Learning and the pseudocode for each
Temporal Difference Learning in Python (1:53)
The aim of this video is to learn to use the MDP Toolbox in Python to perform Q-Learning.
Perform Q-Learning in Python
Interpret the results
Temporal Difference Learning in R (2:54)
The aim of this video is to study Temporal Difference Learning in R.
Utilize the MDP Toolbox to do Q-Learning in R
Perform Q-Learning and one-step Temporal Difference learning in R
Interpret and verify the results
Test Your Knowledge
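
As a rough illustration of the tabular Q-Learning update that the Temporal Difference Learning lectures in this section refer to, here is a minimal sketch in plain Python. It is not taken from the course material: the function name `q_learning_update`, the dictionary-based Q-table, and the default `alpha` and `gamma` values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, n_actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-learning update in place and return the TD error."""
    # Off-policy target: bootstrap from the greedy action in the next state.
    # A terminal transition has no future value to bootstrap from.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical Q-table: unseen (state, action) pairs start at 0.0.
q = defaultdict(float)
td_first = q_learning_update(q, state=0, action=1, reward=1.0,
                             next_state=1, n_actions=2)
td_second = q_learning_update(q, state=1, action=0, reward=0.0,
                              next_state=0, n_actions=2)
```

SARSA would differ only in the target: it bootstraps from the action the agent actually takes next rather than from the greedy one.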

The Course Overview (2:23)
This video provides an overview of the entire course.
Introduction to Deep Reinforcement Learning (4:53)
This video aims to explore Deep Reinforcement Learning with TensorFlow, Keras, and prediction with neural networks.
Understand the purpose of Reinforcement Learning
Learn about Artificial Neural Networks
Define Deep Learning and its frameworks
Deep Q-Learning and Double Deep Q-Learning (1:19)
This video aims to give a brief introduction to Deep Q-Learning and Double Deep Q-Learning.
Deep Learning Overview
Define Deep Q-Learning
Learn about Double Deep Q-Learning
Q-Learning in Python (3:35)
This video aims to demonstrate Q-Learning in Python.
Learn to use Q-Learning in Python
Demonstrate with a real-world example
Q-Learning in R (2:58)
This video aims to demonstrate Q-Learning in R.
Learn to use Q-Learning in R
Demonstrate with a real-world example
TensorFlow (2:33)
This video gives an overview of TensorFlow.
Define TensorFlow
Learn about the components of TensorFlow
Understand convolutional neural networks
TensorFlow in Python (3:51)
This video aims to demonstrate TensorFlow in Python.
Learn to use TensorFlow in Python
Demonstrate with a real-world example
Deep Q-Learning with TensorFlow in Python (7:42)
This video aims to demonstrate Deep Q-Learning with TensorFlow in Python.
Learn to use Deep Q-Learning with TensorFlow in Python
Demonstrate with a real-world example
Keras (1:47)
This video gives an overview of Keras.
Define Keras
Benefits of Keras over TensorFlow
Learn about the components of Keras
Keras in Python (2:26)
This video aims to demonstrate Keras in Python.
Learn to use Keras in Python
Demonstrate with a real-world example
Deep Q-Learning with Keras in Python (3:52)
This video aims to demonstrate Deep Q-Learning with Keras in Python.
Learn to use Deep Q-Learning with Keras in Python
Demonstrate with a real-world example
Deep Q-Learning with Keras in R (3:00)
This video aims to demonstrate Deep Q-Learning with Keras in R.
Learn to use Deep Q-Learning with Keras in R
Demonstrate with a real-world example
Case Study – Reinforcement Learning (3:00)
This video discusses the need for Reinforcement Learning in the real world.
Benefits of Reinforcement Learning
Discuss the fields where Reinforcement Learning is used
Test Your Knowledge

The Course Overview (3:42)
This video provides an overview of the entire course.
Artificial Intelligence in a Nutshell (4:24)
First we discuss why AI matters right now, then why we use games to train our AI, and finally some use cases of AI.
Why AI right now?
Why do we use games to train our AI?
Some use cases of AI
Reinforcement Learning Dynamics (6:41)
First we discuss what the environment and the agent are in RL, then how to give punishment and reward to the agent, and finally we point to some additional reading resources.
What are the Environment and the Agent in RL?
How to give punishment and reward to the Agent?
Additional reading resources
The Bellman Equation (6:31)
Key concepts of the Bellman Equation
Markov Decision Process (5:48)
First we talk about what a plan is, then we discuss deterministic versus non-deterministic search, and finally we try to understand the Markov Process and the Markov Decision Process.
What is a Plan?
Deterministic vs Non-deterministic Search
Markov Process and Markov Decision Process
Policy versus Plan and Living Penalty (8:17)
First we talk about what a plan is, then we discuss what a policy is, and finally we try to understand what a living penalty is.
What is a Plan?
What is a Policy?
What is a Living Penalty?
Q-Learning Intuition (4:39)
Intuition behind Q-Learning
Temporal Difference (5:19)
Temporal Difference intuition
Learning Phase of Deep Q-Learning (2:53)
Learning phase of Deep Q-Learning
Acting Phase of Deep Q-Learning (2:33)
Acting phase of Deep Q-Learning
Experience Replay and Action Selection Policies (7:29)
First we talk about Experience Replay, then we discuss different types of Action Selection Policies.
Experience Replay
Different types of Action Selection Policies
Installing the PyTorch Environment (4:15)
Install the PyTorch environment and the Kivy SDK
Install the PyTorch environment
Install the Kivy SDK
Self Driving Car – Part 1 (1:27)
Create car.kv file
Self Driving Car – Part 2 (4:12)
Create map.py file
Self Driving Car – Part 3 (8:24)
Create ai.py file
Playing with Our SDC AI (3:33)
Play with our Self Driving Car AI
Convolutional Neural Network (10:09)
First we discuss a brief history of CNNs, then the convolution operation and the ReLU layer, and finally pooling, flattening, and full connection.
A brief history of CNNs
Convolution operation and ReLU layer
Pooling, Flattening and Full connection
Deep Convolutional Q-Learning (3:03)
Intuition behind Deep Convolutional Q-Learning
Eligibility Trace (4:36)
Intuition behind Eligibility Trace or n-step Q-Learning
Installing OpenAI Gym and ppaquette (2:12)
Install OpenAI Gym and ppaquette
Installing OpenAI Gym
Installing ppaquette
Build an AI for DOOM – Part 1 (3:11)
Create image_preprocessing.py file
Build an AI for DOOM – Part 2 (1:42)
Create experience_reply.py file
Build an AI for DOOM – Part 3 (5:19)
Create ai.py file
Playing with our AI in DOOM (3:27)
Playing with our AI in DOOM
Test Your Knowledge
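
The experience replay and action selection ideas named in this section can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the course's own files: the class name `ReplayBuffer`, the helper `epsilon_greedy`, and the transition layout `(state, action, reward, next_state, done)` are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque evicts the oldest transition once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the temporal correlation
        # between consecutive transitions.
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push((step, 0, 1.0, step + 1, False))
batch = buffer.sample(2)
greedy_action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

With a capacity of 3, only the three most recent transitions survive the five pushes, and `epsilon=0.0` always selects the action with the largest Q-value.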

The Course Overview (3:01)
This video gives you an overview of the course.
Set Up and Installation (2:16)
In this video, you will learn in detail the steps for setting up and installing the necessary software.
Know why the software is needed
Install the essential software
Getting Started with TD Learning (4:26)
Outline the basics of TRFL with TD learning: how to use TRFL to call trfl.td_learning and which arguments trfl.td_learning takes.
Define TD learning
Explain how to use TRFL in general and trfl.td_learning()
Review notebook implementing TD Learning with TRFL
Exploiting Off-policy Efficiency Using Q Learning (3:34)
Describe TRFL Q learning and off-policy methods, including how trfl.qlearning is similar to trfl.td_learning and how the usage of the two functions differs.
Define Q learning and relate Q learning to TD learning
Build upon TRFL knowledge by using Q Learning
Discuss code for Q learning workbook to emphasize TRFL usage
Comparing On-policy Methods with SARSA and SARSE (3:40)
Implementing a Deep Q Network and Applying Target Network Updates (8:09)
A Deep Q Network (DQN) applies Q learning with a function approximator, experience replay, and target network updates. We describe and implement this Deep Reinforcement Learning algorithm. We show the benefits of TRFL’s Q learning and flexible target network updating methods in setting up the DQN.
Outline a DQN and the main parts of a DQN
Describe target network updating and TRFL usage
Implement a DQN in code with TensorFlow and TRFL
Modifying a DQN with Double DQN, Persistent DQN, and Huber Loss (4:11)
DQN serves as the base for a variety of Reinforcement Learning algorithms. TRFL can quickly and easily modify RL algorithms like DQN. We show how this is done with trfl.double_qlearning, trfl.persistent_qlearning, and trfl.huber_loss.
Describe the problem of approximation error and overestimation bias
Double Q learning and Persistent Q learning reduce approximation error
Show TRFL implementations of these DQN modifications
Improving a DQN with Distributional Q Learning (5:12)
Distributional Q learning turns the scalar estimate of DQN into a distribution estimate. We further modify our DQN with trfl.categorical_dist_qlearning and trfl.categorical_dist_double_qlearning.
Explain categorical distributional Q learning
Go over TRFL usage of categorical distributional Q learning
Provide a code example of distributional Q learning using TRFL
Utilizing Policy Gradient Methods (6:49)
Introduce policy gradients. Describe how policy gradients relate to value function methods covered in Section 1 and Section 2. Discuss continuous and discrete action spaces and the environment Lunar Lander. Implement policy gradient methods with TensorFlow and TRFL.
Define policy gradients and cover their advantages, theory, and usage
Describe TRFL usage of trfl.discrete_policy_gradient and trfl.policy_gradient
Describe solving Lunar Lander with REINFORCE in TensorFlow with TRFL
Increasing Exploration with Policy Entropy Loss (5:22)
Explain how policy entropy loss is used to improve policies by increasing exploration and preventing premature convergence. Walk through the formula and example usage.
Explain why we use entropy
Show the entropy formula
Demonstrate TRFL usage of policy entropy loss
Applying Actor-Critic with A3C and A2C (4:46)
Teach the Actor Critic model and how it combines policy gradients and value function methods. Discuss A3C and A2C as two methods that parallelize gradient collection from multiple Actor Critic agents.
Introduce Actor Critic definition and architecture
Discuss the A3C and A2C variants of Actor Critic
Show TensorFlow Actor Critic model and TRFL A2C loss
Performing Deterministic Policy Gradients (4:27)
Relate deterministic policy gradients (DPG) to policy gradients. Explain how DPG differs and introduce deep deterministic policy gradients (DDPG). Go over the DDPG TensorFlow network architecture and TRFL usage.
Theory and definition of deterministic policy gradients
How DDPG incorporates DPG into deep RL
TensorFlow implementation of DDPG with TRFL to solve LunarLander
Deploying TD(λ) (4:16)
Credit assignment is an important topic in RL. We discuss various methods for credit assignment leading to TD(λ). TD(λ) relates TD learning to Monte Carlo updates and allows multi-step returns.
Introduce the credit assignment issue and various methods of addressing it
Describe TD(λ) in the context of those methods
Use TRFL and TD(λ) in FrozenLake
Balancing Bias and Variance with Generalized λ Returns (2:59)
The bias-variance trade-off is something to consider when selecting an algorithm. Generalized Advantage Estimation (GAE) helps balance the trade-off. TRFL can implement GAE with trfl.generalized_lambda_returns(), which is closely related to TD(λ).
Discuss bias, variance, and GAE
How trfl.generalized_lambda_returns() can implement GAE
Use TRFL and policy gradient methods with trfl.generalized_lambda_returns()
Applying Q(λ) (2:28)
Q(λ) builds upon TD(λ) much like Q learning builds upon TD learning: by turning state value estimates into action value estimates. We show the uses of multi-step action value estimates provided by Q(λ) and describe some of the variants of Q(λ).
Contrast Q(λ) with TD(λ)
Implement Q(λ) in TRFL
Use trfl.qlambda() to solve FrozenLake with Q lambda
Working with Multi-step Forward View (2:14)
Multistep Forward View is called by all the TRFL functions covered in this section. We see how trfl.multistep_forward_view() can generalize λ methods and allow flexible implementation of other λ methods.
Explain how trfl.multistep_forward_view() is called
Expand upon the flexibility of trfl.multistep_forward_view()
Implement the Q(λ) variant Watkins’ Q(λ) with trfl.multistep_forward_view() on FrozenLake
Using Importance Sampling with Retrace(λ) (3:19)
Retrace(λ) corrects for off-policy data. It reduces variance and makes better use of off-policy and on-policy returns than prior λ methods. Retrace(λ) builds upon TD(λ) and importance sampling.
Define importance sampling
Describe Retrace(λ) and its benefits
Use Retrace(λ) in TRFL with the Taxi environment
Getting Started with IMPALA with V-Trace (3:29)
IMPALA is a state-of-the-art algorithm produced by DeepMind. IMPALA communicates off-policy trajectories using a variety of methods to speed up calculations. IMPALA’s unique architecture combined with V-trace drives IMPALA’s performance.
Overview of the main parts of IMPALA
Importance of V-trace and relation to Retrace(λ)
Implement V-trace in TRFL
Augmenting an Agent with UNREAL and Pixel Control (4:14)
UNREAL is a DeepMind Reinforcement Learning algorithm that combines A3C with an LSTM and a variety of auxiliary reward functions. UNREAL combines on-policy and off-policy methods and uses auxiliary tasks like pixel control.
Description of UNREAL’s unique characteristics
Detail auxiliary tasks like pixel control
Use pixel control in TRFL to play Pong
Test Your Knowledge
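
To make the link between TD learning, Monte Carlo updates, and λ-returns in this section more concrete, the following sketch computes forward-view λ-returns for a single trajectory in plain Python. It is an illustration under stated assumptions rather than TRFL's implementation: the function name `lambda_returns` and its argument layout are hypothetical, and no TRFL call is used.

```python
def lambda_returns(rewards, next_values, gamma=0.99, lam=0.9):
    """Compute forward-view lambda-returns for one finite trajectory.

    rewards[t] is the reward received after step t and next_values[t] is the
    value estimate V(s_{t+1}). The last entry of next_values acts as the
    bootstrap value (use 0.0 if the episode terminated).
    """
    returns = [0.0] * len(rewards)
    # Beyond the final step, the only available target is the bootstrap value.
    next_return = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Mix the one-step bootstrap target with the longer lambda-return.
        blended = (1.0 - lam) * next_values[t] + lam * next_return
        returns[t] = rewards[t] + gamma * blended
        next_return = returns[t]
    return returns


rewards = [1.0, 1.0, 1.0]
next_values = [0.5, 0.5, 0.0]  # episode terminates, so the bootstrap is 0.0
td0_targets = lambda_returns(rewards, next_values, gamma=1.0, lam=0.0)
mc_returns = lambda_returns(rewards, next_values, gamma=1.0, lam=1.0)
```

Setting `lam=0.0` recovers one-step TD targets, while `lam=1.0` recovers Monte Carlo returns; values in between trade bias against variance.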

Requirements

Basic knowledge of Python is required.

Description

Reinforcement Learning (RL) allows you to develop smart, quick and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in Artificial Intelligence, from games, self-driving cars and robots to enterprise applications that range from data centre energy saving (cooling data centres) to smart warehousing solutions.

This course covers the major advancements and successes achieved in deep reinforcement learning by combining deep neural network architectures with reinforcement learning. You will be introduced to the concept of Reinforcement Learning, its advantages and why it's gaining so much popularity. This course also discusses Markov Decision Processes (MDPs), Monte Carlo tree searches, dynamic programming methods such as policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will learn to build convolutional neural network models using TensorFlow and Keras. You will also learn the use of artificial intelligence in a gaming environment with the help of OpenAI Gym.
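
For readers new to the dynamic programming methods mentioned above, here is a minimal value iteration sketch in plain Python. It is not part of the course material: the function name `value_iteration`, the transition format, and the two-state `toy_mdp` are all hypothetical and chosen only to keep the example small.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return optimal state values for a small MDP.

    transitions[state][action] is a list of (probability, next_state, reward).
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Hypothetical MDP: in A, "stay" earns nothing and "go" moves to B for reward 1.
# B is absorbing with zero reward.
toy_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 0.0)]},
}
values = value_iteration(toy_mdp)
```

Policy iteration would alternate between evaluating a fixed policy and improving it greedily, instead of taking the maximum inside every backup.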

By the end of this course, you will have explored reinforcement learning and gained hands-on experience with real data and artificial intelligence (AI) to build intelligent systems.

Meet Your Expert(s):

We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:

● Lauren Washington is currently the Lead Data Scientist and Machine Learning Developer for smartQED, an AI-driven start-up. Lauren worked as a Data Scientist for Topix, Payments Risk Strategist for Google (Google Wallet/Android Pay), Statistical Analyst for Nielsen, and Big Data Intern for the National Opinion Research Center through the University of Chicago. Lauren is also passionate about teaching Machine Learning. She’s currently giving back to the data science community as a Thinkful Data Science Bootcamp Mentor and a Packt Publishing technical video reviewer. She also earned a Data Science certificate from General Assembly San Francisco (2016), an MA in Quantitative Methods in the Social Sciences (Applied Statistical Methods) from Columbia University (2012), and a BA in Economics from Spelman College (2010). Lauren is a leader in AI in Silicon Valley, with a passion for knowledge gathering and sharing.

● Kaiser Hamid Rabbi is a Data Scientist who is super-passionate about Artificial Intelligence, Machine Learning, and Data Science. He has entirely devoted himself to learning more about Big Data Science technologies such as Python, Machine Learning, Deep Learning, Artificial Intelligence, Reinforcement Learning, Data Mining, Data Analysis, Recommender Systems and so on over the last 4 years. Kaiser also has a huge interest in Lygometry (things we know we do not know!) and always tries to understand domain knowledge based on his project experience as much as possible.

● Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and cloud computing. Over the past few years, they have worked with some of the world's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the world's most popular soft drinks companies, helping each of them to make better sense of its data and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.

● Jim DiLorenzo is a freelance programmer and Reinforcement Learning enthusiast. He graduated from Columbia University and is working on his Master's in Computer Science. He has used TRFL in his own RL experiments and when implementing scientific papers in code.

Who this course is for:

This course is designed for AI engineers, Machine Learning engineers, and aspiring Reinforcement Learning and Data Science professionals keen to extend their skill set to Reinforcement Learning using Python.

Course content

Practical Reinforcement Learning - Agents and Environments: 21 lectures • 1hr 17min

Advanced Practical Reinforcement Learning: 13 lectures • 43min

Hands-On Deep Q-Learning: 24 lectures • 1hr 54min

Reinforcement Learning with TensorFlow & TRFL: 19 lectures • 1hr 19min
