
This video will give you an overview of the course.
The aim of this video is to install RStudio.
Download and install Base R
Download and install RStudio
Launch RStudio session
The aim of this video is to learn to install Python.
Check your system for the current version of OS
Download Python version 3
Launch Python session
The aim of this video is to learn to work with Jupyter Notebook.
Install Python 3 and upgrade to pip3
Install IRkernel
Launch Jupyter Notebook
The aim of this video is to study the learning type distinctions.
What is supervised learning?
What is unsupervised learning?
Understand reinforcement learning
The aim of this video is to study reinforcement learning.
Interpret artificial neural networks
Understand deep learning
Interpret perceptrons
The aim of this video is to study real-world reinforcement learning examples.
Study a high level example
Learn through a gaming example
The aim of this video is to learn about the key terms in reinforcement learning.
Study in brief about the environment, agent, and state
Get to know about policy, reward, sensor, and value
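The key terms above can be tied together in a few lines of code. The following is a minimal sketch, not the course's own example: the `CorridorEnv` class, its reward values, and the `random_policy` function are hypothetical names invented for illustration. The environment holds the state, the agent picks actions through a policy, and the reward is the feedback returned after every step.

```python
import random


class CorridorEnv:
    """Hypothetical environment: a corridor with states 0..4, goal at state 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (clipped at the walls)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        return self.state, reward, done


def random_policy(state, rng):
    """A policy maps a state to an action; this one ignores the state."""
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Agent-environment loop: observe state, act, receive reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
episode_return = run_episode(CorridorEnv(), random_policy, rng)
```

The value of a state under a policy is the expected total reward collected from that state onward, so averaging `run_episode` over many episodes gives a rough estimate of the value of the start state.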
The aim of this video is to discuss OpenAI Gym.
What is OpenAI Gym?
Various environments in OpenAI Gym
Learn to interface with OpenAI Gym
The aim of this video is to discuss the Monte Carlo method in brief.
Study the Bandit problem
Study the pseudocode for the Bandit problem
Memory concerns with Reinforcement Learning
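As a rough illustration of the Bandit problem and the memory concern, here is a minimal epsilon-greedy sketch. The function name `run_bandit`, the Gaussian reward model, and all parameter values are assumptions made for this example, not the pseudocode used in the video. The incremental mean update keeps one estimate and one counter per arm, so no reward history has to be stored.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, noise=1.0, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit with Gaussian rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # Incremental mean: constant memory, no stored reward history
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.9])
```

Averaging sampled rewards in this way is the same idea that Monte Carlo methods apply to whole episode returns.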
The aim of this video is to discuss the Monte Carlo method in Python.
Learn the goal of the mountain car example
Perform Monte Carlo method example in Python
The aim of this video is to study the Monte Carlo method in R.
Perform the mountain car example using the Monte Carlo method in R
Interpret the result
The aim of this video is to study practical reinforcement learning in OpenAI Gym.
Discuss the Value Iteration in R
Study the Policy Iteration in R
Get to know about the Bellman Equation in R
The aim of this video is to study the different MDP concepts.
Study the Markov Decision Process and Dynamic Programming
What are the Bellman Equations?
Study the Value and Policy Functions
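To make the Bellman equation and the value and policy functions concrete, the sketch below runs value iteration on a tiny hand-made MDP. It uses plain Python rather than the MDP Toolbox, and the three-state MDP, its transition probabilities, and its rewards are invented for illustration. Each entry of `P[state][action]` is a list of `(probability, next_state, reward)` tuples.

```python
def q_value(P, V, s, a, gamma):
    """Bellman backup: expected one-step reward plus discounted next value."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:
                continue  # terminal state: no actions, value stays 0
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # The greedy policy picks the action with the highest backed-up value
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
              for s in P if P[s]}
    return V, policy


# Hypothetical MDP: "go" from A reaches B with probability 0.8, else stays in A
P = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(0.8, "B", 0.0), (0.2, "A", 0.0)]},
    "B": {"back": [(1.0, "A", 0.0)], "go": [(1.0, "end", 1.0)]},
    "end": {},
}
V, policy = value_iteration(P)
```

Here `V` is the value function and `policy` is the policy function derived from it. Policy iteration reaches the same fixed point by alternating policy evaluation and greedy improvement.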
The aim of this video is to study the Python library MDP Toolbox.
Get to know in brief about the MDP Toolbox
Work on the MDP Toolbox with the help of an example
The aim of this video is to discuss the value and policy iteration in Python.
What is the Python MDP Toolbox?
Work on the Python MDP Toolbox with the help of an example
The aim of this video is to study the MDP Toolbox in R.
Get to know in brief about the MDP Toolbox in R
Work on the MDP Toolbox in R with the help of an example
The aim of this video is to discuss the value and policy iteration in R.
Study Value Iteration in R
Get to know about the Policy Iteration in R
Learn about the Bellman Equation in R
The aim of this video is to study temporal difference learning.
What is Temporal Difference Learning?
Get to know about the Tabular TD(0) Pseudo Code
Know about the SARSA, SARSA Pseudo Code, Q Learning and Q-Learning Pseudo Code
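The three update rules named above differ only in the target they bootstrap from. The following sketch shows them side by side under simplifying assumptions: tables are plain dictionaries, terminal-state handling is omitted, and the function names are chosen for this example rather than taken from the video's pseudocode.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Tabular TD(0): move V(s) toward r + gamma * V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """SARSA (on-policy): bootstrap from the action actually taken next."""
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q-learning (off-policy): bootstrap from the best next action."""
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])


# Same transition, two different targets
Q_sarsa = {0: [0.0, 0.0], 1: [1.0, 2.0]}
Q_qlearn = {0: [0.0, 0.0], 1: [1.0, 2.0]}
sarsa_update(Q_sarsa, s=0, a=1, r=0.5, s_next=1, a_next=0)
q_learning_update(Q_qlearn, s=0, a=1, r=0.5, s_next=1)
```

In this toy transition SARSA uses the value of the next action it was told about (1.0), while Q-learning uses the largest next value (2.0), so the Q-learning estimate moves further.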
The aim of this video is to learn to use the MDP Toolbox in Python to perform Q-Learning.
Perform Q Learning in Python
Interpret the results
The aim of this video is to study Temporal Difference Learning in R.
Utilize the MDP Toolbox to do Q-Learning in R
Perform Q Learning and One Step Temporal Difference in R
Interpret and verify the results
This video provides an overview of the entire course.
This video aims to explore Deep Reinforcement Learning with TensorFlow, Keras, and prediction with neural networks.
Understand the purpose of Reinforcement Learning
Learn about Artificial Neural Network
Define Deep Learning and its frameworks
This video aims to give a brief introduction about Deep Q-Learning and Double Deep Q-Learning.
Deep Learning Overview
Define Deep Q-Learning
Learn about Double Deep Q-Learning
This video aims to demonstrate Q-Learning in Python.
Learn to use Q-Learning in Python
Demonstrate with a real-world example
This video aims to demonstrate Q-Learning in R.
Learn to use Q-Learning in R
Demonstrate with a real-world example
This video gives an overview of TensorFlow.
Define TensorFlow
Learn about the components of TensorFlow
Understand convolutional neural networks
This video aims to demonstrate TensorFlow in Python.
Learn to use TensorFlow in Python
Demonstrate with a real-world example
This video aims to demonstrate Deep Q-Learning with TensorFlow in Python.
Learn to use Deep Q-Learning with TensorFlow in Python
Demonstrate with a real-world example
This video gives an overview of Keras.
Define Keras
Benefits of Keras over TensorFlow
Learn about the components of Keras
This video aims to demonstrate Keras in Python.
Learn to use Keras in Python
Demonstrate with a real-world example
This video aims to demonstrate Deep Q-Learning with Keras in Python.
Learn to use Deep Q-Learning with Keras in Python
Demonstrate with a real-world example
This video discusses the needs of Reinforcement Learning in the real world.
Benefits of Reinforcement Learning
Discuss the fields where Reinforcement Learning is used
This video provides an overview of the entire course.
First we discuss why AI matters right now. After that we talk about why we use games to train our AI. Finally, we look at some use cases of AI.
Why AI right now?
Why we use games to train our AI?
Some use cases of AI
First we discuss what the Environment and the Agent are in RL. Then we talk about how to give punishment and reward to the Agent. Finally, we list some additional reading resources.
What is Environment and Agent in RL?
How to give punishment and reward to the Agent?
Additional reading resources.
Key concepts of the Bellman Equation
First we talk about what a Plan is. Then we discuss Deterministic vs Non-deterministic Search. Finally, we try to understand the Markov Process and the Markov Decision Process.
What is Plan?
Deterministic vs Non-deterministic Search
Markov Process and Markov Decision Process
First we talk about what a Plan is. Then we discuss what a Policy is. Finally, we try to understand what a Living Penalty is.
What is Plan?
What is Policy?
What is Living Penalty?
Intuition behind Q-Learning
Temporal Difference intuition
Learning phase of Deep Q-Learning
Acting phase of Deep Q-Learning
First we talk about Experience Replay. Then we discuss different types of Action Selection Policies.
Experience Replay
Different types of Action Selection Policies
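Both ideas fit in a short sketch. The `ReplayBuffer` class and the two selection functions below are illustrative names assumed for this example, not the course's own `ai.py` code. The buffer keeps only the most recent transitions and samples them at random to break the correlation between consecutive steps. Epsilon-greedy and softmax are two common action selection policies.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform random minibatch without replacement
        return rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Turn Q-values into action probabilities; higher temperature explores more."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push((step, 0, 0.0, step + 1))  # (state, action, reward, next_state)
batch = buffer.sample(2, random.Random(0))
```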
Install the PyTorch environment and the Kivy SDK
Install the PyTorch environment
Install the Kivy SDK
Create car.kv file
Create map.py file
Create ai.py file
Play with our Self Driving Car AI
First we discuss a brief history of CNNs. Then we talk about the Convolution operation and the ReLU layer. Finally, we learn about Pooling, Flattening and Full connection.
A brief history of CNNs
Convolution operation and ReLU layer
Pooling, Flattening and Full connection
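The four CNN stages listed above can be traced by hand on a tiny input. This is a pure-Python sketch for intuition only, with a made-up 5x5 image, a made-up 2x2 kernel, and made-up dense weights. A real model would use a framework such as PyTorch.

```python
def conv2d(image, kernel):
    """Valid cross-correlation, which deep learning libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [max(feature_map[i + di][j + dj]
             for di in range(size) for dj in range(size))
         for j in range(0, len(feature_map[0]) - size + 1, size)]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def flatten(feature_map):
    """Turn the 2-D feature map into a 1-D vector for the dense layer."""
    return [v for row in feature_map for v in row]


def dense(vector, weights, bias):
    """Full connection: weighted sum of all inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


image = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
kernel = [[1, -1], [-1, 1]]  # responds to diagonal patterns

features = flatten(max_pool(relu(conv2d(image, kernel))))
output = dense(features, weights=[0.5, -0.5, 0.25, 0.25], bias=0.1)
```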
Intuition behind Deep Convolutional Q-Learning
Intuition behind Eligibility Trace or n-step Q-Learning
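The core of n-step Q-Learning is its target: instead of bootstrapping after one reward, the agent sums n discounted rewards and only then adds the discounted estimate of the state it ended in. The helper below is a minimal sketch of that calculation. The function name is hypothetical, and `bootstrap_value` is assumed to be the maximum Q-value of the state reached after the n steps, or 0 if the episode ended.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value."""
    target = bootstrap_value
    # Work backwards so each reward is discounted once per step
    for r in reversed(rewards):
        target = r + gamma * target
    return target


# Three rewards followed by a bootstrap from an estimated value of 10
example_target = n_step_target([1.0, 0.0, 2.0], bootstrap_value=10.0, gamma=0.5)
```

With `gamma=0.5` the example works out to 1 + 0 + 0.25 * 2 + 0.125 * 10 = 2.75. With a single reward the function reduces to the ordinary one-step Q-Learning target.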
Installing OpenAI Gym and ppaquette
Installing OpenAI Gym
Installing ppaquette
Create image_preprocessing.py file
Create experience_reply.py file
Create ai.py file
Playing with our AI in DOOM
This video will give you an overview of the course.
In this video, you will get to learn in detail the steps of setting up and installing the necessary software.
Know why the software is needed
Install the essential software
Outline the basics of TRFL with TD learning. Show how to perform TD learning with trfl.td_learning and which arguments trfl.td_learning takes.
Define TD learning
Explain how to use TRFL in general and trfl.td_learning()
Review notebook implementing TD Learning with TRFL
Describe Q learning in TRFL and off-policy methods. Show how trfl.qlearning is similar to trfl.td_learning and how the usage of the two functions differs.
Define Q learning and relate Q learning to TD learning
Build upon TRFL knowledge by using Q Learning
Discuss code for Q learning workbook to emphasize TRFL usage
A Deep Q Network (DQN) applies Q learning with a function approximator, experience replay, and target network updates. We describe and implement this Deep Reinforcement Learning algorithm. We show the benefits of TRFL’s Q learning and flexible target network updating methods in setting up the DQN.
Outline a DQN and the main parts of a DQN
Describe target network updating and TRFL usage
Implement a DQN in code with TensorFlow and TRFL
DQN serves as the base for a variety of Reinforcement Learning algorithms. TRFL can quickly and easily modify RL algorithms like DQN. We show how this is done with trfl.double_qlearning, trfl.persistent_qlearning, and trfl.huber_loss.
Describe the problem of approximation error and overestimation bias
Double Q learning and Persistent Q learning reduce approximation error
Show TRFL implementations of these DQN modifications
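The difference between the standard DQN target and the Double Q learning target can be shown without any framework. The sketch below is plain Python, not the TRFL implementation, and the function names and the numbers are assumptions made for illustration. The standard target lets the target network both choose and evaluate the next action, which tends to overestimate. Double Q learning lets the online network choose and the target network evaluate.

```python
def q_learning_target(reward, gamma, q_target_next, done=False):
    """Standard target: max over the target network's next-state values."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_q_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double Q target: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


q_online_next = [1.0, 3.0, 2.0]   # online network prefers action 1
q_target_next = [5.0, 2.0, 4.0]   # target network's estimate for action 0 is high

standard = q_learning_target(1.0, 0.9, q_target_next)
double = double_q_target(1.0, 0.9, q_online_next, q_target_next)
```

In this example the standard target is 1 + 0.9 * 5 = 5.5, while the Double Q target is 1 + 0.9 * 2 = 2.8, because the possibly inflated estimate for action 0 is never selected.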
Distributional Q learning turns the scalar estimate of DQN into a distribution estimate. We further modify our DQN with trfl.categorical_dist_qlearning and trfl.categorical_dist_double_qlearning.
Explain categorical distributional Q learning
Go over TRFL usage of categorical distributional Q learning
Provide a code example of distributional Q learning using TRFL
Introduce policy gradients. Describe how policy gradients relate to value function methods covered in Section 1 and Section 2. Discuss continuous and discrete action spaces and the environment Lunar Lander. Implement policy gradient methods with TensorFlow and TRFL.
Define policy gradients advantages, theory, and usage
Describe TRFL usage of trfl.discrete_policy_gradient and trfl.policy_gradient
Describe solving Lunar Lander with REINFORCE in TensorFlow with TRFL
Explain how policy entropy loss is used to improve policies by increasing exploration and preventing premature convergence. Walk through the formula and example usage.
Explain why we use entropy
Show the entropy formula
Demonstrate TRFL usage of policy entropy loss
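For a discrete policy with action probabilities p, the entropy is H = -sum(p * log p). The sketch below computes it in plain Python and shows one common way to fold it into a loss. The function names, the coefficient `beta`, and the sign convention are assumptions for illustration and do not reproduce TRFL's own entropy loss function.

```python
import math


def policy_entropy(probs):
    """Entropy of a discrete action distribution, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_regularized_loss(policy_loss, probs, beta=0.01):
    """Subtract beta * entropy so that minimizing the loss favors exploration."""
    return policy_loss - beta * policy_entropy(probs)


uniform_entropy = policy_entropy([0.25, 0.25, 0.25, 0.25])  # maximal for 4 actions
peaked_entropy = policy_entropy([0.97, 0.01, 0.01, 0.01])   # nearly deterministic
```

A uniform policy has the highest entropy (log 4 for four actions) and a deterministic policy has zero entropy, so the bonus pushes back against a policy that collapses onto one action too early.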
Teach the Actor Critic model and how it combines policy gradients and value function methods. Discuss A3C and A2C as two methods that parallelize gradient collection from multiple Actor Critic agents.
Introduce Actor Critic definition and architecture
Discuss the A3C and A2C variants of Actor Critic
Show TensorFlow Actor Critic model and TRFL A2C loss
Relate deterministic policy gradients (DPG) to policy gradients. Explain how DPG differs and introduce deep deterministic policy gradients (DDPG). Go over the DDPG TensorFlow network architecture and TRFL usage.
Theory and definition of deterministic policy gradients
How DDPG incorporates DPG into deep RL
TensorFlow implementation of DDPG with TRFL to solve LunarLander
Credit assignment is an important topic in RL. We discuss various methods for credit assignment leading to TD(λ). TD(λ) relates TD learning to Monte Carlo updates and allows multi-step returns.
Introduce the credit assignment issue and various methods of combating it
Describe TD(λ) in the context of those methods
Use TRFL and TD(λ) in FrozenLake
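The way TD(λ) interpolates between one-step TD targets and Monte Carlo returns can be seen in the λ-return itself. The following is a minimal plain-Python sketch of the backward recursion, not TRFL code. The function name and argument layout are assumptions: `rewards[t]` is the reward received after step t, and `next_values[t]` is the estimated value of the state reached after step t, with 0.0 for a terminal state.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """G_t = r_t + gamma * ((1 - lam) * V(s_next) + lam * G_(t+1))."""
    returns = [0.0] * len(rewards)
    g = next_values[-1]  # bootstrap from the value of the final state
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns


rewards = [1.0, 1.0, 1.0]
next_values = [2.0, 3.0, 0.0]  # the episode ends after the third step

td_targets = lambda_returns(rewards, next_values, gamma=1.0, lam=0.0)
mc_returns = lambda_returns(rewards, next_values, gamma=1.0, lam=1.0)
blended = lambda_returns(rewards, next_values, gamma=1.0, lam=0.5)
```

With `lam=0.0` every target is a one-step TD target, with `lam=1.0` the targets are the Monte Carlo returns, and values in between mix the two.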
The bias-variance trade-off is something to consider when selecting an algorithm. Generalized Advantage Estimation (GAE) helps balance the trade-off. TRFL can implement GAE with trfl.generalized_lambda_returns() which is closely related to TD(λ).
Discuss bias, variance, and GAE
How trfl.generalized_lambda_returns() can implement GAE
Use TRFL and policy gradient methods with trfl.generalized_lambda_returns()
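Generalized Advantage Estimation can be written as a discounted sum of one-step TD errors. The sketch below is a plain-Python illustration of that recursion and makes no claim about how trfl.generalized_lambda_returns() is implemented. The function name and the convention that `values` has one more entry than `rewards` (the last entry is the bootstrap value, 0.0 if terminal) are assumptions.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """A_t = delta_t + gamma * lam * A_(t+1), with delta_t the TD error."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


rewards = [1.0, 1.0]
values = [0.5, 0.5, 0.0]  # V(s_0), V(s_1), and 0.0 for the terminal state

low_variance = gae_advantages(rewards, values, gamma=1.0, lam=0.0)
low_bias = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
balanced = gae_advantages(rewards, values, gamma=1.0, lam=0.5)
```

Setting `lam=0.0` gives the one-step TD error (lower variance, more bias from the value estimate), while `lam=1.0` gives the Monte Carlo return minus the baseline (less bias, more variance). Adding `values[t]` back to each advantage recovers the λ-return.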
Q(λ) builds upon TD(λ) much like Q learning builds upon TD learning: by turning state value estimates into action value estimates. We show the uses of multi-step action value estimates provided by Q(λ) and describe some of the variants of Q(λ).
Contrast Q(λ) with TD(λ)
Implement Q(λ) in TRFL
Use trfl.qlambda() to solve FrozenLake with Q lambda
Multistep Forward View is called by all the TRFL functions covered in this section. We see how trfl.multistep_forward_view() can generalize λ methods and allow flexible implementation of other λ methods.
Explain how trfl.multistep_forward_view() is called
Expand upon the flexibility of trfl.multistep_forward_view()
Implement Watkins’ Q(λ), a variant of Q(λ), with trfl.multistep_forward_view() on FrozenLake
Retrace(λ) corrects for the off-policy-ness of data. Retrace(λ) reduces variance and better uses off-policy and on-policy returns than prior λ methods. Retrace(λ) builds upon TD(λ) and importance sampling.
Define importance sampling
Describe Retrace(λ) and its benefits
Use Retrace(λ) in TRFL with Taxi environment
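Importance sampling reweights data gathered by a behavior policy μ so that it can be used to evaluate a target policy π. Retrace(λ) truncates that weight at 1 to keep the variance bounded. The two helpers below are a small plain-Python sketch with hypothetical names. They take the probabilities that each policy assigned to the action that was actually taken.

```python
def importance_ratio(pi_prob, mu_prob):
    """Plain importance weight pi(a|s) / mu(a|s); it can grow without bound."""
    return pi_prob / mu_prob


def retrace_coefficient(pi_prob, mu_prob, lam=1.0):
    """Retrace trace coefficient: lam * min(1, pi(a|s) / mu(a|s))."""
    return lam * min(1.0, pi_prob / mu_prob)


# Target policy likes the action more than the behavior policy did
likely_ratio = importance_ratio(0.5, 0.25)
likely_trace = retrace_coefficient(0.5, 0.25, lam=0.9)

# Target policy likes the action less: the trace is cut proportionally
unlikely_trace = retrace_coefficient(0.2, 0.8, lam=1.0)
```

In the first case the raw ratio is 2.0 but the Retrace coefficient is capped at 0.9. In the second case the coefficient shrinks to 0.25, which cuts the trace when the data is far off-policy.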
IMPALA is a state-of-the-art algorithm produced by DeepMind. IMPALA communicates off-policy trajectories using a variety of methods to speed up calculations. IMPALA’s unique architecture combined with V-trace drives IMPALA’s performance.
Overview of the main parts of IMPALA
Importance of V-trace and relation to Retrace(λ)
Implement V-trace in TRFL
UNREAL is a DeepMind Reinforcement Learning algorithm that combines A3C with an LSTM and a variety of auxiliary reward functions. UNREAL combines on-policy and off-policy methods and uses auxiliary tasks like pixel control.
Description of UNREAL’s unique characteristics
Detail auxiliary tasks like pixel control
Use pixel control in TRFL to play Pong
Reinforcement Learning (RL) allows you to develop smart, quick and self-learning systems in your business surroundings. It is an effective method to train your learning agents and solve a variety of problems in Artificial Intelligence, from games, self-driving cars and robots to enterprise applications that range from data centre energy saving (cooling data centres) to smart warehousing solutions.
This course covers the major advancements and successes achieved in deep reinforcement learning by synergizing deep neural network architectures with reinforcement learning. You will be introduced to the concept of Reinforcement Learning, its advantages and why it's gaining so much popularity. This course also discusses Markov Decision Processes (MDPs), Monte Carlo tree searches, dynamic programming methods such as policy and value iteration, and temporal difference learning methods such as Q-learning and SARSA. You will learn to build convolutional neural network models using TensorFlow and Keras. You will also learn the use of artificial intelligence in a gaming environment with the help of OpenAI Gym.
By the end of this course, you will have explored reinforcement learning and gained hands-on experience with real data and artificial intelligence (AI) to build intelligent systems.
Meet Your Expert(s):
We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:
● Lauren Washington is currently the Lead Data Scientist and Machine Learning Developer for smartQED, an AI-driven start-up. Lauren worked as a Data Scientist for Topix, Payments Risk Strategist for Google (Google Wallet/Android Pay), Statistical Analyst for Nielsen, and Big Data Intern for the National Opinion Research Center through the University of Chicago. Lauren is also passionate about teaching Machine Learning. She’s currently giving back to the data science community as a Thinkful Data Science Bootcamp Mentor and a Packt Publishing technical video reviewer. She also earned a Data Science certificate from General Assembly San Francisco (2016), an MA in Quantitative Methods in the Social Sciences (Applied Statistical Methods) from Columbia University (2012), and a BA in Economics from Spelman College (2010). Lauren is a leader in AI in Silicon Valley, with a passion for knowledge gathering and sharing.
● Kaiser Hamid Rabbi is a Data Scientist who is super-passionate about Artificial Intelligence, Machine Learning, and Data Science. He has entirely devoted himself to learning more about Big Data Science technologies such as Python, Machine Learning, Deep Learning, Artificial Intelligence, Reinforcement Learning, Data Mining, Data Analysis, Recommender Systems and so on over the last 4 years. Kaiser also has a huge interest in Lygometry (things we know we do not know!) and always tries to understand domain knowledge based on his project experience as much as possible.
● Colibri Digital is a technology consultancy company founded in 2015 by James Cross and Ingrid Funie. The company works to help its clients navigate the rapidly changing and complex world of emerging technologies, with deep expertise in areas such as big data, data science, machine learning, and Cloud computing. Over the past few years, they have worked with some of the World's largest and most prestigious companies, including a tier 1 investment bank, a leading management consultancy group, and one of the World's most popular soft drinks companies, helping each of them to better make sense of its data, and process it in more intelligent ways. The company lives by its motto: Data -> Intelligence -> Action.
● Jim DiLorenzo is a freelance programmer and Reinforcement Learning enthusiast. He graduated from Columbia University and is working on his Master's in Computer Science. He has used TRFL in his own RL experiments and when implementing scientific papers in code.