
This video provides an overview of the entire course.
The aim of this video is to introduce Reinforcement Learning (RL) and illustrate RL concepts with a prototypical example.
Contrast RL with supervised and unsupervised learning
Introduce the classic RL Grid World problem or framework
Explain the RL concepts of states and actions
The aim of this video is to demonstrate how to represent Grid World using the R software and to introduce the RL concepts of sequences of actions and randomness of actions.
Show how to represent (code) Grid World in R
Explain in detail the importance of the sequences of actions in achieving rewards
Show how to represent stochasticity, or possible randomness, in action behavior
The aim is twofold: first, to probe more deeply into how the possible random execution of actions can affect the outcome; and second, to demonstrate that the specific reward structure can affect the optimal policy with regard to the best action.
Demonstrate how stochasticity affects ultimate action outcome
Examine the optimal policy in Grid World, given the different reward structures.
Show that small changes in reward structure matter!
The video deals with developing the optimal policy as a model-free solution to navigating a 2 x 2 grid.
Describe two different R packages for solving RL problems
Show RL state-action-reward framework
Demonstrate a hands-on extended R example to find an optimal policy
This video addresses the epsilon-greedy action selection strategy to update the optimal policy with a model-free solution to navigating a 2 x 2 grid.
Describe distinctions between exploration and exploitation action selection approaches
Describe the implementation of epsilon-greedy action selection strategy
Use another hands-on extended R example to update or validate an optimal policy using an existing model
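As a rough illustration of the epsilon-greedy strategy named above, here is a minimal Python sketch (the course's own example for this video is in R). The function name `epsilon_greedy`, the Q-values, and the four-action layout are assumptions made up for illustration, not taken from the course material.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values.

    With probability epsilon, explore (uniform random action);
    otherwise exploit (highest Q-value, first one on ties).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical Q-values for four moves (up, down, left, right) in one state.
q_row = [0.1, 0.5, 0.2, 0.4]
greedy_action = epsilon_greedy(q_row, epsilon=0.0)  # epsilon = 0 always exploits
```

Setting epsilon to 0 gives pure exploitation and setting it to 1 gives pure exploration; values in between trade the two off.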
This video deals with using the R MDPtoolbox package to find the optimal policy solution for navigating a 2 x 2 grid.
Describe the Markov Decision Process framework for a Reinforcement Learning problem
Detail the probabilistic nature of the transition model
Demonstrate an MDPtoolbox R example to find the optimal policy
This video identifies and demonstrates several of the more important MDPtoolbox functions as pertinent to Reinforcement Learning problems.
Introduce several of the more important MDPtoolbox functions
Demonstrate what these MDPtoolbox functions do
Show the input and output from each respective MDPtoolbox function
This video closes the loop on representing the 3 x 4 Grid World RL problem using R and without using any RL-specific R packages.
Show how to solve the original 3 x 4 Grid RL problem
Show how to construct a representative 3 x 4 Grid World environment
Demonstrate that this manual solution produces a similar optimal policy
This video presents an end-of-Title user exercise, integrating much of the material presented in the three sections.
Frame a user exercise to reinforce learning this Title’s material
Provide the stub code to complete the user exercise
Challenge the user to build an appropriate environment in R
This video presents a solution to the end-of-Title user exercise presented in the preceding video.
Detail the steps needed to solve the user exercise
Show how to build the appropriate R objects to complete the exercise
Demonstrate that the user solution produces the same optimal policy as before
This video will give you an overview of the course.
The aim of this video is to install RStudio.
• Download and install Base R
• Download and install RStudio
• Launch RStudio session
The aim of this video is to learn to install Python.
• Check your system for the current version of OS
• Download Python version 3
• Launch Python session
The aim of this video is to learn to work with Jupyter Notebook.
• Install Python 3 and upgrade to pip3
• Install IRkernel
• Launch Jupyter Notebook
The aim of this video is to study the learning type distinctions.
• What is supervised learning?
• What is unsupervised learning?
• Understand reinforcement learning
The aim of this video is to study reinforcement learning.
• Interpret artificial neural networks
• Understand deep learning
• Interpret perceptrons
The aim of this video is to study real-world reinforcement learning examples.
• Study a high-level example
• Learn through a gaming example
The aim of this video is to learn about the key terms in reinforcement learning.
• Study in brief about the environment, agent, and state
• Get to know about policy, reward, sensor, and value
The aim of this video is to discuss OpenAI Gym.
• What is OpenAI Gym?
• Various environments in OpenAI Gym
• Learn to interface with OpenAI Gym
The aim of this video is to briefly discuss the Monte Carlo method.
• Study the Bandit problem
• What is the Bandit problem pseudocode?
• Memory concerns with Reinforcement Learning
The aim of this video is to discuss the Monte Carlo method in Python.
• Learn the goal of the mountain car example
• Perform Monte Carlo method example in Python
The aim of this video is to study the Monte Carlo method in R.
• Perform the mountain car example using the Monte Carlo method in R
• Interpret the result
The aim of this video is to study practical reinforcement learning in OpenAI Gym.
• Discuss the Value Iteration in R
• Study the Policy Iteration in R
• Get to know about the Bellman Equation in R
The aim of this video is to study the different MDP concepts.
• Study the Markov Decision Process and Dynamic Programming
• What are the Bellman Equations?
• Study the Value and Policy Functions
The aim of this video is to study the Python MDP Toolbox library.
• Get to know in brief about the MDP Toolbox
• Work on the MDP Toolbox with the help of an example
The aim of this video is to discuss the value and policy iteration in Python.
• What is the Python MDP Toolbox?
• Work on the Python MDP Toolbox with the help of an example
The aim of this video is to study the MDP Toolbox in R.
• Get to know in brief about the MDP Toolbox in R
• Work on the MDP Toolbox in R with the help of an example
The aim of this video is to discuss the value and policy iteration in R.
• Study the Value Iteration in R
• Get to know about the Policy Iteration in R
• Learn about the Bellman Equation in R
The aim of this video is to study temporal difference learning.
• What is Temporal Difference Learning?
• Get to know about the Tabular TD(0) pseudocode
• Know about SARSA, the SARSA pseudocode, Q-Learning, and the Q-Learning pseudocode
The aim of this video is to learn to use the MDP Toolbox in Python to perform Q-Learning.
• Perform Q-Learning in Python
• Interpret the results
The aim of this video is to study Temporal Difference Learning in R.
• Utilize the MDP Toolbox to do Q-Learning in R
• Perform Q-Learning and One-Step Temporal Difference in R
• Interpret and verify the results
This video provides an overview of the entire course.
How do you represent the environment when you have no explicit MDP model?
Determine the rules, or “physics,” and the structure of the state space
Determine the possible states, actions, new states, and rewards, and what to do once you have determined them
Build an environment function in R
How do you determine the optimal policy to “Solve” your reinforcement learning problem?
Observe State-Action-New-State reward experience data
Use this data to determine highest-value actions for each state
In this video, we will continue with the optimal policy to “Solve” your reinforcement learning problem.
Map high-value state-action pairs as optimal policy function
How does one validate the model, as well as validate (and possibly update) the previously-determined optimal policy?
Sample a new set of data from environment
Determine optimal policy function again, with the same model
Then compare the new policy function with the previous policy function
What are the state-value and state-action value functions?
Define the two value functions
Show how they impact policy evaluation and improvement
Illustrate with an R MDP example for moving a pawn
How do MDP problem parameters affect the optimal policy solution?
Introduction to the discount factor, “gamma”
Show how gamma affects policy moving a pawn
Show how other parameters affect policy moving a pawn
Diving deeper into the nature of the discount factor, gamma: how does it affect policy improvement and optimal policy determination?
Explain how the discount factor determines the value function
Show how the value function determines policy
Present an R example of discount and rewards affecting policy
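To make the role of the discount factor concrete, the following Python sketch computes a discounted return for one reward sequence under two values of gamma (the course's own example is in R). The function name `discounted_return` and the reward sequence are illustrative assumptions, not the course's pawn example.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a reward sequence, accumulated backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total

rewards = [0, 0, 10]  # hypothetical: a reward of 10 arrives on the third step
patient = discounted_return(rewards, gamma=0.9)    # 10 * 0.9**2
impatient = discounted_return(rewards, gamma=0.1)  # 10 * 0.1**2
```

A gamma close to 1 keeps the delayed reward nearly intact, while a small gamma shrinks it sharply, which is one way the discount factor can change which action a policy prefers.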
What is the nature of the Monte Carlo Model-Free approach to solving Reinforcement Learning problems?
Describe the characteristics of the Monte Carlo approach
Describe random versus epsilon-greedy action selection
Illustrate with an R race-to-goal example
What is the nature of the Model-Free Q-Learning approach to solve Reinforcement Learning problems?
Describe Q-Learning as an off-policy learning concept
Walk through the Q-Learning update rule
Illustrate Q-Learning with an R example
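The update rule mentioned above can be sketched in Python as follows (the course's own example is in R). This is the standard textbook form with a discount factor; the course's version at this point may differ. The function name `q_learning_update`, the state labels, and the table values are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step on a dict-of-lists table, updated in place.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])  # off-policy: greedy value of the next state
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

# Hypothetical two-state, two-action Q-table.
q = {"s1": [0.0, 0.0], "s2": [1.0, 2.0]}
new_value = q_learning_update(q, "s1", 0, reward=1.0, next_state="s2",
                              alpha=0.5, gamma=0.9)
```

Taking the maximum over the next state's actions, regardless of which action the agent actually takes next, is what makes the method off-policy.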
Diving deeper into the nature of Q-Learning.
Look at effects of the learning rate parameter on Q-values
Look at effects of randomness of actions on policy
Illustrate effects of learning rate, random actions using R examples
Explore the characteristics of the SARSA algorithm.
Describe SARSA as an on-policy learning concept
Compare SARSA to the Model-Free Q-Learning approach
Note how SARSA differs from Q-Learning
What is the nature of the Simulated Annealing algorithm alternative to Q-Learning?
Describe the characteristics of the Simulated Annealing approach
Describe probabilistic action selection derived from Boltzmann distribution metaheuristic
Illustrate with an R simulated annealing 2x2 grid example
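As a hedged illustration of probabilistic action selection from a Boltzmann distribution, the Python sketch below turns Q-values into selection probabilities at a given temperature. It is a generic softmax sketch, not the course's R implementation; the function name `boltzmann_probabilities` and the Q-values are assumptions.

```python
import math

def boltzmann_probabilities(q_values, temperature):
    """Softmax over Q-values: P(a) is proportional to exp(Q(a) / temperature)."""
    scaled = [q / temperature for q in q_values]
    shift = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

q_row = [1.0, 2.0]  # hypothetical Q-values for two actions
hot = boltzmann_probabilities(q_row, temperature=100.0)  # near-uniform
cold = boltzmann_probabilities(q_row, temperature=0.1)   # near-greedy
```

Lowering the temperature over time moves the agent from exploratory, near-random choices toward near-greedy ones, which is the annealing idea.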
How does one incorporate the discount factor into the previous Model-Free Q-Learning Reinforcement Learning algorithm?
Modify the Q-Learning algorithm to include a discount factor
Include the aggregation of rewards by episode
Illustrate modified Q-Learning algorithm with R examples
How does one demonstrate the effects of Q-Learning algorithm control parameters using effective visualizations?
Use the popular R ggplot2 package to create visualizations
Examine effects of epsilon, alpha, and gamma control parameters
Create color-based line plots of Q-values and rewards
Reinforcement Learning has become one of the hottest research areas in Machine Learning and Artificial Intelligence. You can make an intelligent agent in a few steps: have it semi-randomly explore different choices of actions under different conditions and states, then keep track of the reward or penalty associated with each choice for a given state or action. This course describes and compares the range of model-based and model-free learning algorithms that make up Reinforcement Learning.
This comprehensive 3-in-1 course follows a step-by-step practical approach to getting to grips with the basics of Reinforcement Learning with R and building your own intelligent systems. Initially, you’ll learn how to implement Reinforcement Learning techniques using the R programming language. You’ll also learn concepts and key algorithms in Reinforcement Learning. Moving further, you’ll dive into Temporal Difference Learning, an algorithm that combines Monte Carlo methods and dynamic programming. Finally, you’ll implement typical applications for model-based and model-free RL.
By the end of this course, you'll have got to grips with the basics of Reinforcement Learning with R and be able to build your own intelligent systems.
Contents and Overview
This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.
The first course, Reinforcement Learning Techniques with R, gives you a brief introduction to Reinforcement Learning and helps you navigate the "Grid World" to calculate likely successful outcomes using the popular MDPtoolbox package. It shows you how the Stimulus-Action-Reward algorithm works in Reinforcement Learning. By the end of this course, you will have a basic understanding of the concept of reinforcement learning, you will have compiled your first Reinforcement Learning program, and you will have mastered programming the environment for Reinforcement Learning.
The second course, Practical Reinforcement Learning - Agents and Environments, covers concepts and key algorithms in Reinforcement Learning. In this course, you’ll learn how to code the core algorithms in RL in both R and Python. This video course will help you hit the ground running, with R and Python code for Value Iteration, Policy Gradients, Q-Learning, Temporal Difference Learning, Bellman Equations, and the Markov Decision Process, which provides a framework for modelling decision making where outcomes are partly random and partly under the control of a decision maker. At the end of the video course, you’ll know the main concepts and key algorithms in RL.
The third course, Discover Algorithms for Reward-Based Learning in R, covers model-based and model-free RL algorithms with R. The course starts by describing the differences between model-free and model-based approaches to Reinforcement Learning. It discusses the characteristics, advantages and disadvantages, and typical examples of each approach. We first look at model-based approaches: state-value and state-action value functions, model-based iterative policy evaluation and improvement, MDP R examples of moving a pawn, how the discount factor, gamma, “works,” and an R example illustrating how the discount factor and relative rewards affect policy. Next, we learn the model-free approach to Reinforcement Learning. This includes the Monte Carlo approach, the Q-Learning approach, further Q-Learning explanation with R examples of varying the learning rate and randomness of actions, and the SARSA approach. We then round things off by taking a look at model-free Simulated Annealing and more Q-Learning algorithms. The primary aim is to learn how to create efficient, goal-oriented business policies, and how to evaluate and optimize those policies, primarily using the MDPtoolbox package in R. Finally, the course shows how to build actions, rewards, and punishments with a simulated annealing approach.
About the Authors
Dr. Geoffrey Hubona held full-time tenure-track and tenured assistant and associate professor positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's, and Ph.D. students. Dr. Hubona earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA.
Lauren Washington is currently the Lead Data Scientist and Machine Learning Developer for smartQED, an AI-driven start-up. Lauren worked as a Data Scientist for Topix, Payments Risk Strategist for Google (Google Wallet/Android Pay), Statistical Analyst for Nielsen, and Big Data Intern for the National Opinion Research Center through the University of Chicago. Lauren is also passionate about teaching Machine Learning. She’s currently giving back to the data science community as a Thankful Data Science Bootcamp Mentor and a Packt Publishing technical video reviewer. She also earned a Data Science certificate from General Assembly San Francisco (2016), an MA in Quantitative Methods in the Social Sciences (Applied Statistical Methods) from Columbia University (2012), and a BA in Economics from Spelman College (2010). Lauren is a leader in AI in Silicon Valley, with a passion for knowledge gathering and sharing.