Reinforcement Learning with R: Algorithms-Agents-Environment

Learn how to utilize algorithms for reward-based learning, as part of Reinforcement Learning with R.
4.7 (2 ratings)
37 students enrolled
Created by Packt Publishing
Last updated 3/2019
English
English [Auto-generated]
This course includes
  • 6 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Understand and Implement the "Grid World" Problem in R
  • Utilize the Markov Decision Process and Bellman equations
  • Get to know the key terms in Reinforcement Learning
  • Dive into Temporal Difference Learning, an algorithm that combines Monte Carlo methods and dynamic programming
  • Take your Machine Learning skills to the next level with RL techniques
  • Learn R examples of policy evaluation and iteration
  • Implement typical applications for model-based and model-free RL
  • Understand policy evaluation and iteration
  • Master Q-Learning with Greedy Selection Examples in R
  • Master Simulated Annealing and the changed discount factor through examples in R
Course content
47 lectures • 06:14:29 total length
+ Reinforcement Learning Techniques with R
11 lectures 02:21:02

This video provides an overview of the entire course.

Preview 03:57

The aim of this video is to introduce Reinforcement Learning (RL) and illustrate RL concepts with a prototypical example.

  • Contrast RL with supervised and unsupervised learning

  • Introduce the classic RL Grid World problem or framework

  • Explain the core RL concepts of states and actions

Understanding the RL “Grid World” Problem
07:06

The aim of this video is to demonstrate how to represent Grid World using the R software and to introduce the RL concepts of sequences of actions and randomness of actions.

  • Show how to represent (code) Grid World in R

  • Explain in detail the importance of the sequences of actions in achieving rewards

  • Show how to represent stochasticity, or possible randomness, in action behavior

Implementing the Grid World Framework in R
16:52
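
To make the representation concrete, here is a minimal sketch of how a 2 x 2 grid world might be encoded in plain R. It is illustrative only: the state and action names, the layout, and the reward values are assumptions, not the course's actual code.

```r
# Hypothetical 2 x 2 grid laid out as
#   s1 s2
#   s3 s4
states  <- c("s1", "s2", "s3", "s4")
actions <- c("up", "down", "left", "right")

# Deterministic transition rules; moves into a wall leave the state unchanged.
transition <- function(state, action) {
  switch(paste(state, action),
         "s1 right" = "s2", "s1 down"  = "s3",
         "s2 left"  = "s1", "s2 down"  = "s4",
         "s3 up"    = "s1", "s3 right" = "s4",
         "s4 up"    = "s2", "s4 left"  = "s3",
         state)
}

# An assumed reward structure: reaching s4 pays +10, every other move costs 1.
reward <- function(new_state) if (new_state == "s4") 10 else -1

transition("s1", "right")          # "s2"
reward(transition("s1", "down"))   # -1
```

Stochastic action behavior, as discussed in the lecture, could be layered on top by sampling the executed action (for example, the intended action with probability 0.8, a perpendicular one otherwise).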

The aim is twofold: first, to probe more deeply into how the possibly random execution of actions can affect the outcome; second, to demonstrate that the specific reward structure can affect the optimal policy with regard to the best action.

  • Demonstrate how stochasticity affects ultimate action outcome

  • Examine the optimal policy in Grid World, given the different reward structures.

  • Show that small changes in reward structure matter!

Navigating Grid World and Calculating Likely Successful Outcomes
15:34

The video deals with developing the optimal policy as a model-free solution to navigating a 2 x 2 grid.

  • Describe two different R packages for solving RL problems

  • Show RL state-action-reward framework

  • Demonstrate a hands-on extended R example to find an optimal policy

R Example – Finding Optimal Policy Navigating 2 x 2 Grid
26:10
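
For readers who want to try this outside the videos, one of the R packages alluded to earlier is likely the ReinforcementLearning package; the sketch below follows its typical usage pattern from the package documentation. Treat the parameter values as assumptions, not the course's exact settings.

```r
# A hedged sketch using the ReinforcementLearning package: sample experience
# from the built-in grid world environment, then learn a policy from the
# observed State-Action-Reward-NextState tuples.
library(ReinforcementLearning)

env  <- gridworldEnvironment   # example environment shipped with the package
data <- sampleExperience(N = 1000, env = env,
                         states  = c("s1", "s2", "s3", "s4"),
                         actions = c("up", "down", "left", "right"))

model <- ReinforcementLearning(data,
                               s = "State", a = "Action",
                               r = "Reward", s_new = "NextState",
                               control = list(alpha = 0.1, gamma = 0.5,
                                              epsilon = 0.1))
model$Policy   # best learned action per state
```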

This video addresses the epsilon-greedy action selection strategy to update the optimal policy with a model-free solution to navigating a 2 x 2 grid.

  • Describe distinctions between exploration and exploitation action selection approaches

  • Describe the implementation of epsilon-greedy action selection strategy

  • Use another hands-on extended R example to update, or validate, an optimal policy using an existing model

R Example – Updating Optimal Policy Navigating 2 x 2 Grid
19:06
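
The epsilon-greedy rule itself is compact enough to sketch directly. Below is a minimal, hedged R version: with probability epsilon the agent explores a random action; otherwise it exploits the highest-valued one. The q_values vector is hypothetical.

```r
# Minimal epsilon-greedy action selection (illustrative sketch).
epsilon_greedy <- function(q_values, epsilon = 0.1) {
  if (runif(1) < epsilon) {
    sample(names(q_values), 1)             # explore: uniform random action
  } else {
    names(q_values)[which.max(q_values)]   # exploit: best-known action
  }
}

q_values <- c(up = 0.2, down = -0.5, left = 0.1, right = 0.9)
epsilon_greedy(q_values, epsilon = 0.1)    # "right" most of the time
```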

This video deals with using the R MDPtoolbox package to find the optimal policy solution for navigating a 2 x 2 grid.

  • Describe the Markov Decision Process framework for a Reinforcement Learning problem

  • Detail the probabilistic nature of the transition model

  • Demonstrate an MDPtoolbox R example to find the optimal policy

R Example – MDPtoolbox Solution Navigating 2 x 2 Grid
17:51
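
As a taste of the MDPtoolbox workflow, here is a self-contained sketch for a tiny 2-state, 2-action MDP. The transition and reward matrices are invented for illustration; they are not the course's grid model.

```r
library(MDPtoolbox)

# P: one transition matrix per action (rows = current state, cols = next state).
P <- array(0, dim = c(2, 2, 2))
P[, , 1] <- matrix(c(0.8, 0.2,
                     0.1, 0.9), nrow = 2, byrow = TRUE)
P[, , 2] <- matrix(c(0.3, 0.7,
                     0.6, 0.4), nrow = 2, byrow = TRUE)

# R: reward for each (state, action) pair.
R <- matrix(c( 5, 10,
              -1,  2), nrow = 2, byrow = TRUE)

mdp_check(P, R)                                  # "" means the MDP is well-formed
sol <- mdp_value_iteration(P, R, discount = 0.9)
sol$policy                                       # optimal action index per state
sol$V                                            # value of each state under that policy
```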

This video identifies and demonstrates several of the more important MDPtoolbox functions as pertinent to Reinforcement Learning problems.

  • Introduces several of the more important MDPtoolbox functions

  • Demonstrates what these MDPtoolbox functions do

  • Shows the input and output from each respective MDPtoolbox function

More MDPtoolbox Function Examples Using R
10:14

This video closes the loop on representing the 3 x 4 Grid World RL problem in R, without using any RL-specific R packages.

  • Show how to solve the original 3 x 4 Grid RL problem

  • Show how to construct a representative 3 x 4 Grid World environment

  • Demonstrate that this manual solution produces a similar optimal policy

R Example – Finding Optimal 3 x 4 Grid World Policy
10:30

This video presents an end-of-course user exercise, integrating much of the material presented in the preceding sections.

  • Frame a user exercise to reinforce this course's material

  • Provide the stub code to complete the user exercise

  • Challenge the user to build an appropriate environment in R

R Exercise – Building a 3 x 4 Grid World Environment
03:27

This video presents a solution to the user exercise set in the preceding video.

  • Detail the steps needed to solve the user exercise

  • Show how to build the appropriate R objects to complete the exercise

  • Demonstrate that the user solution produces the same optimal policy as before

R Exercise Solution – Building a 3 x 4 Grid World Environment
10:15
+ Practical Reinforcement Learning - Agents and Environments
21 lectures 01:17:26

This video will give you an overview of the course.

Preview 03:32

The aim of this video is to install RStudio.

   •  Download and install Base R

   •  Download and install RStudio

   •  Launch RStudio session

Install RStudio
02:40

The aim of this video is to learn to install Python.

   •  Check your system for the current version of OS

   •  Download Python version 3

   •  Launch Python session

Install Python
01:47

The aim of this video is to learn to work with Jupyter Notebook.

   •  Install Python 3 and upgrade to pip3

   •  Install IRKernel

   •  Launch Jupyter Notebook

Launch Jupyter Notebook
03:38

The aim of this video is to study the learning type distinctions.

   •  What is supervised learning?

   •  What is unsupervised learning?

   •  Understand reinforcement learning

Learning Type Distinctions
02:25

The aim of this video is to study reinforcement learning.

   •  Interpret artificial neural networks

   •  Understand deep learning

   •  Interpret perceptrons

Get Started with Reinforcement Learning
02:42

The aim of this video is to study real-world reinforcement learning examples.

   •  Study a high level example

   •  Learn through a gaming example

Real-world Reinforcement Learning Examples
02:13

The aim of this video is to learn about the key terms in reinforcement learning.

   •  Study in brief about the environment, agent, and state

   •  Get to know about policy, reward, sensor, and value

Key Terms in Reinforcement Learning
04:11

The aim of this video is to discuss OpenAI Gym.

   •  What is OpenAI Gym?

   •  Various environments in OpenAI Gym

   •  Learn to interface with OpenAI Gym

OpenAI Gym
03:53

The aim of this video is to discuss the Monte Carlo method in brief.

   •  Study the Bandit problem

   •  What is a Bandit problem Pseudo Code?

   •  Memory concerns with Reinforcement Learning

Monte Carlo Method
05:55
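
The bandit setting mentioned above lends itself to a short sketch. Below, each arm's value is estimated as an incremental running mean of sampled rewards, which is the core Monte Carlo idea; the arm payoffs are invented for illustration.

```r
set.seed(42)
true_means <- c(0.2, 0.5, 0.8)   # hidden mean payoff of each arm (assumed)
counts <- numeric(3)
values <- numeric(3)             # running mean reward per arm

for (t in 1:1000) {
  # epsilon-greedy arm choice: explore 10% of the time
  arm <- if (runif(1) < 0.1) sample(3, 1) else which.max(values)
  r   <- rnorm(1, mean = true_means[arm])
  counts[arm] <- counts[arm] + 1
  # incremental mean avoids storing the full reward history
  values[arm] <- values[arm] + (r - values[arm]) / counts[arm]
}
round(values, 2)   # estimates approach true_means, with arm 3 winning
```

The incremental-mean trick also speaks to the memory concerns the lecture raises: no episode history needs to be retained.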

The aim of this video is to discuss the Monte Carlo method in Python.

   •  Learn the goal of a mountain car example

   •  Perform Monte Carlo method example in Python

Monte Carlo Method in Python
02:18

The aim of this video is to study the Monte Carlo method in R.

   •  Work through the mountain car example using the Monte Carlo method in R

   •  Interpret the result

Monte Carlo Method in R
03:08

The aim of this video is to study practical reinforcement learning in OpenAI Gym.

   •  Discuss the Value Iteration in R

   •  Study the Policy Iteration in R

   •  Get to know about the Bellman Equation in R

Practical Reinforcement Learning in OpenAI Gym
01:58

The aim of this video is to study the different MDP concepts.

   •  Study the Markov Decision Process and Dynamic Programming

   •  What are the Bellman Equations?

   •  Study the Value and Policy Functions

Markov Decision Process Concepts
07:44
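
The Bellman optimality equation can be demonstrated numerically in a few lines of R. The sketch below repeatedly applies the backup V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ] to an invented 2-state, 2-action MDP; it is a worked illustration, not the course's example.

```r
# Transition array P[s, s', a]; each action's rows sum to 1.
P <- array(c(0.9, 0.4, 0.1, 0.6,    # action 1
             0.2, 0.7, 0.8, 0.3),   # action 2
           dim = c(2, 2, 2))
R <- matrix(c(1, 0,
              0, 2), nrow = 2, byrow = TRUE)  # reward for (state, action)
gamma <- 0.9
V <- c(0, 0)

for (i in 1:200) {   # repeated Bellman backups converge to V*
  Q <- sapply(1:2, function(a) R[, a] + gamma * as.vector(P[, , a] %*% V))
  V <- apply(Q, 1, max)
}
V                    # approximate optimal state values
which.max(Q[1, ])    # greedy (optimal) action in state 1
```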

The aim of this video is to study the Python library MDP Toolbox.

   •  Get a brief introduction to the MDP Toolbox

   •  Work on the MDP Toolbox with the help of an example

Python MDP Toolbox
06:41

The aim of this video is to discuss the value and policy iteration in Python.

   •  What is the Python MDP Toolbox?

   •  Work on the Python MDP Toolbox with the help of an example

Value and Policy Iteration in Python
03:32

The aim of this video is to study the MDP Toolbox in R.

   •  Get a brief introduction to the MDP Toolbox in R

   •  Work on the MDP Toolbox in R with the help of an example

MDP Toolbox in R
02:49

The aim of this video is to discuss the value and policy iteration in R.

   •  Study Value Iteration in R

   •  Get to know Policy Iteration in R

   •  Learn about the Bellman Equation in R

Value Iteration and Policy Iteration in R
03:10

The aim of this video is to study temporal difference learning.

   •  What is Temporal Difference Learning?

   •  Get to know the Tabular TD(0) Pseudo Code

   •  Learn about SARSA and Q-Learning, along with pseudo code for each

Temporal Difference Learning
08:23
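
The tabular TD(0) update is small enough to state directly in R. This hedged sketch nudges the value of the visited state toward the bootstrapped target r + gamma * V(s'); the state names and numbers are hypothetical.

```r
# One TD(0) update after observing the transition (s, r, s').
td0_update <- function(V, s, r, s_next, alpha = 0.1, gamma = 0.9) {
  V[s] <- V[s] + alpha * (r + gamma * V[s_next] - V[s])
  V
}

V <- c(s1 = 0, s2 = 0, s3 = 0)
V <- td0_update(V, "s1", r = 1, s_next = "s2")
V["s1"]   # 0.1: one step toward the target 1 + 0.9 * V["s2"]
```

SARSA and Q-Learning, covered in the same lecture, replace V with a state-action table Q and differ only in whether the bootstrap uses the action actually taken next (on-policy) or the greedy action (off-policy).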

The aim of this video is to learn to use the MDP Toolbox in Python to perform Q-Learning.

   •  Perform Q Learning in Python

   •  Interpret the results

Temporal Difference Learning in Python
01:53

The aim of this video is to study Temporal Difference Learning in R.

   •  Utilize the MDP Toolbox to do Q-Learning in R

   •  Perform Q Learning and One Step Temporal Difference in R

   •  Interpret and verify the results

Temporal Difference Learning in R
02:54
+ Discover Algorithms for Reward-Based Learning in R
15 lectures 02:36:01

This video provides an overview of the entire course.

Preview 05:51

How do you represent the environment when you have no explicit MDP model?

  • Determine the rules, the “physics,” and the structure of the state space

  • Determine the possible states, actions, new states, and rewards, and what you need to do once you have determined all of that

  • Build an environment function in R

R Example – Building Model-Free Environment
12:05

How do you determine the optimal policy to “Solve” your reinforcement learning problem?

  • Observe State-Action-New-State reward experience data

  • Use this data to determine highest-value actions for each state

R Example – Finding Model-Free Policy
11:04

In this video, we will continue with the optimal policy to “Solve” your reinforcement learning problem.

  • Map high-value state-action pairs as optimal policy function

R Example – Finding Model-Free Policy (Continued)
08:27

How does one validate the model, as well as validate (and possibly update) the previously determined optimal policy?

  • Sample a new set of data from environment

  • Determine optimal policy function again, with the same model

  • Then compare the new policy function with the previous policy function

R Example – Validating Model-Free Policy
10:12

What are the state-value and state-action value functions?

  • Define the two value functions

  • Show how they impact policy evaluation and improvement

  • Illustrate with an R MDP example for moving a pawn

Policy Evaluation and Iteration
12:37

How do MDP problem parameters affect the optimal policy solution?

  • Introduction to the discount factor, “gamma”

  • Show how gamma affects policy moving a pawn

  • Show how other parameters affect policy moving a pawn

R Example – Moving a Pawn with Changed Parameters
08:35

How does gamma affect policy improvement and optimal policy determination? This video dives deeper into the nature of the discount factor, gamma.

  • Explain how the discount factor determines the value function

  • Show how the value function determines policy

  • Present an R example of discount and rewards affecting policy

Discount Factor and Policy Improvement
11:21
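
A one-line computation makes the effect of gamma tangible: the discounted return of a constant reward stream r over T steps is sum_{t=0}^{T-1} gamma^t * r. The numbers below are purely illustrative.

```r
discounted_return <- function(r, gamma, T) sum(gamma^(0:(T - 1)) * r)

discounted_return(1, gamma = 0.50, T = 10)  # ~2.0: a myopic agent
discounted_return(1, gamma = 0.99, T = 10)  # ~9.6: a far-sighted agent
```

Small changes in gamma can therefore flip which of two reward streams, and hence which policy, looks best.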

What is the nature of the Monte Carlo Model-Free approach to solving Reinforcement Learning problems?

  • Describe the characteristics of the Monte Carlo approach

  • Describe random versus epsilon-greedy action selection

  • Illustrate with an R race-to-goal example

Monte Carlo Methods
10:42

What is the nature of the Model-Free Q-Learning approach to solve Reinforcement Learning problems?

  • Describe Q-Learning as an off-policy learning concept

  • Walk through the Q-Learning update rule

  • Illustrate Q-Learning with an R example

Environment and Q-Learning Functions with R
08:37
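
The Q-Learning update rule walked through in this lecture can be sketched as a single R function; the Q table and transition values here are hypothetical.

```r
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
q_update <- function(Q, s, a, r, s_next, alpha = 0.1, gamma = 0.9) {
  target  <- r + gamma * max(Q[s_next, ])   # off-policy: greedy bootstrap
  Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
  Q
}

Q <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(paste0("s", 1:4),
                            c("up", "down", "left", "right")))
Q <- q_update(Q, "s1", "right", r = -1, s_next = "s2")
Q["s1", "right"]   # -0.1 after one update
```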

Diving deeper into the nature of Q-Learning.

  • Look at effects of the learning rate parameter on Q-values

  • Look at effects of randomness of actions on policy

  • Illustrate effects of learning rate, random actions using R examples

Learning Episode and State-Action Functions in R
14:27

Explore the characteristics of the SARSA algorithm.

  • Describe SARSA as an on-policy learning concept

  • Compare SARSA to the Model-Free Q-Learning approach

  • Note how SARSA is unique from Q-Learning

State-Action-Reward-State-Action (SARSA)
05:44
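
For contrast with the Q-Learning sketch above, here is the corresponding hedged SARSA update: the only difference is that the bootstrap uses the action a' the agent actually takes next, which is what makes it on-policy.

```r
# SARSA: Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))
sarsa_update <- function(Q, s, a, r, s_next, a_next,
                         alpha = 0.1, gamma = 0.9) {
  target  <- r + gamma * Q[s_next, a_next]  # on-policy bootstrap
  Q[s, a] <- Q[s, a] + alpha * (target - Q[s, a])
  Q
}
```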

What is the nature of the Simulated Annealing algorithm alternative to Q-Learning?

  • Describe the characteristics of the Simulated Annealing approach

  • Describe probabilistic action selection derived from the Boltzmann distribution metaheuristic

  • Illustrate with an R simulated annealing 2 x 2 grid example

Simulated Annealing – An Alternative to Q-Learning
10:36
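
The Boltzmann selection rule at the heart of this simulated-annealing variant is easy to sketch: action probabilities follow a softmax of the Q-values, and a temperature parameter is gradually cooled so the agent shifts from exploration to exploitation. The values and cooling schedule below are assumptions.

```r
boltzmann_select <- function(q_values, temperature) {
  probs <- exp(q_values / temperature)
  sample(names(q_values), 1, prob = probs / sum(probs))
}

q_values    <- c(up = 0.2, down = -0.5, left = 0.1, right = 0.9)
temperature <- 5                                # hot: nearly uniform exploration
for (episode in 1:50) {
  a <- boltzmann_select(q_values, temperature)
  temperature <- max(0.95 * temperature, 0.01)  # cooling schedule
}
boltzmann_select(q_values, 0.01)                # cold: almost always "right"
```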

How does one incorporate the discount factor into the previous Model-Free Q-Learning Reinforcement Learning algorithm?

  • Modify the Q-Learning algorithm to include a discount factor

  • Include the aggregation of rewards by episode

  • Illustrate modified Q-Learning algorithm with R examples

Q-Learning with a Discount Factor
14:35

How does one demonstrate the effects of Q-Learning algorithm control parameters using effective visualizations?

  • Use the popular R ggplot2 package to create visualizations

  • Examine effects of epsilon, alpha, and gamma control parameters

  • Create color-based line plots of Q-values and rewards

Visual Q-Learning Examples
11:08
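
As a flavour of the visualizations described here, the sketch below plots simulated reward-per-episode curves for several gamma values with ggplot2. The data is fabricated for illustration; it is not output from the course's experiments.

```r
library(ggplot2)

set.seed(1)
df <- expand.grid(episode = 1:100, gamma = c(0.5, 0.9, 0.99))
# Fake learning curves: reward rises with episodes, with some noise.
df$reward <- with(df, 10 * (1 - exp(-episode / 30)) * gamma +
                      rnorm(nrow(df), sd = 0.5))

ggplot(df, aes(episode, reward, colour = factor(gamma))) +
  geom_line() +
  labs(colour = "gamma",
       title = "Simulated reward per episode by discount factor")
```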
Requirements
  • A basic understanding of Machine Learning concepts is required.
Description

Reinforcement Learning has become one of the hottest research areas in Machine Learning and Artificial Intelligence. You can make an intelligent agent in a few steps: have it semi-randomly explore different actions given different conditions and states, then keep track of the reward or penalty associated with each choice for a given state and action. This course describes and compares the range of model-based and model-free learning algorithms that constitute Reinforcement Learning.

This comprehensive 3-in-1 course follows a step-by-step practical approach to getting to grips with the basics of Reinforcement Learning with R and building your own intelligent systems. Initially, you’ll learn how to implement Reinforcement Learning techniques using the R programming language. You’ll also learn concepts and key algorithms in Reinforcement Learning. Moving further, you’ll dive into Temporal Difference Learning, an algorithm that combines Monte Carlo methods and dynamic programming. Finally, you’ll implement typical applications for model-based and model-free RL.

Towards the end of this course, you'll get to grips with the basics of Reinforcement Learning with R and build your own intelligent systems.

Contents and Overview

This training program includes 3 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, Reinforcement Learning Techniques with R, covers Reinforcement Learning techniques with R. It gives you a brief introduction to Reinforcement Learning and helps you navigate the "Grid World" to calculate likely successful outcomes using the popular MDPtoolbox package. It also shows you how the Stimulus-Action-Reward algorithm works in Reinforcement Learning. By the end of this course, you will have a basic understanding of the concept of Reinforcement Learning, you will have written your first Reinforcement Learning program, and you will have mastered programming the environment for Reinforcement Learning.

The second course, Practical Reinforcement Learning - Agents and Environments, covers concepts and key algorithms in Reinforcement Learning. In this course, you’ll learn how to code the core algorithms in RL and get to know them in both R and Python. This video course helps you hit the ground running, with R and Python code for Value Iteration, Policy Gradients, Q-Learning, Temporal Difference Learning, the Markov Decision Process, and Bellman Equations; the Markov Decision Process provides a framework for modelling decision making where outcomes are partly random and partly under the control of a decision maker. By the end of the video course, you’ll know the main concepts and key algorithms in RL.

The third course, Discover Algorithms for Reward-Based Learning in R, covers model-based and model-free RL algorithms with R. The course starts by describing the differences between model-free and model-based approaches to Reinforcement Learning, discussing the characteristics, advantages and disadvantages, and typical examples of each. We then look at model-based approaches: state-value and state-action value functions, model-based iterative policy evaluation and improvement, MDP R examples of moving a pawn, how the discount factor, gamma, “works,” and an R example illustrating how the discount factor and relative rewards affect policy. Next, we learn the model-free approach to Reinforcement Learning: the Monte Carlo approach, the Q-Learning approach, further Q-Learning explanation with R examples varying the learning rate and the randomness of actions, and the SARSA approach. Finally, we round things off with model-free Simulated Annealing and more Q-Learning algorithms. The primary aim is to learn how to create efficient, goal-oriented business policies, and how to evaluate and optimize those policies, primarily using the MDPtoolbox package in R. The course closes by showing how to build actions, rewards, and punishments with a simulated annealing approach.

Towards the end of this course, you'll get to grips with the basics of Reinforcement Learning with R and build your own intelligent systems.

About the Authors

  • Dr. Geoffrey Hubona held full-time tenure-track (and tenured) assistant and associate professor positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's, and Ph.D. students. Dr. Hubona earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA.

  • Lauren Washington is currently the Lead Data Scientist and Machine Learning Developer for smartQED, an AI-driven start-up. Lauren worked as a Data Scientist for Topix, Payments Risk Strategist for Google (Google Wallet/Android Pay), Statistical Analyst for Nielsen, and Big Data Intern for the National Opinion Research Center through the University of Chicago. Lauren is also passionate about teaching Machine Learning. She’s currently giving back to the data science community as a Thankful Data Science Bootcamp Mentor and a Packt Publishing technical video reviewer. She also earned a Data Science certificate from General Assembly San Francisco (2016), an MA in Quantitative Methods in the Social Sciences (Applied Statistical Methods) from Columbia University (2012), and a BA in Economics from Spelman College (2010). Lauren is a leader in AI, in Silicon Valley, with a passion for knowledge gathering and sharing.

Who this course is for:
  • Data Scientists and AI programmers who are new to reinforcement learning and want to learn the fundamentals of building self-learning intelligent agents in a practical way.