Reinforcement Learning is a type of machine learning that allows machines and software agents to automatically determine the ideal behavior within a specific environment in order to maximize performance. Reinforcement Learning is becoming popular because it not only serves as a way to study how machines and software agents learn to act, but is also used as a tool for constructing autonomous systems that improve themselves with experience. This video will give you a brief introduction to Reinforcement Learning; it will help you navigate the "Grid World" and calculate likely successful outcomes using the popular MDPToolbox package. This video will also show you how the Stimulus - Action - Reward algorithm works in Reinforcement Learning. By the end of this video, you will have a basic understanding of the concept of Reinforcement Learning, you will have compiled your first Reinforcement Learning program, and you will have mastered programming the environment for Reinforcement Learning.
About the author:
Dr. Geoffrey Hubona held full-time tenure-track (and tenured) assistant and associate professor positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's, and Ph.D. students. Dr. Hubona earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA.
The aim of this video is to introduce Reinforcement Learning (RL) and illustrate RL concepts with a prototypical example.
The aim of this video is to demonstrate how to represent Grid World using the R software and to introduce the RL concepts of sequences of actions and randomness of actions.
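The course demonstrates this in R with the MDPToolbox package; as a rough, language-agnostic sketch of the same idea, a 2 x 2 Grid World can be written down as a small Markov Decision Process (states, actions, possibly random transitions, rewards) and solved by value iteration. The state layout, slip probability, and reward values below are illustrative assumptions, not the course's exact example.

```python
# A 2 x 2 Grid World as a small MDP, solved by value iteration (illustrative sketch).
# States are numbered  0 1
#                      2 3   with state 3 as the goal (assumed layout).

GAMMA = 0.9           # discount factor (assumed)
SLIP = 0.2            # randomness of actions: chance the move fails and the agent stays
GOAL, STEP_COST = 3, -1.0

# Intended moves for each action; off-grid moves keep the agent in place.
MOVES = {
    "up":    {0: 0, 1: 1, 2: 0, 3: 1},
    "down":  {0: 2, 1: 3, 2: 2, 3: 3},
    "left":  {0: 0, 1: 0, 2: 2, 3: 2},
    "right": {0: 1, 1: 1, 2: 3, 3: 3},
}

def reward(s, s2):
    # +10 for reaching the goal, a small cost for every other step (assumed values).
    return 10.0 if s != GOAL and s2 == GOAL else STEP_COST

def q_value(s, a, V):
    # Expected return of taking action a in state s: the move succeeds with
    # probability 1 - SLIP, otherwise the agent stays put and pays the step cost.
    s2 = MOVES[a][s]
    return (1 - SLIP) * (reward(s, s2) + GAMMA * V[s2]) + SLIP * (STEP_COST + GAMMA * V[s])

def value_iteration(n_iter=200):
    V = {s: 0.0 for s in range(4)}
    for _ in range(n_iter):
        V = {s: 0.0 if s == GOAL else max(q_value(s, a, V) for a in MOVES)
             for s in range(4)}
    return V

V = value_iteration()
policy = {s: max(MOVES, key=lambda a: q_value(s, a, V)) for s in range(4) if s != GOAL}
print(V)
print(policy)
```

Because the goal sits in the bottom-right corner, the greedy policy ends up pointing every non-goal state toward it, which is the behavior the Grid World videos build intuition for.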
The aim of this video is twofold: first, to probe more deeply into how the possible random execution of actions can affect the outcome; and second, to demonstrate that the specific reward structure can affect the optimal policy with regard to the best action.
This video deals with developing the optimal policy as a model-free solution to navigating a 2 x 2 grid.
This video addresses the epsilon-greedy action selection strategy to update the optimal policy with a model-free solution to navigating a 2 x 2 grid.
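The epsilon-greedy idea can be sketched independently of the course's R code: with probability epsilon the agent explores a random action, otherwise it exploits the currently best one, while a Q-learning update improves the policy from experience alone, with no model of the environment. The grid layout, epsilon, and learning rate below are illustrative assumptions.

```python
import random

random.seed(0)

# Same illustrative 2 x 2 grid:  0 1 / 2 3, with state 3 as the goal (assumed).
MOVES = {"up":    {0: 0, 1: 1, 2: 0, 3: 1},
         "down":  {0: 2, 1: 3, 2: 2, 3: 3},
         "left":  {0: 0, 1: 0, 2: 2, 3: 2},
         "right": {0: 1, 1: 1, 2: 3, 3: 3}}
GOAL, ALPHA, GAMMA, EPSILON = 3, 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(4) for a in MOVES}

def choose_action(s):
    # Epsilon-greedy selection: explore with probability EPSILON, else exploit.
    if random.random() < EPSILON:
        return random.choice(list(MOVES))
    return max(MOVES, key=lambda a: Q[(s, a)])

for _ in range(2000):                # episodes
    s = random.choice([0, 1, 2])     # start anywhere except the goal
    while s != GOAL:
        a = choose_action(s)
        s2 = MOVES[a][s]
        r = 10.0 if s2 == GOAL else -1.0
        # Q-learning update: nudge Q(s, a) toward the bootstrapped target.
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in MOVES)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

greedy = {s: max(MOVES, key=lambda a: Q[(s, a)]) for s in range(3)}
print(greedy)
```

After enough episodes the greedy policy read off the learned Q-table matches what a model-based planner would compute, even though the agent never saw the transition probabilities, which is the point of the model-free approach.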
This video presents an end-of-Title user exercise, integrating much of the material presented in the three sections.
Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and how to put them to work.
With an extensive library of content - more than 4,000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt guides software professionals in every field to what's important to them now.
From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.