What is Reinforcement Learning?

A free video tutorial from Lazy Programmer Team
Artificial Intelligence and Machine Learning Engineer

Learn more from the full course

Artificial Intelligence: Reinforcement Learning in Python

Complete guide to Reinforcement Learning, with Stock Trading and Online Advertising Applications

14:38:03 of on-demand video • Updated May 2021

  • Apply gradient-based supervised machine learning methods to reinforcement learning
  • Understand reinforcement learning on a technical level
  • Understand the relationship between reinforcement learning and psychology
  • Implement 17 different reinforcement learning algorithms
Welcome back, everyone, to this class, Artificial Intelligence: Reinforcement Learning in Python. In this lecture, we are going to answer the question: what is reinforcement learning? How is it different from supervised and unsupervised learning? What are its applications? We talked a little bit about these things in the introduction, and in this lecture we are going to expand on them.
The first thing you'll notice is how different reinforcement learning is from supervised and unsupervised learning. If you were to show graphically how close each of these are, you can see here that supervised and unsupervised learning aren't that different. One example of supervised learning is spam detection: when an email arrives in your inbox, your email application tries to classify whether it's spam or not spam. Another example is image classification: given an image, we might want to determine what kind of object is in the image, for example a car, truck, traffic light, pedestrian, bicycle, and so forth. You can imagine how that might be useful for a self-driving car.
How about unsupervised learning? One example is clustering genetic sequences so you can determine the ancestry of different families or different types of animals. Another example is topic modeling: given a set of documents, you can determine which documents discuss the same or similar topics. With the amount of data on the Internet growing every day, you can imagine that hand-labeling everything would be an infeasible task, so unsupervised learning is very useful in this case.
Whereas I've drawn supervised and unsupervised learning on the left, reinforcement learning is, in contrast, way out to the right, to give you some idea of how different these paradigms are. Some examples of reinforcement learning are playing strategy games such as tic-tac-toe, Go, and chess.
Another example is playing video games such as StarCraft, Super Mario, and Doom. So already you can see how reinforcement learning does things that sound a lot like things humans can do, which can be very dynamic, whereas supervised and unsupervised learning sound more like very simplistic, static tasks that are unchanging.
With supervised and unsupervised learning, we always imagine the same interface, which we've modeled around scikit-learn. For a supervised learning interface, we usually have the functions fit(X, y), which takes in the input samples X and the targets y, and predict(X), which takes in input samples X and tries to accurately predict y. For an unsupervised learning interface, we usually just have a fit function, which only takes in some input samples X; remember that there are no targets in unsupervised learning. Sometimes we have a transform function, which takes in some input samples X and turns them into a different representation that we call Z. Some examples of that might be a mapping to some vector or a cluster identity.
The main point of this is that supervised and unsupervised learning are actually so similar that it makes sense to put them in the same library in the first place, and it makes sense for their APIs to take on this very simple and neat format. The common theme with both of these is that the interface is training data: you take in some training data, either X and y or just X, and you call a fit function. In the case of supervised learning, you can then make predictions on future data. But in both cases your data X and your targets y are very simple: X is just an N-by-D matrix of input data, and y is just an N-length vector of targets. This is why we say all data is the same. This generic format doesn't change whether you're doing biology, finance, economics, or any other subject. Data is just data, a table of numbers, so we can fit most of our algorithms in one neat library called scikit-learn.
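
To make the fit/predict/transform convention concrete, here is a minimal sketch in plain Python. The two classes (NearestCentroidClassifier and MeanCenterer) are hypothetical toys written for this illustration; they only imitate the calling convention described above and are not scikit-learn's own classes or implementations.

```python
# Hypothetical toy classes that mimic the fit/predict/transform convention.
# They are illustrative only and are NOT part of scikit-learn.


class NearestCentroidClassifier:
    """Supervised: fit(X, y) learns one centroid per class; predict(X) labels rows."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        # Average the per-class sums to get one centroid per label.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # Each row gets the label of its closest centroid.
        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


class MeanCenterer:
    """Unsupervised: fit(X) takes no targets; transform(X) returns a representation Z."""

    def fit(self, X):
        n = len(X)
        self.mean_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.mean_)] for row in X]


# X is an N-by-D table of numbers (N = 4 samples, D = 2 features); y has length N.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = ["a", "a", "b", "b"]

model = NearestCentroidClassifier().fit(X, y)
predictions = model.predict([[0.5, 0.5], [5.5, 4.0]])

Z = MeanCenterer().fit(X).transform(X)
```

Note that nothing in either class knows what the numbers mean. The same table-in, table-out shape would apply whether the rows came from biology, finance, or anything else.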
While it might seem that I'm trying to make supervised and unsupervised learning seem very simplistic, these methods can actually be quite useful. Using these algorithms, we can do things like face detection, so that you can unlock your phone, and speech recognition, so that you can talk to your phone.
But reinforcement learning is different. Reinforcement learning can guide an agent in how to act in the world, so the interface to a reinforcement learning agent is much broader than just data: it's the entire environment. That environment can be the real world, or it can be a simulated world like a video game. As an example, you could create a reinforcement learning agent to vacuum your house; then it would be interacting with the real world. You could also create a reinforcement learning agent to learn how to walk; that would also be interacting with the real world. You can be sure that the military is interested in such technologies. They want reinforcement learning agents that can replace soldiers, not only to walk but to fight, defuse bombs, and make important decisions while they are out on a mission.
So you can see now why reinforcement learning is such a big leap from basic supervised and unsupervised learning: the interface isn't just tables of data, but could potentially be the entire world. Your agent is going to have sensors: some cameras, some microphones, an accelerometer, a GPS, and so forth. There is a continuous stream of data coming in, and the agent is constantly reading this data to make a decision about what to do in that moment. It has to take into account both past and future; it doesn't just statically classify or label things. In other words, a reinforcement learning agent is a thing that has a lifetime, and in each step of its lifetime it has to make a decision about what to do. A static supervised or unsupervised model is not like that. It has no concept of time: you give it an input and it produces a corresponding output.
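
The idea of an agent with a lifetime, making one decision per time step, can be sketched as a simple loop. Everything below is a hypothetical illustration: CorridorEnv, RandomAgent, AlwaysRightAgent, and run_episode are made-up names, and the reset/step split is only loosely modeled on the convention many RL toolkits use. Neither agent learns anything; the point is the shape of the interaction, not an algorithm.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk a 1-D corridor from cell 0 to the last cell."""

    def __init__(self, length=5, max_steps=100):
        self.length = length
        self.max_steps = max_steps

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position  # the observation the agent "senses"

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clip the move.
        self.position = max(0, min(self.length - 1, self.position + action))
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


class RandomAgent:
    """Placeholder agent: picks actions at random and learns nothing."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([-1, 1])


class AlwaysRightAgent:
    """Placeholder agent: always moves right, regardless of the observation."""

    def act(self, observation):
        return 1


def run_episode(env, agent):
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:  # one decision per time step of the agent's "lifetime"
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward, env.steps


random_reward, random_steps = run_episode(CorridorEnv(), RandomAgent(seed=0))
fixed_reward, fixed_steps = run_episode(CorridorEnv(), AlwaysRightAgent())
```

Contrast this with fit and predict: here there is no fixed table of samples. What the agent observes next depends on what it just did.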
Now some of you, if you are creative, might think: well, a supervised learning algorithm should still be able to solve reinforcement learning tasks. For example, if X represents the state I'm in, then y, the target, should just be the correct action to take for that state. So whether I'm driving a car, playing a video game, or playing chess, I will always do the right thing.
Here's the problem with that. A game like Go has 8 × 10^100 possible board positions. If you can't tell right away, that is an infeasible amount of input data. For comparison, ImageNet, our largest image benchmark, has about 10^6 samples, so the number of samples for Go would be 94 orders of magnitude larger than ImageNet, which can already take about one day to train if you have state-of-the-art hardware. To give you some idea, one order of magnitude larger would take 10 days to train, and two orders of magnitude larger would take 100 days to train. So now imagine 94 orders of magnitude larger.
Also keep in mind that there may not be such a thing as a correct action to take at all times. We don't want our AI to play the same way every single time; we want to allow for creativity and stochastic behavior. A supervised model, even if it were feasible to train, would only have one target per input, so it would never be able to do human-like things like, say, generate poetry.
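
The scaling argument above can be checked with a few lines of arithmetic. This sketch takes the lecture's figures at face value (8 × 10^100 positions, about 10^6 ImageNet samples, about one day of training) and assumes, as the lecture does, that training time grows linearly with the number of samples. The variable and function names are made up for the illustration.

```python
import math

go_positions = 8e100      # board-position count quoted in the lecture
imagenet_samples = 1e6    # approximate ImageNet size quoted in the lecture
days_for_imagenet = 1     # quoted training time on state-of-the-art hardware

ratio = go_positions / imagenet_samples
orders_of_magnitude = math.floor(math.log10(ratio))


def training_days(extra_orders, base_days=days_for_imagenet):
    # Assumes training time scales linearly with dataset size.
    return base_days * 10 ** extra_orders


print(orders_of_magnitude)                  # 94
print(training_days(1), training_days(2))   # 10 100
print(training_days(orders_of_magnitude))   # a 1 followed by 94 zeros (days)
```

Under these assumptions, labeling every Go position and training on the result would take on the order of 10^94 days. That is why the lecture calls the supervised approach infeasible.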