Name: ARS vs other AI
Uploaded: 2018-06-17T21:31:44Z
Duration: 11 min 35 s
Description: Learn, build and implement the most powerful AI model at home. Compete with multi-billion-dollar companies using ARS.

A free video tutorial from Hadelin de Ponteves

Artificial Intelligence (ARS): Build the Most Powerful AI

04:44:11 of on-demand video • Updated June 2026

Build an AI

Understand the theory behind the Augmented Random Search algorithm

Learn how to build the most powerful AI algorithm

Train and implement the ARS algorithm

Train AI to solve the same challenges as Google DeepMind

Hello and welcome back to the course on Augmented Random Search. In today's tutorial, we're going to be comparing ARS with other AI algorithms. The reason we included this tutorial in the course is to highlight the main differences between ARS and other standard, more conventional AI algorithms. It's a bit of a stretch to call them conventional, because all of this is cutting-edge technology, with cutting-edge models and algorithms. Nevertheless, we want to contrast ARS, which is brand new, with the other AI algorithms that exist out there and that we've seen before. Hopefully this will give you a good overview of how they differ and help you be better prepared for these conversations, knowing what the advantages of ARS are and what the differences are. So let's have a look. On the left, we have a column for the features of ARS, and on the right, a column for other AI. We're going to cover three main distinctions between the two. That's not the full list, but those are the ones we find the most important. Number one: ARS performs exploration in the policy space, whereas other AI usually performs exploration in the action space. So let's look at what this actually means. In ARS, as we've already discussed, we've got this perceptron, and we wait until the agent gets to the end of the episode. Then, based on the results, whether it successfully got to the end or fell over at some point, we get a reward, which we calculate and use to adjust the weights. The thing is, environments are set up independently of the algorithm that's going to be applied. For instance, this MuJoCo environment exists on its own: you can apply ARS to it, or you can apply a different AI or any other algorithm to it.
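To make the episode-level view concrete, here is a minimal sketch, assuming a hypothetical one-dimensional environment (`ToyLineEnv`) in place of MuJoCo. Like the environments described above, it hands back a reward after every step no matter which algorithm drives the agent. The ARS-style helper `episode_return` (also hypothetical) runs a linear, perceptron-style policy to the end of the episode and keeps only the total reward.

```python
import numpy as np


class ToyLineEnv:
    """Hypothetical 1-D environment: the agent starts at x = 0 and should reach x = target.

    It returns a reward after every step, independently of the learning algorithm.
    """

    def __init__(self, target=5.0, horizon=20):
        self.target = target
        self.horizon = horizon

    def reset(self):
        self.x = 0.0
        self.t = 0
        return np.array([self.x, self.target - self.x])  # observation: position, remaining distance

    def step(self, action):
        old_dist = abs(self.target - self.x)
        self.x += float(np.clip(action, -1.0, 1.0))  # move by at most 1 unit per step
        self.t += 1
        new_dist = abs(self.target - self.x)
        reward = old_dist - new_dist  # positive when getting closer, negative when moving away
        done = self.t >= self.horizon
        return np.array([self.x, self.target - self.x]), reward, done


def episode_return(env, weights):
    """Run one full episode with a linear policy and return only the summed reward.

    The per-step rewards are accumulated but never inspected individually.
    """
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = weights @ obs  # linear policy: action = M . observation
        obs, reward, done = env.step(action[0])
        total += reward
    return total
```

With weights `[[0.0, 1.0]]` the action equals the remaining distance, so the agent reaches the target in five steps and `episode_return(ToyLineEnv(), np.array([[0.0, 1.0]]))` is 5.0; that single number is all ARS uses to judge the policy.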
And the thing is, the environment is set up to provide rewards not just at the end of the episode, but after every single action. That's just how the environment works: every time the agent does something, the environment provides a reward. For instance, if the agent is getting closer to the target, it gets a positive reward; if it's falling over or moving further away from the target, it might get a negative reward. Either way, a reward is provided after every single action. What ARS does, though, is not take advantage of the opportunity to look at the reward after every action. ARS accumulates the rewards and looks only at the total reward at the end of the episode. That's important to understand and remember: ARS could be looking at the reward after every action, but it isn't, so it's not using all of the possibilities the environment provides. And yet, surprisingly, it's still stronger than other AI. This is in contrast to how other AI usually works. These are slides from our artificial intelligence course; if you've been part of that course, you'll remember them. Normally, AI performs an action, goes into a new state, and gets a reward right away: it performs an action and gets a reward, performs another action and gets another reward, and it analyzes each reward. In contrast to ARS, standard reinforcement learning algorithms analyze the reward right away: as soon as the action is performed, they get the reward and analyze it. This allows them to apply the Bellman equation and build a value function, so they know the value of being in different states. After that, they can use those value functions to make their decisions and map out how to get through these environments.
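For contrast, here is a minimal sketch of the per-step approach, using tabular Q-learning as one common example. The transcript does not name a specific algorithm, so the choice of Q-learning, the hypothetical helpers `q_learning_update` and `epsilon_greedy`, and the default `learning_rate`, `gamma` and `epsilon` values are all illustrative assumptions. Q-learning learns an action-value table, a close cousin of the state value function mentioned above: it applies a Bellman-style update immediately after each action, and it explores by occasionally picking a random action, that is, in action space.

```python
import numpy as np


def epsilon_greedy(q_table, state, rng, epsilon=0.1):
    """Explore in action space: usually take the best-known action,
    but with probability epsilon try a random one instead."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))


def q_learning_update(q_table, state, action, reward, next_state, done,
                      learning_rate=0.1, gamma=0.99):
    """Tabular Q-learning update, applied immediately after a single action.

    Bellman target: reward + gamma * max over a' of Q(next_state, a'),
    with no bootstrap term once the episode has ended.
    Returns the temporal-difference error.
    """
    target = reward if done else reward + gamma * np.max(q_table[next_state])
    td_error = target - q_table[state, action]
    q_table[state, action] += learning_rate * td_error
    return td_error
```

For example, with `q = np.zeros((2, 2))`, calling `q_learning_update(q, 0, 1, 1.0, 1, False)` moves `q[0, 1]` from 0 to 0.1 right after that one action, whereas ARS would wait for the episode's total reward.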
So that's the main difference. In ARS, we focus on the total reward for the whole episode. We're not exploring every single action; we're exploring a whole policy, that is, the whole approach to getting through an episode, and we assess it over the whole episode. That's why we say ARS explores in the policy space, whereas other AI normally explores individual actions and therefore explores in the action space. So we have exploration in the policy space, from start to finish over the whole episode, versus exploration in the action space, where every individual action is explored. If you've taken our artificial intelligence course, you'll remember that we had a tutorial on eligibility trace, and what ARS does is somewhat similar. With eligibility trace, we didn't look at the reward after every single action; we grouped the actions together, say five, 10, or 15 actions in a row, and only then looked at the reward. And we saw there that this actually improved the performance of the AI. We won't go into detail on this here; it's just a hint that, if you're familiar with eligibility trace, ARS follows similar thinking. All right. Number two: how is the updating of the weights performed? In ARS, it's the method of finite differences. In other AI, it's gradient descent, the famous or infamous gradient descent algorithm. We talked about the method of finite differences earlier: we apply a small positive shift and a small negative shift to the weights, a positive delta and a negative delta, we get the rewards for both, and based on those we calculate how to adjust our weights. In other AI, by contrast, we don't use this approximation; we use actual gradient descent, which is a proper differentiation of the loss.
In ARS you get a reward, whereas normally in AI you get a loss, and you differentiate that loss with respect to the weights, going backwards through the network. That's what normally happens in AI, and it's also known as backpropagation of the error through the network to update the weights. The reason we can't do that in ARS, and why we have to use the method of finite differences, is that we simply don't have a value function. Because we're not exploring in the action space, we don't have a value function to apply this method to, so we have to work with what we have, and that's why the method of finite differences is used. And it works totally fine; it works really well. In fact, the main research paper for ARS says that this is a good enough approximation of the gradient. All right. Number three: ARS uses shallow learning, whereas other AI usually uses deep learning. As you'll recall, this was the perceptron for ARS. It might have a lot of inputs and a lot of outputs, but the main point is that there's just the input layer, connected directly to the output layer. AI that is based on deep learning usually has at least one hidden layer, like this one, where the inputs are combined before being connected to the output layer. In fact, more often than not, you'll have multiple hidden layers, and then it really is proper deep learning, because, as you can see, the neural network is very deep in this case. So that's a difference, and obviously it takes more computation and more training, because there are more weights that need to be adjusted. And so there we go.
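The finite-difference weight update described above can be sketched as follows. This is a minimal, hypothetical version of the basic random search step: it leaves out the reward-standard-deviation scaling, observation normalization and top-direction selection that the full ARS algorithm adds. The names `ars_step` and `toy_reward` and every hyperparameter value are illustrative assumptions. The policy is a single weight matrix with no hidden layer, matching the shallow perceptron, and `toy_reward` stands in for the total reward of one episode.

```python
import numpy as np


def ars_step(weights, reward_fn, rng, n_directions=8, noise=0.1, learning_rate=0.05):
    """One basic random search update using the method of finite differences.

    For each random direction delta, the policy is evaluated with a small
    positive shift (weights + noise * delta) and a small negative shift
    (weights - noise * delta). The reward difference says whether moving
    along delta helps; no gradient of the reward is ever computed.
    """
    deltas = rng.standard_normal((n_directions,) + weights.shape)
    r_pos = np.array([reward_fn(weights + noise * d) for d in deltas])
    r_neg = np.array([reward_fn(weights - noise * d) for d in deltas])
    # Average of (r+ - r-) * delta over all sampled directions.
    step = np.tensordot(r_pos - r_neg, deltas, axes=1) / n_directions
    return weights + learning_rate * step


def toy_reward(weights):
    """Hypothetical stand-in for an episode's total reward; it peaks (at 0) at target_weights."""
    target_weights = np.array([[0.5, -1.0, 2.0]])
    return -float(np.sum((weights - target_weights) ** 2))


rng = np.random.default_rng(0)
weights = np.zeros((1, 3))  # shallow policy: one output, three inputs, no hidden layer
for _ in range(200):
    weights = ars_step(weights, toy_reward, rng)
```

Why this approximates the gradient: for a small shift ν (`noise`), r(θ + νδ) − r(θ − νδ) ≈ 2ν ∇r(θ)·δ, and averaging (∇r·δ)δ over Gaussian directions δ gives ∇r in expectation. So the averaged step points, on average, along the gradient without any backpropagation.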
So in ARS, we've got exploration in the policy space, the method of finite differences, and shallow learning. In other AI, we've got exploration in the action space, the gradient descent algorithm with backpropagation, and usually deep learning. So what does this sum up to? Even though ARS is in many ways less involved than other AI, as we just discussed, ARS is still much stronger: it's up to 15 times faster and at the same time yields higher rewards on specific applications. I say specific applications because ARS has been critiqued for finding inefficiencies, or inaccuracies, in the MuJoCo environments and exploiting them. So someone might argue that other AI based on deep learning, even though it's slower, might be more versatile, more broadly applicable, and smart in different ways. But at the end of the day, if you have a specific application and you need to solve the problem, does it matter how you get to the end result? As long as you're getting the result you want and the problem is solved, you have a good outcome. So ARS can be a great solution in many different applications. Just keep in mind that it might not be as versatile; that is yet to be discovered, and you need to test it on the specific application you're working on. Overall, ARS has been showing some fantastic results, and on benchmarks such as MuJoCo it is beating the other artificial intelligence algorithms out there: it's up to 15 times faster and still gets high rewards. To finish off today's tutorial, we've got some additional reading. The paper is called Evolution Strategies as a Scalable Alternative to Reinforcement Learning, by Tim Salimans and others, published by OpenAI in 2017.
The link is in the course notes, and the paper is on arXiv. This paper talks about evolution strategies, which is another term for random search: the same thing, just different terminology. This paper came before the ARS paper, and it compares evolution strategies, or random search, with other reinforcement learning algorithms, that is, other artificial intelligence. So it's in line with the topic of today's tutorial if you'd like to have a look. This paper is also referenced in the ARS paper that we touched on before. On that note, I hope you enjoyed today's tutorial, and I look forward to seeing you back here next time. Until then, enjoy AI!
