The idea behind Recurrent Neural Networks

A free video tutorial from Kirill Eremenko
DS & AI Instructor

Learn more from the full course

Deep Learning A-Z 2024: Neural Networks, AI & ChatGPT Prize

Learn to create Deep Learning models in Python from two Machine Learning, Data Science experts. Code templates included.

22:26:57 of on-demand video • Updated February 2024

Understand the intuition behind Artificial Neural Networks
Apply Artificial Neural Networks in practice
Understand the intuition behind Convolutional Neural Networks
Apply Convolutional Neural Networks in practice
Understand the intuition behind Recurrent Neural Networks
Apply Recurrent Neural Networks in practice
Understand the intuition behind Self-Organizing Maps
Apply Self-Organizing Maps in practice
Understand the intuition behind Boltzmann Machines
Apply Boltzmann Machines in practice
Understand the intuition behind AutoEncoders
Apply AutoEncoders in practice
Instructor: Hello, and welcome back to the course on deep learning. Today, we are kicking off the section on recurrent neural networks, and I'm very excited about it. There are going to be some quite interesting tutorials. This is one of the most advanced algorithms that exists in the world of supervised deep learning, so let's get started. We have our little breakdown of the supervised versus unsupervised deep learning branches. Here we've got artificial neural networks, which we've already talked about, we've already talked about convolutional neural networks as well, and today we're talking about recurrent neural networks. This is just so that we see where we are in the big picture of things, slowly getting to the unsupervised part of the course. But nevertheless, let's focus on RNNs today.
Alright, so now that we know where we are on the map in terms of a list, let's have a look at where we are on the map in terms of the human brain. Why are we doing this? Why are we looking at the human brain as if it's a map of the world? Well, the reason is that the whole concept behind deep learning is to try to mimic the human brain, get functions similar to the ones the human brain has, and leverage the things that evolution has already developed for us. And I thought it would be pretty cool if we could somehow link the different branches of deep learning, or the algorithms, that we've discussed to parts of the human brain, and see if it all makes sense. We talked about ANN, CNN, and now we're talking about RNN. So, let's have a look.
Here we've got the brain, the human brain, and it's got three parts. We've got the cerebrum, which is all of this colored part. And then we've got the cerebellum, which is underneath there, and that's the little brain. I actually looked it up in Latin, that does mean little brain, how funny is that?
And we've already looked at a dissection of the cerebellum in the part where we were talking about ANNs, that big orange picture where we saw all of those little neurons everywhere and tried to gauge how many there are. There are billions of neurons in the brain. And then we've got the brain stem over here, which connects the brain to the organs and our arms and legs and so on. So those are the three main parts of the brain. Now, the cerebrum has four lobes, and they're colored in here. It's got the frontal lobe, the temporal lobe, the parietal lobe, and the occipital lobe.
Now, how do we link these, right? We've got ANN, CNN, and RNN. The hardest one to start off with was probably ANN, because what is the main advantage of ANNs? Well, apart from the backpropagation algorithm, which applies to all of them (in fact, whatever applies to an ANN applies to everything here), for me the main thing about ANNs, the thing without which this whole concept of deep learning wouldn't even exist, is the weights. The fact that ANNs can learn through prior experience, through prior impulses, and through prior observations is extremely valuable.
So, what do those weights represent? The weights, of course, are present across all neurons in the brain, but we're going to try to take away the main underlying philosophical notion, and that is that weights represent long-term memory. Once you've trained your ANN, you can switch it off and come back later. It's trained up, you know the weights, and so whatever input you put into it, it will process it the same way today as it did yesterday, as it will tomorrow, and as it will the day after. So, the weights are the long-term memory of a neural network. And that's why the weights, or the ANN, go into the temporal lobe.
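The point about weights acting as long-term memory can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the one-neuron `predict` function, the made-up "trained" weight values, and the `weights.json` file name are not from the course. The sketch only shows that once weights are stored, the same input produces the same output after the network has been "switched off" and loaded again.

```python
import json
import math
import os
import tempfile


def predict(weights, inputs):
    """Tiny one-neuron 'network': a weighted sum followed by a sigmoid."""
    z = sum(w * x for w, x in zip(weights["w"], inputs)) + weights["b"]
    return 1.0 / (1.0 + math.exp(-z))


# Pretend these values came out of training (they are made up for illustration).
trained = {"w": [0.4, -1.2, 0.7], "b": 0.1}
sample = [1.0, 0.5, -2.0]
before = predict(trained, sample)

# "Switch the network off": store the weights on disk, then load them back later.
path = os.path.join(tempfile.mkdtemp(), "weights.json")
with open(path, "w") as f:
    json.dump(trained, f)
with open(path) as f:
    restored = json.load(f)

after = predict(restored, sample)
print(before == after)  # True: same weights, same input, same output
```

Nothing in this network depends on what it was shown a moment ago, which is exactly the gap that the short-term memory of an RNN is meant to fill.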
Again, the weights exist across the whole brain, but philosophically, ANNs are the start of deep learning and they represent long-term memory. So, we're going to put them in the temporal lobe, because the temporal lobe is responsible for long-term memory. You can remember it by the name: temporal, as in things lasting through time in there (strictly speaking, the lobe is named after the temples of the head, but it works as a mnemonic). The brain is very complex, and of course other parts are responsible for memory as well, but we're going to simplify things and say the ANN is like the temporal lobe. Then, CNN is much easier. It's vision, recognition of images and objects and so on, so that's the occipital lobe.
And today, we're talking about RNNs, and as you'll find out, RNNs are like short-term memory. They can remember things that just happened in the previous couple of observations and apply that knowledge going forward. I'm giving away so much already that you pretty much know the rest of this tutorial, but nevertheless. So, that's the frontal lobe. That's where we have a lot of the short-term memory. And as a quick breakdown, the frontal lobe is also responsible for personality, behavior, the motor cortex, working memory, and lots of other things. But for our purposes, the main thing that we're looking out for is the short-term memory.
By the way, if you're interested, and these are from Wikipedia: the temporal lobe is concerned with recognition memory, that's our long-term memory. The parietal lobe is responsible for sensation and perception, and for constructing a spatial coordinate system to represent the world around us. We are yet to create a neural network which would fit into that category. And the occipital lobe is vision.
Alright, so there we've got a bit of neuroscience, so let's move on to our favorite neural network. Here, we've got a simple artificial neural network. Three inputs, two outputs, one hidden layer. What does an RNN do? How do we turn this into an RNN? Well, we squash it. We squash it all together.
So, the neurons are still there, but think of it as if we're looking at this neural network from underneath, so we're looking in a new dimension. It's still there, it's just flattened out. We're adding a new dimension to all of this, but remember that those neurons, the whole network, are still there. Nothing changed, we just squashed it for our purposes. Then, to simplify things, we're just going to change these multiple arrows into two, and then we're gonna twist this whole thing and make it vertical, because that's the standard representation. And then we're gonna recolor it: instead of green, we're gonna color the hidden layer in blue, and there you go.
And we're gonna add this line, and what does that line do? Well, that line is the temporal loop. This is an old-school representation of RNNs, and it basically means that this hidden layer not only gives an output but also feeds back into itself. Again, this is an old-school representation, so the common approach now is to unwind, or unroll, this temporal loop and represent RNNs in the following manner, like that. And again, don't forget that we've got lots of these things happening: you're looking in a new dimension, and the layers are actually still there. They're still there, we're just not focusing on them. Just remember that each one of these circles is not one neuron, it's a whole layer of neurons.
So, what is happening is that you've got inputs coming into the neurons, then you've got outputs, but the neurons are also connecting to themselves through time. That's the whole concept: they have some sort of memory, short-term memory, so they remember what was in that neuron just previously. And before that, the neuron remembered what was there one step earlier, and that allows them to pass information on to themselves in the future and analyze things. Kind of like when you're watching this course, right?
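The temporal loop can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the course: the function names (`rnn_step`, `rnn_forward`, `random_matrix`), the `tanh` activation, the layer sizes, and the random untrained weights are all choices made for this example. The key line is the one that computes the new hidden state from both the current input and the previous hidden state.

```python
import math
import random


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: mix the current input with the previous hidden state."""
    h_new = []
    for i in range(len(b_h)):
        from_input = sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        from_memory = sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_new.append(math.tanh(from_input + from_memory + b_h[i]))
    return h_new


def rnn_forward(sequence, W_xh, W_hh, b_h):
    """Unrolled temporal loop: the same weights are reused at every time step."""
    h = [0.0] * len(b_h)  # the hidden layer starts with an empty memory
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights, standing in for trained ones."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_size, hidden_size = 3, 4
W_xh = random_matrix(hidden_size, input_size, rng)   # input -> hidden
W_hh = random_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden (the loop)
b_h = [0.0] * hidden_size

sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
states = rnn_forward(sequence, W_xh, W_hh, b_h)
print(len(states), len(states[0]))  # 3 4
```

Each entry of `states` corresponds to one circle in the unrolled picture: a whole hidden layer at one moment in time. Setting `W_hh` to all zeros removes the loop, and the layer then reacts only to the current input, like an ordinary feed-forward hidden layer.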
It would be very sad if you could not remember what was happening in the previous tutorial, right? Even if we break time down discretely, not continuously by seconds but discretely by tutorials, and we say that every moment in time is a new tutorial, it would be very sad if you did not remember what we had in the previous tutorial, or in the previous section, or in the previous part of the course. Because then this whole neural networks part wouldn't make any sense. You wouldn't have any memory of what a neuron is or what an activation function is, but because you do remember those things, you can understand this. And it's the same thing here. Why would we deprive an artificial construct, which is supposed to be a synthetic human brain, or to mimic the human brain, of something as powerful as short-term memory? Long-term memory is great, but short-term memory is so powerful, so why would we not give it that opportunity? That's where recurrent neural networks come in, that's the gap that they fill.
So, let's have a look at a couple of examples. A huge shout-out to the Karpathy blog, karpathy.github.io, some of these examples are from there. First, one-to-many relationships. This is when you have one input and multiple outputs. An example of this is a computer describing an image. You have one input, the image, which goes through a CNN and is then fed into an RNN, and then the computer comes up with words to describe the image. And this is an actual computer describing the image, how accurate is that? "Black and white dog jumps over bar." This is a computer that looked at this image and said, oh, it's a black and white dog. What it has previously learned, the long-term memory, allowed it to come up with the weights, with a certain feature recognition system, with the filters, with everything that is required in a CNN.
And then the RNN allows it to make sense out of the sentence. You can see that the sentence actually flows. There's an "and", there's an "over bar", there's a verb, there's a noun, and so on. So, basically, the RNN is what allows it to put a sentence together in this case.
Then, many-to-one. An example would be sentiment analysis. That's when you have a lot of text, and from that text you need to gauge whether this is a positive comment or a negative comment. What's the chance that it's positive, or how positive or how negative is that comment? And you've got an example here as well. Again, there are lots of other examples, these are just some.
Then we've got many-to-many. For instance, oh yeah, I wanted to show you this one. Here, we've got an example from Google Translate. I don't know if Google Translate uses RNNs or not. I know that they have very sophisticated deep learning at Google, and I've heard that they've used it in their Android systems and so on. I'm not sure if they use RNNs here or not, but the concept remains. So, say I translate from English to Czech. I say, "I am a boy who likes to learn." (speaking foreign language) In some other languages, the gender of the person matters, right? Here, boy is male, so that's why it's got (speaking in foreign language). And you can see that if I change this to girl in English, the other words don't change. But in Czech, the other words change. (speaking in foreign language) So, you can see right away that now it's not (speaking in foreign language), meaning that these words depend on the gender of this word, holka. Holka is a girl, and therefore this becomes (speaking in foreign language). And again, I don't know if Google Translate uses an RNN, I'm not going to comment on that, but the concept is the same: you need short-term information about the previous word in order to translate the next word, right?
You can't just translate word by word. It's just a simple example; of course, for a sentence to make sense you need information about the previous words in general. But the very visual example we have here is that you at least need to know the gender of this word in order to translate the following words so that the sentence makes sense. And that's where RNNs have power, because they have short-term memory and they can do these things. So, that's many-to-many. You put in lots of words and then you get lots of words out, that's translation.
And of course, not every example has to be related to text or images, there can be lots and lots of different applications of RNNs. For instance, as a many-to-many case, you can use RNNs to subtitle movies. Meaning that you can have running subtitles, or describe every single frame in a movie, and that is something you can't simply do with a CNN or other neural networks, because if you're watching a movie you need context about what happened previously to describe what's happening now. So, you need that short-term memory. You can't just do a dry run through a million movies and somehow understand the whole plot that is going to happen. You need short-term memory of the plot to be able to comment on every single frame.
And speaking of movies, today we don't have additional reading, today we have additional watching. It's a movie called Sunspring, from 2016, directed by Oscar Sharp. And it's got an actor you might know, Thomas Middleditch from the TV show Silicon Valley. This movie was entirely written by Benjamin, who is an RNN, an LSTM to be specific. So, I'm gonna show you this movie now. Well, I'm not gonna play it, I'm just gonna show you where to find it. You need to go to Ars Technica. It's only nine minutes long, and I highly recommend it. I've seen it twice. It's so fun. There's a great description here as well, so it's worth reading through.
There's actually an interview with Benjamin, and he actually gave himself the name Benjamin, that's why they call him Benjamin. It's really cool to see these things, and what you'll find about the movie is that the acting is amazing. Just amazing, I had shivers down my spine towards the end. It got into the top ten at the SCI-FI-LONDON festival. What you'll also find is that Benjamin is able to construct sentences which make sense for the most part, which is good, but what he lacks is the bigger picture. He cannot come up with a plot that consistently makes sense, even though the actors do a great job of putting it together and it does look amazing in the end. Look out for this when you're watching: separate the sentences, and you'll see that each sentence on its own more or less makes sense, about 90% of the time. But overall, he can't properly link sentences together.
And that's the next step for RNNs. This is still quite a new technology, or it's only been picking up recently, so it will be developed very soon. Maybe you're watching this tutorial in the future, five years from now, and you're laughing to yourself and saying, oh yeah, we've already passed that milestone. And we probably will very soon, but this is where things are now, and I highly recommend checking this out, it's only nine minutes long. So there you go, that's what RNNs are in a nutshell, and in the next tutorial we will continue digging deeper. I look forward to seeing you next time. Until then, enjoy deep learning.
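The three input/output patterns from this tutorial (one-to-many, many-to-one, many-to-many) differ only in when the recurrent layer receives inputs and when it emits outputs. The sketch below is a toy illustration under stated assumptions: the hidden state is a single number, and the `step` and `readout` functions and their fixed weight values are hypothetical, chosen for readability rather than taken from the course or from any real captioning or translation system.

```python
import math


def step(x, h, w_x=0.8, w_h=0.5):
    """Toy recurrent cell: new memory from the current input and the old memory."""
    return math.tanh(w_x * x + w_h * h)


def readout(h, w_out=2.0):
    """Toy output layer: turn the hidden state into an output value."""
    return w_out * h


def one_to_many(x, n_outputs):
    """One input, several outputs (e.g. image features in, caption words out)."""
    h = step(x, 0.0)
    outputs = []
    for _ in range(n_outputs):
        outputs.append(readout(h))
        h = step(0.0, h)  # no new input: only the memory drives later outputs
    return outputs


def many_to_one(xs):
    """A sequence in, one output (e.g. the sentiment of a whole comment)."""
    h = 0.0
    for x in xs:
        h = step(x, h)
    return readout(h)


def many_to_many(xs):
    """A sequence in, one output per time step (e.g. a description per frame)."""
    h = 0.0
    outputs = []
    for x in xs:
        h = step(x, h)
        outputs.append(readout(h))
    return outputs


print(len(one_to_many(1.0, 4)))                             # 4
print(len(many_to_many([0.5, -0.2, 0.9])))                  # 3
print(many_to_one([1.0, 0.0]) != many_to_one([0.0, 1.0]))   # True: order matters
```

The last line shows the short-term memory at work: the same two inputs in a different order give a different result. The `many_to_many` variant here emits one output per input, which matches the per-frame movie example; translation-style systems typically read the whole input sequence before producing the output sequence, which is a different wiring of the same cell.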