How do Self-Organizing Maps Learn? (Part 1)

A free video tutorial from Kirill Eremenko

Learn more from the full course

Deep Learning A-Z™: Hands-On Artificial Neural Networks

Instructor: Hello and welcome back to the course on deep learning. In the previous tutorials, we saw how self-organizing maps work, and today we'll finally find out how they learn. So let's get straight into it.

Here we've got a very simple example of a self-organizing map. We've got three features in our input vectors, and we've got nine nodes in the output. And as we discussed previously, self-organizing maps are used to reduce the dimensionality of your data set. And here you might be wondering, how is that the case when our input only has three features, and our output seems to have more? Well, don't let this representation confuse your understanding of self-organizing maps. Here we have three features, or three columns, in our data set. We might have thousands and thousands of rows, each of which has three columns. And that means that our input data set is actually three-dimensional, whereas our output in a self-organizing map is always a two-dimensional map, and therefore we are reducing the dimensionality from 3D to 2D.

So now we're going to turn this self-organizing map into a representation that is familiar to us from what we've studied about artificial neural networks, convolutional neural networks, and recurrent neural networks previously in this course. So let's turn it around. This is what it would look like. And the key thing here is that it's exactly the same network; the only difference is how we've positioned the nodes. We still have the same number of connections, the same number of inputs, the same number of outputs. It's just that the visual representation has changed, simply because we're used to this one and it's easier for us to understand what's going on. At the same time, what I also wanted to mention is that self-organizing maps are different. They're very different from what we discussed in neural networks previously in the supervised learning part of the course.
And there are two parts to this. First of all, self-organizing maps are much, much easier. So you'll see that you'll be able to grasp self-organizing maps very quickly, and the whole concept behind them is very simple and straightforward. At the same time, it's also important to note that because self-organizing maps are different, the concepts that might have the same names have different meanings, and therefore your knowledge of artificial neural networks, convolutional neural networks, and recurrent neural networks from what we discussed previously might lead you to confuse the meanings of what we're going to be discussing in self-organizing maps. So just have that in mind when we're going through this tutorial, and be careful when we're talking about things like weights and synapses and other things that you might encounter. I will try to point those out, and as long as you're aware of this, we should be fine.

So if we agree on that, let's get started. The first thing that we're going to look at is the top node in our outputs. And we're specifically going to look at the three connections, the three synapses leading to this node. In fact, let's gray out the rest of the synapses, so that we know that we're focusing on these specific three. And each one of them, just as previously, will have a weight assigned to it. So here we've got W one one, W one two, and W one three. The first index means that it's the first node in our output nodes, and the second index means where that synapse is connecting from. And the important thing for us to mention here is that weights in self-organizing maps have a whole different connotation to them, as opposed to what we saw in artificial neural networks.
In artificial neural networks, weights were used to multiply: we multiplied the input of this node, or whatever we have in this node, by the weight, we added them up, and then we applied an activation function. Well, in self-organizing maps, there is no activation function. Weights are a characteristic of the node itself. And that's what we're representing over here, that this node actually has these coordinates. So think of it this way: you've got an input vector here of three dimensions, so X one, X two, and X three, and X one, X two, and X three are its coordinates in the input space. So if we think of it as a three-dimensional chart, this is a vector somewhere there, and these are its coordinates. Well, this node is not just the result of an activation, or of these weighted values summed up; here the weights have a completely different meaning. This node is actually also trying to be like a ghost, a type of ghost in our input space. It's trying to see where it can fit in our input space, and that's exactly what's going on. So these weights are the coordinates of this node in our input space.

So here, on one hand, for the input data set you have three nodes which together represent each point. Or you could have 20 if you had a twenty-dimensional input space, 20 columns in your inputs. Here, on the other hand, you have one node representing a point in your input space, and again, if you had 20 columns in your inputs, each node would have 20 weights. So that's important to understand. So basically just think of these output nodes, these red ones: each one of them is a ghost, or an imaginary data point, in our input space. It doesn't actually exist there, it's trying to blend in. So there we go, that's node number one. Same thing we can do for node number two. Same thing for node number three. Same thing for node number four, and so on.
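To make the "weights are coordinates" idea concrete, here is a minimal sketch in plain Python. The function name `init_som_weights`, the seed, and the use of uniform random values are assumptions for illustration only, not something taken from the course. The point is just the shape: every output node carries one weight per input column.

```python
import random

def init_som_weights(n_nodes, n_features, seed=0):
    """Give every output node one weight per input column.

    Each inner list is that node's position (its "ghost" point) in the
    input space, so it has the same length as a row of the data set.
    The random values are placeholders (an assumption of this sketch).
    """
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_features)] for _ in range(n_nodes)]

# Nine output nodes, three input columns, as in the example above.
weights = init_som_weights(n_nodes=9, n_features=3)
print(len(weights), len(weights[0]))  # prints: 9 3
```

With 20 input columns, the same call with `n_features=20` would give each node 20 weights.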
So each one of the nodes, in our case nine, or there could be many more, has its own weights. At the start of the algorithm, as usual, the weights are assigned at random to values close to zero but not zero. And therefore each one of these nodes has its own imaginary place in the input space. And so why is this important? Where is this leading us? Well, this is the core of the self-organizing map algorithm. Now we're going to have a competition among these nodes. We're going to go through each of the rows of our data set, and we're going to find out which of these nodes is closest to each of those rows.

And we'll start with row number one. So let's go ahead and imagine that we've inputted row number one of our data set into our input nodes. So we've put in column one, column two, column three, the values of row number one. And now we're going to go through every single one of these nodes, and find out which of these nodes is closest to our row number one in that original input space. So let's calculate it for node number one. We calculate the distance as a Euclidean distance. So it's calculated as X one minus W one one, squared, plus X two minus W one two, squared, plus X three minus W one three, squared, and then the square root of all of that. And let's say we get a value of 1.2. And by the way, you should get values close to one here, because you should make sure that your inputs are between zero and one for all of this algorithm to work properly. So as we discussed previously, normalization or standardization, you've got to apply those things before you actually input the data into the self-organizing map.

So that's the distance between node number one and row number one of our data set. Now we're not changing the row, we're still on row number one, but let's calculate the distance to node number two in our input space. The distance comes out as, let's say for example, 0.8.
Then we'll calculate the distance to node number three, and this time the distance is 0.4. So you can see that row number one, or the point in our data that this row represents, is three times closer to node number three than it is to node number one in our original three-dimensional space. And then we calculate the same thing for node number four, we get a value of 1.1 for example, and so on. So, and by the way we're still on row number one, we've calculated the distance between the point that row number one represents in our input space and each one of these nodes in our self-organizing map. And we found that the closest one out of all of them is node number three. And we're going to call node number three the BMU, or the best matching unit.

So that is the core of the algorithm, and now we want to find out what happens next. What happens with this result, what goes on next in the self-organizing map? For that, let's look at a larger self-organizing map. I know this is a bit counterintuitive, usually we make things smaller when we want to understand them better, but in this case we will need a larger map to understand this concept better. And let's say in this larger map we found the best matching unit for row number one. There it is. So what's going to happen next is the self-organizing map is actually going to update the weights, and I'm doing air quotations here for the word weights because they're still called weights, they're just different from the weights that we're used to. As you saw just now, weights are not used in the same way; here weights are a characteristic of that specific node. So the weights are going to be updated for this best matching unit, so that it is even closer to our first row in our data set.
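The competition described above can be sketched in a few lines of plain Python. The helper names `euclidean_distance` and `find_bmu`, and all the numbers in `row` and `weights`, are made up for illustration. They are not the values from the slides, although they are chosen so that the third node also wins here.

```python
import math

def euclidean_distance(row, node_weights):
    """sqrt((x1 - w1)^2 + (x2 - w2)^2 + ...) between a row and one node."""
    return math.sqrt(sum((x - w) ** 2 for x, w in zip(row, node_weights)))

def find_bmu(row, weights):
    """Return the index of the closest node and the list of all distances."""
    distances = [euclidean_distance(row, w) for w in weights]
    bmu_index = min(range(len(weights)), key=distances.__getitem__)
    return bmu_index, distances

# Hypothetical row (already scaled to the 0..1 range) and four hypothetical nodes.
row = [0.2, 0.6, 0.9]
weights = [
    [0.9, 0.1, 0.1],  # node 1
    [0.5, 0.5, 0.3],  # node 2
    [0.3, 0.7, 0.8],  # node 3
    [0.8, 0.2, 0.4],  # node 4
]

bmu_index, distances = find_bmu(row, weights)
print(bmu_index)  # prints: 2 (zero-based, i.e. node number three)
```

Note that the row itself is never changed by this step; only the distances to each node are compared.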
And the reason we are updating the weights is that we simply don't have control of our inputs. We cannot change our data set, so the only thing that we can control in that formula is the weights of this node, in order for it to become closer. So there we go, that flash means it was updated. And in visual terms, what that means is the self-organizing map is coming closer to that data point. So this part over here is our self-organizing map with its starting weights, and as you can see in this image, which is from Wikipedia, this point is the closest to the point that we're currently looking at, row number one, and now we're going to drag it closer to that point. In the end, the result that we want looks like this, but let's not get ahead of ourselves for now. At this stage we're just happy to drag that one best matching unit, or BMU, to the current row. So we're dragging it a bit closer. That's exactly what's going on, and that's why it's called a self-organizing map. It self-organizes onto your input data.

And, by the way, as you can see here, it's not just this one point that is being dragged closer; some of the nearby points are being dragged closer to this point as well. And that's exactly what we're going to look at next. So here's our best matching unit in the self-organizing map. The next step is that we have a whole radius around this best matching unit, and every single node of our self-organizing map that falls inside that radius is going to have its weights updated to come closer to the row that we matched up with. So there you go, they all got their weights updated. And the way it works is, the closer you are to the BMU, the more heavily your weights are updated.
So these weights are going to be updated the most, these weights are going to be updated less, and these weights are going to be updated even less. The best way to think of it is as if they are dragging each other. So as you pull on this one, this whole chain, or this whole structure, is slowly pulled in the same direction. So the closer you are to this BMU, the harder you will get pulled towards the row that the BMU matched up with. So that's how the radius concept works.

Now let's have a look at row number two. Let's say row number two had its best matching unit somewhere else, for instance over there. That's the best matching unit for row number two. Well, what happens here is again that BMU is updated to be closer, and it has its own radius, so everything within the radius is also updated to be closer to the row that we matched up with. And so the question here is how do they combat each other, how do they fight with each other? Well, it's pretty simple. So let's have a look at one point. Let's gray all of them out except for this one red one. And as you can see, it's quite far away from the green BMU and quite close to the blue BMU. In fact, it might be so far away from the green BMU that it doesn't even fall within its radius. So what happens here is that it is pulled much harder by the blue BMU, and therefore it becomes like the blue BMU. So it becomes closer, and we're going to color it in blue.

Then let's have a look at this one. Same thing here, oh, not the same thing here, this is a bit different. So this one is still far away from the green one, but it's also quite far away from the blue one. In fact, it's just a bit closer to the blue one than to the green one, so when we pull on it, it will be updated by both, and in this case we'll color it in kind of a greenish-blue.
And then this one is actually closer to the green BMU than to the blue BMU, and therefore when we pull on the green and then we pull on the blue, there'll be a bit of a struggle, but overall it will move closer to the green one than to the blue one. But both of them will have an impact. And then finally, here is another one. So this node is even closer to the green, and it's quite far away from the blue BMU, and therefore when you pull on the green and the blue, of course they're both going to have an impact, but the green is going to have a much stronger impact, and therefore we are going to color it in green.

So there we go, that's us just looking at four random nodes in our self-organizing map, and hopefully that demonstrates how this map self-organizes onto your data points in the input, and that's a good start for us for today. In the next tutorial, we will continue exploring what happens when you have even more BMUs and how the whole self-organizing map updates. I look forward to seeing you then, and until next time, enjoy deep learning.
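As a rough sketch of the radius idea from this tutorial, the snippet below pulls the BMU and its neighbors towards a row, with nodes closer to the BMU on the map being pulled harder. The tutorial does not give a formula for this step, so the Gaussian falloff, the `learning_rate` and `radius` values, and the function name `update_neighborhood` are all assumptions chosen for illustration.

```python
import math

def update_neighborhood(row, weights, grid, bmu, radius=1.5, learning_rate=0.5):
    """Pull the BMU and nearby nodes towards `row`, modifying `weights` in place.

    `grid[i]` is node i's position on the 2D map; distance on the map
    (not in the input space) decides who falls inside the radius.
    """
    for i, node_weights in enumerate(weights):
        grid_dist = math.dist(grid[i], grid[bmu])
        if grid_dist > radius:
            continue  # outside the radius: this node is not updated
        # Assumed Gaussian falloff: 1.0 at the BMU, smaller further away.
        influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
        for j, x in enumerate(row):
            node_weights[j] += learning_rate * influence * (x - node_weights[j])

# A hypothetical strip of three nodes in a two-feature input space.
grid = [(0, 0), (0, 1), (0, 2)]
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

update_neighborhood([1.0, 1.0], weights, grid, bmu=0)
print(weights)
```

Here the BMU moves halfway to the row, its immediate neighbor moves a little less, and the third node, which is outside the radius, stays where it was. Calling the function again with a different row and a different BMU reproduces the tug-of-war between the green and blue BMUs described above.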