The History of Artificial Neural Networks

Sundog Education by Frank Kane
A free video tutorial from Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro
4.5 instructor rating • 18 courses • 387,212 students

Lecture description

We'll cover the evolution of artificial neural networks from 1943 to modern-day architectures, which is a great way to understand how they work.

Learn more from the full course

Machine Learning, Data Science and Deep Learning with Python

Complete hands-on machine learning tutorial with data science, Tensorflow, artificial intelligence, and neural networks

14:11:01 of on-demand video • Updated July 2020

  • Build artificial neural networks with Tensorflow and Keras
  • Classify images, data, and sentiments using deep learning
  • Make predictions using linear regression, polynomial regression, and multivariate regression
  • Data Visualization with Matplotlib and Seaborn
  • Implement machine learning at massive scale with Apache Spark's MLlib
  • Understand reinforcement learning - and how to build a Pac-Man bot
  • Classify data using K-Means clustering, Support Vector Machines (SVM), KNN, Decision Trees, Naive Bayes, and PCA
  • Use train/test and K-Fold cross validation to choose and tune your models
  • Build a movie recommender system using item-based and user-based collaborative filtering
  • Clean your input data to remove outliers
  • Design and evaluate A/B tests using T-Tests and P-Values
Let's dive into artificial neural networks and how they work at a high level. Later on we'll actually get our hands dirty and create some, but first we need to understand how they work and where they came from.

It's pretty cool stuff: this whole field of artificial intelligence is based on an understanding of how our own brains work. Over millions of years of evolution, nature has come up with a way to make us think, and if we reverse engineer the way our brains work, we can get some insights into how to make machines that think.

Within your brain, specifically your cerebral cortex, which is where all of your thinking happens, you have a bunch of neurons. These are individual nerve cells, and they are connected to each other via axons and dendrites. You can think of these as connections, wires, if you will, that carry signals from one neuron to another. An individual neuron will fire, or send a signal to all the neurons it's connected to, when enough of its input signals are activated. So at the individual neuron level it's a very simple mechanism: you have this cell, this neuron, with a bunch of input signals coming into it, and if enough of those input signals reach a certain threshold, it will in turn fire off a set of signals to the neurons it in turn is connected to.

But when you start to have many, many of these neurons connected together in many different ways, with different strengths between each connection, things get very complicated. This is pretty much the definition of "emergent behavior": you have a very simple concept, a very simple model, but when you stack enough of them together you can create very complex behavior at the end of the day, and this can yield learning behavior. This actually works, and not only in your brain; it works in our computers as well. Now think about the scale of your brain.
You have billions of neurons, each of them with thousands of connections, and that's what it takes to actually create a human mind. This is a scale that we can still only dream about in the field of deep learning and artificial intelligence, but it's the same basic concept: you just have a bunch of neurons with a bunch of connections that individually behave very simply, but once you get enough of them together, wired in complex enough ways, you can create very complex thoughts, if you will, and even consciousness. The plasticity of your brain comes down to tuning where those connections go and how strong each one is, and that's where all the magic happens.

Furthermore, if we look deeper into the biology of your brain, you can see that within your cortex, neurons seem to be arranged into stacks, or cortical columns, that process information in parallel. For example, in your visual cortex, different areas of what you see might be processed in parallel by different cortical columns of neurons. Each of these columns is in turn made of mini-columns of around 100 neurons, which are then organized into larger hypercolumns, and within your cortex there are about 100 million of these mini-columns. So again, they add up quickly.

Coincidentally, this is a similar architecture to how the 3D video card in your computer works: it has a bunch of very simple, very small processing units, each responsible for computing little groups of pixels on your screen, and it just so happens that that's a very useful architecture for mimicking how your brain works.
So it's sort of a happy accident that the research that went into making video games run really fast, so you can play Call of Duty or whatever it is you like to play, produced the same technology that made artificial intelligence possible on a grand scale and at low cost. The same video cards you're using to play your video games can also be used to perform deep learning and create artificial neural networks. Think about how much better it would be if we made chips that were purpose-built specifically for simulating artificial neural networks. Well, it turns out some people are designing chips like that right now; by the time you watch this they might even be a reality. I think Google's working on one as we speak.

So at some point someone said, "Hey, the way we think neurons work is pretty simple. It actually wouldn't be too hard to replicate that ourselves and maybe try to build our own brain." This idea goes all the way back to 1943. People proposed a very simple architecture: if you have an artificial neuron, maybe you can set things up so that the artificial neuron fires if more than a certain number of its input connections are active. And when they thought about this more deeply in a computer science context, people realized you can actually create logical expressions, Boolean expressions, by doing this. Depending on the number of connections coming from each input neuron, and whether each connection activates or suppresses the neuron (which works that way in nature as well), you can do different logical operations. This particular diagram implements an OR operation: imagine that the threshold for our neuron is that if two or more inputs are active, it will in turn fire off a signal.
In this setup we have two connections coming in from neuron A and two connections coming in from neuron B. If either of those neurons produces a signal, that alone gives neuron C two active inputs, which reaches the threshold and causes neuron C to fire. So you can see we have created an OR relationship here: if either neuron A or neuron B feeds neuron C two input signals, that will cause neuron C to fire and produce a true output. We've implemented the Boolean operation C = A OR B just using the same sort of wiring that happens within your own brain. I won't go into the details, but it's also possible to implement AND and NOT by similar means.

Then we start to build upon this idea, and in 1957 we get something called the Linear Threshold Unit, or LTU for short. This built on things by assigning weights to those inputs: instead of just simple on and off switches, we now have the concept of a weight on each of those inputs that you can tune further. Again, this is working more toward our understanding of the biology; different connections between different neurons may have different strengths, and we can model those strengths in terms of these weights on each input coming into our artificial neuron. We also have the output given by a step function. This is similar in spirit to what we had before, but instead of firing when a certain number of inputs are active, there's no longer a concept of active or not active; there are weights coming in, and those weights could be positive or negative. If the sum of those weighted inputs is greater than zero, we go ahead and fire, outputting a 1; if it's less than zero, we don't do anything and output a 0. So it's just a slight adaptation of the concept of an artificial neuron, where we're introducing weights instead of simple binary on/off switches.
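The two ideas above can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: a McCulloch-Pitts-style threshold neuron wired up to compute C = A OR B, and a Linear Threshold Unit that replaces the on/off connections with tunable weights and a step function. The function names and the example weights are made up for illustration.

```python
def threshold_neuron(inputs, threshold=2):
    """Fires (returns 1) if at least `threshold` input connections are active."""
    return 1 if sum(inputs) >= threshold else 0

def neuron_c(a, b):
    """C = A OR B: neurons A and B each have TWO connections into C, threshold 2,
    so either one firing alone is enough to reach the threshold."""
    inputs = [a, a, b, b]
    return threshold_neuron(inputs, threshold=2)

def ltu(inputs, weights):
    """Linear Threshold Unit: a weighted sum followed by a step function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= 0 else 0

print(neuron_c(0, 0), neuron_c(0, 1), neuron_c(1, 0), neuron_c(1, 1))  # 0 1 1 1
print(ltu([1, 1], [0.5, -1.0]))  # weighted sum is -0.5, so output is 0
```

Notice that the OR behavior comes entirely from the wiring (two connections per input) and the threshold, with no weights at all; the LTU then generalizes this by letting each connection carry its own positive or negative strength.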
So let's build upon that even further, and we get something called the perceptron. A perceptron is just a layer of multiple linear threshold units. Now we're starting to get into things that can actually learn, OK? By reinforcing the weights between LTUs that produce the behavior we want, we can create a system that learns over time how to produce the desired output. And again, this tracks our growing understanding of how the brain works. Within the field of neuroscience there's a saying that goes "cells that fire together, wire together," and that's kind of speaking to the learning mechanism going on in our artificial perceptron. We can think of the weights as the strengths of connections between neurons, and we can reinforce over time the weights that lead to the desired result, rewarding the connections that produce the behavior we want.

So you see here we have our inputs coming in with weights, just like in the LTU before, but now we have multiple LTUs ganged together in a layer, and each input gets wired to each individual neuron in that layer, OK? We then apply a step function to each one. This might be used for classification; for example, this could be a perceptron that tries to classify an image into one of three things. Another thing we introduce here is something called the bias neuron, off there on the right. That's just there to make the mathematics work out: sometimes we need to add in a little fixed constant value, and that bias is something else you can tune as well.
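As a sketch, assuming nothing beyond what was just described, a single-layer perceptron is a matrix of weights (one row per LTU), a bias vector (the bias neuron's contribution), and a step function applied to each weighted sum. The weights here are invented for illustration; in practice they would be learned.

```python
import numpy as np

def step(z):
    """Step activation: 1 where the weighted sum is >= 0, else 0."""
    return (z >= 0).astype(int)

def perceptron_layer(x, weights, bias):
    """Each output LTU computes step(w . x + b) over the shared inputs."""
    return step(weights @ x + bias)

x = np.array([1.0, 0.0, 1.0])          # three input values
weights = np.array([[0.5, -0.2, 0.3],  # one row of weights per output LTU
                    [-0.4, 0.1, 0.2],
                    [0.6, 0.6, -0.9]])
bias = np.array([-0.5, 0.0, 0.1])      # the "bias neuron" constants

print(perceptron_layer(x, weights, bias))  # → [1 0 0]
```

Note how every input feeds every LTU: with 3 inputs and 3 output units there are already 9 weights plus 3 biases to tune, which is how the parameter counts add up so fast.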
So that's a perceptron: we've taken our artificial neuron, turned it into a linear threshold unit, and now we've put multiple linear threshold units together in a layer to create a perceptron, and already we have a system that can actually learn. You can try to optimize these weights, and you can see there are a lot of them at this point. When every one of those inputs goes to every single LTU in your layer, the connections add up fast, and that's where the complexity of deep learning comes from.

Let's take that one step further with the multi-layer perceptron. Now, instead of a single layer of LTUs, we have more than one, and we now have a hidden layer in the middle. You can see that our inputs go into a layer at the bottom, the outputs come from a layer at the top, and in between we have this hidden layer of additional LTUs, linear threshold units, that can perform what we call deep learning. So here we already have what we would today call a deep neural network. Now, there are challenges in training these things because they are more complex, but we'll talk about that later on; it can be done. And again, the thing to really appreciate here is just how many connections there are. Even though we only have a handful of artificial neurons, you can see there are a lot of connections between them, and a lot of opportunity for optimizing the weights on each connection.

OK? So that's how a multi-layer perceptron works. Once again we have emergent behavior here: an individual linear threshold unit is a pretty simple concept, but when you put them together in layers, with multiple layers all wired together, you can get very complex behavior, because there are a lot of different possibilities for all the weights on all those different connections.
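A minimal sketch of the multi-layer idea, with hand-picked (not learned) weights: stacking a hidden layer of LTUs between inputs and output lets the network compute XOR, the classic function a single-layer perceptron cannot represent. The weight values below are one well-known choice for illustration.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    hidden = step(w_hidden @ x + b_hidden)  # hidden layer of LTUs
    return step(w_out @ hidden + b_out)     # output layer

# A 2-2-1 network computing XOR:
w_hidden = np.array([[1.0, 1.0],     # hidden unit 1 fires if A OR B
                     [-1.0, -1.0]])  # hidden unit 2 fires unless A AND B
b_hidden = np.array([-1.0, 1.5])
w_out = np.array([[1.0, 1.0]])       # output fires if BOTH hidden units fire
b_out = np.array([-2.0])

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y = mlp_forward(np.array([a, b]), w_hidden, b_hidden, w_out, b_out)[0]
    print(a, b, "->", y)  # prints 0, 1, 1, 0: that's XOR
```

Even this tiny network has 6 weights and 3 biases; scale the layers up and the number of tunable connections explodes, which is exactly the point made above.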
Finally, let's talk about a modern deep neural network, and really this is all there is to it. The rest of this course is just about ways of implementing something like this, OK? All we've done here is replace that step function with something better. We'll talk about alternative activation functions; this one illustrates something called ReLU, which we'll cover later. The key point is that a step function has a lot of nasty mathematical properties, especially when you're trying to work with its slopes and derivatives, so it turns out that other shapes work better and allow you to converge more quickly when training a neural network. We'll also apply softmax to the output, which we talked about in the previous lecture; that's just a way of converting the final outputs of our deep neural network into probabilities, from which we can choose the classification with the highest probability. And we'll train this neural network using gradient descent, or some variation thereof; there are several to choose from, and we'll talk about those in more detail as well. That might use autodiff, which we also talked about earlier, to make the training more efficient.

So that's pretty much it! In the past five minutes or so I've given you pretty much the entire history of deep neural networks and deep learning, and those are the main concepts. It's not that complicated, right? That's really the beauty of it. It's emergent behavior: you have very simple building blocks, but when you put those building blocks together in interesting ways, very complex and frankly mysterious things can happen. So I get pretty psyched about this stuff. Let's dive into more details on how it actually works, up next.
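As a preview, the modern recipe just described (a ReLU hidden layer, a softmax output, and gradient descent) can be sketched in plain NumPy. This is an illustrative toy, not the course's implementation: the layer sizes, random weights, and learning rate are all arbitrary, and the gradient shown is only for the output-layer weights under cross-entropy loss, where the gradient of the loss with respect to the output scores is simply the probabilities minus the one-hot target.

```python
import numpy as np

def relu(z):
    """ReLU activation: a smooth-enough replacement for the step function."""
    return np.maximum(0.0, z)

def softmax(z):
    """Convert raw output scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(42)
w1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input(3) -> hidden(4)
w2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # hidden(4) -> output(2)

# Forward pass: ReLU hidden layer, then softmax output.
x = np.array([0.5, -1.0, 2.0])
hidden = relu(w1 @ x + b1)
probs = softmax(w2 @ hidden + b2)
print(probs, probs.sum())  # the probabilities sum to 1

# One gradient-descent step on the output weights for cross-entropy loss,
# where `target` is a one-hot label:
target = np.array([1.0, 0.0])
grad_w2 = np.outer(probs - target, hidden)
learning_rate = 0.1
w2 -= learning_rate * grad_w2
```

In the rest of the course this kind of forward pass and update would be handled by TensorFlow and Keras rather than written by hand, with autodiff computing the gradients for every layer, not just the last one.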