Convolutional Neural Networks (CNNs)

A free video tutorial from Sundog Education by Frank Kane

Lecture description

We'll introduce the concepts of CNNs and how they are inspired by the biology of your visual cortex.

Learn more from the full course

Autonomous Cars: Deep Learning and Computer Vision in Python

Learn OpenCV, Keras, object and lane detection, and traffic sign classification for self-driving cars

12:26:02 of on-demand video • Updated May 2024

Automatically detect lane markings in images
Detect cars and pedestrians using a trained classifier and an SVM
Classify traffic signs using Convolutional Neural Networks
Identify other vehicles in images using template matching
Build deep neural networks with TensorFlow and Keras
Analyze and visualize data with NumPy, Pandas, Matplotlib, and Seaborn
Process image data using OpenCV
Calibrate cameras in Python, correcting for distortion
Sharpen and blur images with convolution
Detect edges in images with Sobel, Laplace, and Canny
Transform images through translation, rotation, resizing, and perspective transform
Extract image features with HOG
Detect object corners with Harris
Classify data with machine learning techniques including regression, decision trees, Naive Bayes, and SVM
Classify data with artificial neural networks and deep learning
Transcript (auto-generated)
Now that we have the basics of deep learning under our belts, we can move on to the techniques most applicable to self-driving cars. Convolutional neural networks, or CNNs, are very powerful tools for identifying objects within images, such as stop signs, other vehicles, obstacles, and pedestrians, no matter where in the image they might be. They're inspired by how your own brain identifies objects from the signals coming from your retinas. I'll take you through the basics of CNNs and some important techniques for getting the best performance out of them. We'll play around with training a CNN to classify images that represent one of ten different kinds of objects. I'll then hand it back to Ryan, who will talk about a specific kind of CNN that's especially useful for autonomous vehicles and give you a project to classify traffic signs using it. Let's start with how CNNs work at a conceptual level. So far we've seen the power of just using a simple multi-layer perceptron to solve a wide variety of problems. But you can kick things up a notch: you can arrange more complicated neural networks together and solve more complicated problems with them. Let's talk about convolutional neural networks, or CNNs for short. They're very important in the field of image processing, which is obviously also very important in the world of self-driving cars. Usually you hear about CNNs in the context of image analysis. Their whole point is to find things in your data that might not be exactly where you expected them to be. Technically, we call this feature location invariance. That means that if you're looking for some pattern or feature in your data, but you don't know exactly where it might be, a CNN can scan your data and find those patterns for you wherever they might be. For example, in this picture, that stop sign could be anywhere in the image, and a CNN is able to find it no matter where it might be.
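Here's location invariance in miniature — a toy sketch, not the course's actual code. We slide one small "detector" (a kernel) across a 1D signal and take the maximum response. The peak fires wherever the pattern sits, so its position stops mattering; that's the CNN idea of convolution followed by max pooling, boiled down to plain Python. The signal and pattern values are made up for illustration.

```python
# Toy illustration: a sliding detector whose max response is the same
# no matter where the pattern appears in the input.

def detector_response(signal, kernel):
    """Dot product of the kernel with each window of the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

pattern = [1, 3, 1]              # the feature we are looking for
scene_a = [0, 0, 1, 3, 1, 0, 0]  # pattern near the start
scene_b = [0, 0, 0, 0, 1, 3, 1]  # same pattern shifted to the end

# Max pooling over positions: the strongest match, wherever it occurred.
print(max(detector_response(scene_a, pattern)))  # 11 in both cases
print(max(detector_response(scene_b, pattern)))  # 11
```

Both scenes produce the same peak response of 11, even though the pattern moved — the detector found it "wherever it might be."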
Now, it's not just limited to image analysis. CNNs can also be used for any sort of problem where you don't know where the features might be located within your data. Machine translation or natural language processing tasks come to mind. You don't necessarily know where the noun or the verb or the phrase you care about might be in some paragraph or sentence you're analyzing, but a CNN can find it and pick it out for you. Sentiment analysis is another application of CNNs: you might not know exactly where a phrase might be that indicates some happy sentiment, or some frustrated sentiment, or whatever you might be looking for, but a CNN can scan your data and pluck it out. And you'll see that the idea behind it isn't really as complicated as it sounds. This is another example of using fancy words to make things seem more complicated than they really are. So how do they work? Well, convolutional neural networks are inspired by the biology of your visual cortex. They take cues from how your brain actually processes images from your retina, and it's another fascinating example of emergent behavior. The way your eyes work is that individual groups of neurons service a specific part of your field of vision. We call these local receptive fields: they're just groups of neurons responding to a part of what your eyes see. They subsample the image coming in from your retinas, and there are specialized groups of neurons for processing specific parts of the field of view that you see with your eyes. Now, these little areas from each local receptive field overlap each other to cover your entire visual field, and this is called convolution. Convolution is just a fancy way of saying: I'm going to break up this data into little chunks and process those chunks individually. Then the system assembles a bigger picture of what you're seeing higher up in the chain.
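That chunk-by-chunk idea can be spelled out in a few lines of plain Python — a minimal sketch of 2D convolution, not what a real framework does under the hood (frameworks vectorize this heavily). The kernel here is one illustrative vertical-edge filter; the tiny image values are made up.

```python
# A tiny "valid" 2D convolution: slide a small kernel (the local
# receptive field) across the image and process one chunk at a time.

def convolve2d(image, kernel):
    """Return the valid 2D sliding-window response of image under kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Multiply the chunk under the kernel element-wise and sum.
            acc = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(acc)
        out.append(row)
    return out

# A dark region meeting a bright region: a vertical edge down the middle.
image = [
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 9, 9],
]
# This kernel responds where the left and right of a chunk differ.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
print(convolve2d(image, vertical_edge))
# [[0, 0, -27, -27], [0, 0, -27, -27]]: zero in the flat regions,
# a strong response wherever the edge falls under the window.
```

Swapping in other kernels gives the sharpening and blurring filters covered elsewhere in this course — same sliding-window machinery, different numbers in the kernel.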
The way it works within your brain is that you have many layers, just like a deep neural network, that identify features of increasing complexity. The first layer of the convolutional neural network inside your head might just identify horizontal lines, or lines at different angles, or specific kinds of edges. We call these filters, and they feed into a layer above them that assembles the lines identified at the lower level into shapes. And maybe there's a layer above that which can recognize objects based on the patterns of shapes that it sees. So we have a hierarchy that detects lines and edges, then shapes from the lines, then objects from the shapes. And if you're dealing with color images, we have to multiply everything by three, because you actually have specialized cells within your retina for detecting red, green, and blue light. Those are processed individually and assembled later on. So that's all a CNN is: taking a source image, or source data of any sort really, breaking it up into little chunks called convolutions, and then assembling those and looking for patterns of increasingly higher complexity at higher levels in your neural network. So how does your brain know that you're looking at a stop sign there? Let's talk about this in more colloquial language. Like we said, you have individual local receptive fields that are responsible for processing specific parts of what you see, and those local receptive fields scan your image, overlapping with each other, looking for edges. You might notice that your brain is very sensitive to the contrast and edges that it sees in the world; those tend to catch your attention, right? That's why the letters on this slide catch your attention: there's high contrast between the letters and the white background behind them.
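That hierarchy maps directly onto stacked layers in Keras, the library this course uses. The model below is an illustrative sketch — the layer sizes and the 32x32x3 input are my own choices, not the course's exact model: the early Conv2D layers learn edge-like filters, deeper ones combine them into shapes, and the dense head maps shapes to one of ten object classes, matching the ten-kinds-of-objects exercise mentioned earlier. The 3 in the input shape is the "multiply everything by three" for red, green, and blue channels.

```python
# A sketch of the edges -> shapes -> objects hierarchy as a Keras model.
# Layer counts and sizes are illustrative assumptions, not the course's code.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),               # small color image: 3 channels (R, G, B)
    layers.Conv2D(32, (3, 3), activation="relu"),  # low level: edges and lines
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),  # mid level: shapes from edges
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),           # high level: objects from shapes
    layers.Dense(10, activation="softmax"),        # ten object classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Nothing forces the first layer to learn edges — that hierarchy emerges from training, just as the lecture describes it emerging in your visual cortex.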
So at a very low level, you're picking up the edges of that stop sign and the edges of the letters on it. Now, a higher level might take those edges and recognize the shape of the stop sign. That layer says: oh, there's an octagon there, that means something special to me. Or: those letters form the word STOP, that means something special to me too. And ultimately that will get matched against whatever classification pattern your brain has of a stop sign. So no matter which receptive field picked up that stop sign, at some layer it will be recognized as a stop sign. And furthermore, because you're processing data in color, you can also use the information that a stop sign is red to further aid in the classification of what this object really is. So somewhere in your head there's a neural network that says: hey, if I see edges arranged in an octagon pattern that has a lot of red in it and says STOP in the middle, that means I should probably hit the brakes on my car. And at some even higher level, where your brain is doing higher reasoning, that's what happens: there's a pattern there that says, hey, there's a stop sign coming up, I'd better hit the brakes. And if you've been driving long enough, you don't even really think about it anymore, do you? It feels like it's hardwired, and that literally may be the case. Just like you can build a self-driving car that does this automatically, there's nothing all that special about it. An artificial convolutional neural network works the same way; it's the same exact idea.