Computer Vision - An Introduction

A free video tutorial from Loony Corn
An ex-Google, Stanford and Flipkart team
Rating: 4.2 out of 5 (instructor rating)
67 courses
155,537 students

Lecture description

A quick intro to Computer Vision, and one of the most popular starter problems - identifying handwritten digits using the MNIST database. We also talk about feature extraction from images.

Learn more from the full course

From 0 to 1: Machine Learning, NLP & Python-Cut to the Chase

A down-to-earth, shy but confident take on machine learning techniques that you can put to work today

19:50:09 of on-demand video • Updated January 2018

Identify situations that call for the use of Machine Learning
Understand which type of Machine learning problem you are solving and choose the appropriate solution
Use Machine Learning and Natural Language processing to solve problems like text classification, text summarization in Python
English [Auto]
Computer vision is a really interesting problem, and it's an important application of machine learning. Computer vision is the science of teaching computers how to see. When we say "see", we actually mean that we want computers to be able to identify and extract information from images and videos. We basically want them to look at images and videos and be able to understand what those images are, in a sense be able to describe them, and then be able to make decisions based on this information. This decision could be something as simple as granting access based on recognizing somebody's face, or it could be something like a robot being able to make decisions based on what it can see.
Let's look at some common tasks in computer vision. Handwriting recognition is a pretty common one: given an image of handwritten text, or somebody writing something down, we should be able to convert it into text. As you can imagine, this could be a particularly difficult problem. Imagine deciphering a doctor's prescription, for example.
Object recognition is an important task: being able to look at a picture or a video, or whatever the computer can see, and recognizing what objects are there in that image. These pictures are a few examples of how a computer was able to identify objects in an image. For example, in the top left image the computer is able to identify a tree, a railing and sky, among other objects. Similarly, in the bottom right it's able to identify that some of the objects are windows. Object recognition is a very important task in computer vision. Being able to identify the different objects and then put them together to describe the bigger picture is something that computers need to be able to do to move in the direction of artificial intelligence.
Another interesting problem is face detection.
This is slightly different from, and more difficult than, object recognition itself: identifying a face in a picture and then identifying who that person is as well. Most cameras, whether they are DSLRs or even smartphone cameras, have this feature: they can identify a face, tell whether it's smiling or not, and snap a picture then. Another example of an application of face detection is Facebook's recent DeepFace algorithm, which is able to very accurately identify who a person in a picture is. The reason Facebook is able to do that is that it has a huge amount of data in the form of images posted by its users, and those images even have tags of who is in the pictures as well.
Computer vision, as you can see, opens up so many different possibilities. If computers could see and understand the world around them, they could do amazing things. For example, self-driving cars wouldn't be possible if you didn't have computer vision. The way self-driving cars work is by using a lot of different sensors and cameras around the car, which feed in data to a computer. The computer then tries to understand and process what the different objects on the road are: if something comes in front of the car, whether it has to slow down or swerve to avoid it. All of this is processed using computer vision techniques.
So, as you can see, there are a lot of interesting problems that come under the umbrella of computer vision. Some are very difficult to solve and have no known solutions. Some have been solved. One of the starter problems that most people begin with in computer vision is handwritten digit recognition. The idea is that you have a bunch of handwritten digits. The digits can be anything from zero to nine, and you need to identify which digit each one is. As you can see, handwriting varies from person to person.
Each person would write a zero in a different way, a one in a different way, and so on. Handwritten digit recognition becomes important when you look at applications like bank check processing. If you wanted to automatically process a check at the bank, your program should be able to look at a scanned image of a check and read the digits on it very reliably. You can imagine the ruckus if the program reads an eight as a nine, or vice versa.
So this is a problem that people have been trying to solve for many, many decades. In fact, there are many different proposed solutions with varying degrees of accuracy. But at heart, all of these proposed solutions are basically different approaches to solving a classification problem. So you take this problem and you look at it as a classification problem: given an image of a handwritten digit, you need to classify it as one of the digits 0 to 9. There are standard image processing techniques that can take an image of a number and break it up into the individual digits. Then, given each of those digits, you need to be able to classify it as one of the digits between 0 and 9.
Whenever people come up with a new algorithm or a new approach to solve this particular problem, they test the performance of that algorithm on a standard database. MNIST is one such standard database of handwritten digits. This database was created and is still maintained by one of the foremost researchers on neural networks and computer vision. The database has about 60,000 different images of digits, and the digits have been handwritten by 500 different writers. Each of these images comes with a label which says which digit it is, between 0 and 9. Another set of 10,000 images is provided so that people can test the algorithms that they've come up with. You use the 60,000 images to train a classifier, and then test it on the other set of 10,000 images.
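The train-then-test protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors and labels below are made up (they are not MNIST), and the nearest-centroid classifier is a stand-in chosen for brevity, not an algorithm named in the lecture. The function names `train`, `predict` and `accuracy` are hypothetical.

```python
# Train/test protocol sketch. The data is invented for illustration;
# it is NOT MNIST, and nearest-centroid is just a simple stand-in classifier.

def train(features, labels):
    """Compute one mean vector (centroid) per label from the training set."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: dist(centroids[label]))

def accuracy(centroids, features, labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(features, labels))
    return correct / len(labels)

# Training set: used only to fit the classifier.
train_x = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_y = [0, 0, 1, 1]

# Test set: held out, used only to measure performance.
test_x = [[0.7, 0.3], [0.3, 0.7]]
test_y = [0, 1]

model = train(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(test_accuracy)
```

The point of keeping the two sets separate is that the reported accuracy reflects images the classifier has never seen during training.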
This is a standard process that researchers follow when they're coming up with a new approach for classifying handwritten digits. So let's go through the basic steps involved in any classification problem. First, you represent each of the data points in your training set, that is, each image in the training set, as a feature vector, which is a tuple of numbers. So each image needs to be converted into some sort of tuple of numbers that represents that image. Then you take all these feature vectors along with their labels (which digit it is: 0 or 1 or 2, up to 9) and feed this data to a classification algorithm. The classification algorithm will learn from the training data what the relationship between the features and the label is. Then, when you give it a new image, it will be able to look at the features of that particular image and say which digit it is between 0 and 9.
So let's go through this step by step. First we'll look at how an image can be represented as a tuple of numbers, and then see which classification algorithm that tuple of numbers should be fed into. So the first problem is this: how do you take an image of a cat or a dog or a digit and represent it as a tuple of numbers? The basic idea is that every image is made up of pixels. Most people would know the specs of the camera that they're using, whether it's just a smartphone camera or an SLR camera. The specs would say something like: the camera has the ability to take pictures of 8 megapixels. This means that each picture that camera takes is made up of eight million pixels. Take any image, and if you zoom in far enough you will be able to see that it's actually made of a large number of tiny squares. And when you zoom out, at some point you will be unable to distinguish between the edges of the squares, and you will see a smooth image.
Here is one example of a handwritten digit zero and the pixels that make up that picture. One tiny square that makes up this picture is called a pixel. Each pixel in the image can be represented by a tuple of numbers. The first number is the X coordinate of the pixel: this is how far it is from the left edge. The second number is the Y coordinate of the pixel: how far along it is vertically within the image. Then you have the color of the pixel. If your picture is just black and white, that is, each pixel can take either a black or a white color, then this color could be represented by a bit that is zero or one. So it's a binary number. These three numbers together could represent a pixel, and all the pixels together could represent an image.
Usually the color won't be binary, though. In fact, it would be a number between 0 and 1, and in that case you would have something called a grayscale image. That number between 0 and 1 would represent how gray that particular pixel is: zero would be white and one would be black, or the other way around.
What happens when you have a full color image? Each pixel then actually has three numbers to represent the color, and each number represents the intensity of one primary color: red, green or blue. So each color then becomes a tuple of three numbers in itself. Sometimes you might even have another number in the tuple, a fourth, which represents the transparency of that particular pixel. So there are three or four numbers to represent a color in a full color image. The number of components required to represent the actual color of the pixel (in this case it's three, in grayscale it's one) is also called the number of channels required to represent a pixel. We'll concentrate for the time being on grayscale images, where just one number is required to represent the color of the pixel.
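As a rough sketch of the pixel representations described above, the following Python snippet writes out pixels as tuples and counts channels. All names (`pixels_from_grid`, `num_channels`) and all values are made up for illustration, and measuring y from the top row is an assumption, not something the lecture states.

```python
# Pixels as tuples of numbers. Names and values are illustrative only.

def pixels_from_grid(grid):
    """List an (x, y, value) tuple for every pixel in a 2D grid.

    x counts columns from the left edge; y counts rows from the top
    (the top-row origin is an assumption made for this sketch).
    """
    return [(x, y, value)
            for y, row in enumerate(grid)
            for x, value in enumerate(row)]

def num_channels(color):
    """Number of components used to describe one pixel's color."""
    return len(color)

# Black-and-white image: each color is a single bit, 0 or 1.
grid = [[0, 1],
        [1, 0]]
pixels = pixels_from_grid(grid)

gray_color = (0.25,)                # grayscale: one number between 0 and 1
rgb_color = (1.0, 0.5, 0.0)         # full color: red, green, blue intensities
rgba_color = (1.0, 0.5, 0.0, 0.8)   # full color plus transparency

print(pixels)
print(num_channels(gray_color), num_channels(rgb_color), num_channels(rgba_color))
```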
So if you are given a grayscale image, you can just represent it as a two-dimensional array. Each number in the array represents one pixel: its position in the array gives the position of the pixel in the image, and the value of the number represents the grayness, or the color, of that particular pixel. The size of the array would be the same as the size of the image. So let's say you have an image which is 100 pixels wide and 128 pixels high. Then that image could be represented by a two-dimensional array which has 128 rows and 100 columns. Each number in the array represents a pixel: the gray value of that pixel. Now, this two-dimensional array could actually be used directly as a feature vector for a classification algorithm, because you have already represented the image using numbers.
However, if your image is full color and it has millions of pixels, then this feature vector could end up having too many dimensions. It could be that some of these dimensions are just redundant, and there's not too much information that you'll get by having them. For example, you could blur an image to quite a large extent and still be able to make out what that image is. You could represent an image using a much smaller number of pixels, and you would be looking at a blurry image, but you would still be able to make out what that image is. The idea is to perform dimensionality reduction: to identify just the right features that will help us identify what the image is, without having to process the image represented by all the pixels in it.
Feature extraction techniques can help us find a small number of features with which we can represent the image. There are many different feature extraction techniques for images. Some of them will help you find the edges and represent the image using just the edges of that image. Some of them will compress an image into a blurry but still recognizable version of that image.
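To make the array representation concrete, here is a small Python sketch that flattens a 128-row by 100-column grayscale array into a feature vector and then shrinks the image by averaging blocks of pixels. Block averaging is a simple stand-in chosen for this sketch to produce a smaller, blurrier image; the lecture does not name a specific technique. The helper names `flatten` and `downsample` are hypothetical, and `downsample` assumes both dimensions are divisible by the factor.

```python
# Grayscale image as a 2D array, flattened into a feature vector.
# Block averaging below is an illustrative stand-in for "a blurrier,
# smaller version of the image", not a method named in the lecture.

def flatten(image):
    """Concatenate the rows of a 2D array into one flat list of numbers."""
    return [value for row in image for value in row]

def downsample(image, factor):
    """Average each factor x factor block of pixels into a single pixel.

    Assumes the number of rows and columns are both divisible by factor.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[r + dr][c + dc]
                     for dr in range(factor)
                     for dc in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# An all-white placeholder image: 128 rows (height) and 100 columns (width).
image = [[0.0] * 100 for _ in range(128)]

features = flatten(image)              # one number per pixel
small = downsample(image, 4)           # 32 rows by 25 columns
small_features = flatten(small)        # far fewer dimensions

print(len(features), len(small), len(small[0]), len(small_features))
```

Using the raw pixels gives 12,800 dimensions for this image size, while the reduced version has 800, which is the kind of trade-off between detail and dimensionality the passage describes.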
In fact, in recent times people have used unsupervised feature learning techniques as well, where they allow a learning algorithm to identify by itself the features that really represent an image. The idea with unsupervised feature learning is that computers or programs can identify the relevant features in an image by themselves. For human faces, say, these could be things like the nose, the eyes, the ears and so on. If they can identify these features by themselves, then they are closer to being independent, or closer to artificial intelligence.
Let's quickly look at one very simple method that we could use to reduce the number of dimensions required to represent an image. We could divide the image into a number of zones of equal size. So let's say we have a six by six image. There are 36 pixels in this image. We divide it into four zones, each of three by three, so each zone has nine pixels in it. Then, within each zone, we count the number of black pixels. Here black is represented by one and white by zero. So the top left zone has two black pixels, the next one has three, the bottom left has four and the bottom right has four. Now, instead of representing this particular image by six by six, that is 36, numbers, you represent the image using these four numbers: 2, 3, 4, 4. So this tuple of four numbers could be used to represent the image instead of the entire 36 numbers. We have reduced the number of dimensions from 36 to 4. This is one very simple but crude way of extracting features from images, and it is actually used pretty commonly.
OK, going back to our digit recognition problem. We've walked through the first step, which is how to identify the feature vector: how do you represent each image as a tuple of numbers? Let's now walk through the choices of classification algorithms that we can use. We'll do that in the next class.
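The zoning example above can be sketched in Python as follows. The 6 by 6 pixel pattern is invented so that its zone counts come out as 2, 3, 4, 4 like the numbers in the lecture; it is not the actual image from the slide. The function name `zone_counts` is hypothetical, and the sketch assumes the image dimensions are divisible by the zone size.

```python
# Zoning: count black pixels (1s) in each equal-sized zone of a binary image.
# The pixel pattern is made up to reproduce the counts 2, 3, 4, 4.

def zone_counts(image, zone_size):
    """Return black-pixel counts per zone, scanning zones left to right,
    top to bottom. Assumes dimensions are divisible by zone_size."""
    rows, cols = len(image), len(image[0])
    counts = []
    for r in range(0, rows, zone_size):
        for c in range(0, cols, zone_size):
            counts.append(sum(image[r + dr][c + dc]
                              for dr in range(zone_size)
                              for dc in range(zone_size)))
    return counts

image = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
]

features = zone_counts(image, 3)
print(features)  # [2, 3, 4, 4]: 36 pixel values reduced to 4 numbers
```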