Udemy

Viola-Jones Algorithm

A free video tutorial from Hadelin de Ponteves

Learn more from the full course

Deep Learning and Computer Vision A-Z + AI & ChatGPT Prizes

10:57:38 of on-demand video • Updated July 2024

Hello and welcome to the course on Computer Vision. Today we're talking about the Viola-Jones algorithm. This is the algorithm that lies at the foundation of face detection in the OpenCV library, and it is still one of the most powerful algorithms for computer vision to date. Let's have a look.
This algorithm was developed by two people, Paul Viola and Michael Jones, in 2001. How crazy is that? It's been over 16 years since then, and it's still one of the most powerful algorithms in the world. It's slowly being surpassed by deep learning, but nevertheless, in its simplicity it's so powerful that it is still being used for computer vision. And this is not just detecting faces in images. This is actually for real-time computer vision, detecting faces in videos as well. That's how powerful it is.
The Viola-Jones algorithm consists of two stages. The first stage is training. The second stage is detection: training of the algorithm, and then detection of the actual faces in application. And we're going to start off by talking about detection. This might be a bit counterintuitive, because you'd think you want to start with training, but it will be much clearer if we start with detection and understand how it works once it's already trained up. Once everything is in place, we will see how it works in action. That way, when we're talking about training, everything will make much more sense, including why exactly certain things are structured in certain ways.
So here we go. Let's get started. We've got a photo here, and we're just going to call this person Jade to keep them anonymous, but they're totally comfortable with us using this photo. It's actually one of Hadelin's friends. Jade is a great example of a photo here because you can see the face, and it's a frontal face. And that's exactly what the Viola-Jones algorithm looks for. It's designed to look for frontal faces.
Not somebody looking to the side or up or down. Most of the time it performs best with frontal faces. That's what it's designed for.
The first step in the Viola-Jones algorithm is that the image is turned into grayscale. For simplicity's sake, it's just easier to work with grayscale images. Results are still astonishing, and there's just less data to process. That's why it's turned to grayscale. Then, once the face is detected, the Viola-Jones algorithm finds the location of the face on the color image, so you won't even notice that it's working with grayscale. But in reality, in the background, the Viola-Jones algorithm is working with a grayscale version of your image.
So what the Viola-Jones algorithm does is it starts looking for the face. It outlines a little box. It starts from the top left corner and moves to the right, step by step, looking for the face. How is it looking for the face? This is what we will discuss in further tutorials. But for now, we're just going to say it's looking for certain features of the face, and by features, for now, we mean eyebrows, eyes, the nose, the lips, the chin, the forehead, the cheeks and so on. How exactly it can find them, we'll find out further down the track. For now, let's just agree that it's looking for those features.
So in this box, it can see an eyebrow. It's looking through the pixels in this box, going: okay, looking, looking, looking for any of those features. It detects an eyebrow, and then it thinks, okay, this might be a face. But then it realizes that for it to be a face, there have to be two eyes, there has to be a nose, there has to be a mouth. So it looks for those other features. And it looks through this whole box and doesn't detect any eyes at all.
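As a rough sketch of the two steps described so far (grayscale conversion, then a box that starts at the top left and moves right step by step), something like the following could be used. The function names, the tiny image size, and the 0.299/0.587/0.114 channel weights are assumptions for illustration; the weights are a common luma convention, not something stated in this tutorial.

```python
from typing import Iterator, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def to_grayscale(rgb_rows: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Turn rows of RGB pixels into rows of single intensity values.

    The weights are a widely used luma convention (an assumption here).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def scan_positions(width: int, height: int, box: int, step: int) -> Iterator[Tuple[int, int]]:
    """Yield top-left corners of a square box, left to right, then top to bottom."""
    for y in range(0, height - box + 1, step):
        for x in range(0, width - box + 1, step):
            yield (x, y)


gray = to_grayscale([[(255, 255, 255), (0, 0, 0), (255, 0, 0)]])
positions = list(scan_positions(width=6, height=4, box=2, step=2))
```

Because the grayscale image has the same width and height as the color one, any box position found here refers to the same pixels in the original color image.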
So it understands for itself: this is not a face. Okay, so it moves on. There we go. Now it's moved a bit to the right. It again looks through the box. It detects an eyebrow again, so it does the same thing: it doesn't detect any eyes, so it's not a face. Okay, it keeps moving, keeps moving. Aha, now it can detect two eyebrows. But again, the problem is there are no eyes, so it's not a face. It keeps moving, keeps moving. One eyebrow. Nothing, nothing, nothing. Then it skips down a row.
So now it goes through the image again. It can see an eyebrow. Yes, okay, that's a score. Then it looks for an eye, and it can see the eye. That's great. Then next it looks for the nose. There's no nose, so it's probably not a face, because a face has to have a nose. It keeps going, and keeps going again.
And then at this point, it might say, okay, I see something. It might think that this is a nose, because, remember, it's a very basic algorithm, and from the features that we'll discuss, you'll see that it might actually think that this part is a nose in its own right. But even if it thinks that this is a nose, it realizes that there's no other eye and no other eyebrow. So it keeps looking, keeps looking.
Then here it might already think that this is an eyebrow, an eye, and a nose. But then it will see that there's no mouth, there are no cheeks, none of the features that should be here at the bottom. There's no chin, if that's what it's looking for. Again, all these things depend on the training of the algorithm, which we'll discuss further down. So it will understand this is not a face. Okay, it keeps going, keeps going, keeps going.
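The "check one feature, then the next, and give up as soon as one is missing" behaviour in this walkthrough can be sketched as below. This is only a toy model of the intuition: the feature names and the `CHECKS` order are hypothetical, and each window is represented simply as the set of features assumed to be visible in it. How the real algorithm finds features is covered in later tutorials.

```python
from typing import Iterable, List, Optional, Set

# Checks run in this order; the names are illustrative stand-ins only.
CHECKS: List[str] = ["eyebrow", "eye", "nose", "mouth"]


def first_failed_check(found: Set[str], checks: Iterable[str] = CHECKS) -> Optional[str]:
    """Return the first required feature missing from the window, or None."""
    for name in checks:
        if name not in found:
            return name  # stop immediately: this window is not a face
    return None  # every check passed: high potential to be a face


def is_face_candidate(found: Set[str]) -> bool:
    """A window is a candidate only if no check fails."""
    return first_failed_check(found) is None


# Windows similar to the ones in the walkthrough (contents are made up).
windows = [
    {"eyebrow"},
    {"eyebrow", "eye"},
    {"eyebrow", "eye", "nose"},
    {"eyebrow", "eye", "nose", "mouth"},
]
results = [first_failed_check(w) for w in windows]
```

Stopping at the first failed check means most boxes are discarded after very little work, which matters when there are a great many boxes to look through.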
And here you can see that this time it's definitely got an eye, an eyebrow, and a nose, but no mouth and no second eye or second eyebrow. So it keeps going. And then here, this is almost a face, but there's no mouth. So it again discards this and keeps going, keeps going, keeps going. And then finally, when it gets somewhere here, it can see that it's got both eyebrows, both eyes, a nose and a mouth. So it highlights this as having a very, very high potential to be a face. For instance, it makes this box green, then keeps going again. And now in this box, it can see another face: two eyebrows, two eyes, a nose and a mouth. So it highlights it again and keeps going. After this box, once it goes over here, it can no longer see the full eyebrow or the full eye, so this is no longer a face. So it keeps going. Now it's going away from the face. And then finally here: no eyes, so no face.
So there we go. That's how it scans this whole image. And the size of this box varies, because faces can be small or large. You've got an image, and you might have a face over there which is a small face. So this scan will happen many times: there will be a bigger box, a smaller box, lots of different boxes. Also, if you noticed, we had very large steps. Here you can see the horizontal step is quite large, and the vertical step is quite large as well. In reality, the steps are smaller and the boxes are more frequent. This is just a visualization for us to get the intuition. So in reality, this face might not have been detected just twice. Because the steps of the box are much smaller, this face might have been detected many more times. Each box says that there's a high likelihood of a face being there.
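To get a feel for how box size and step size affect the number of boxes, here is a small sketch. The starting size of 24, the growth factor of 1.5, and the 100x100 image are illustrative assumptions, not values given in this tutorial.

```python
from typing import List


def box_sizes(min_size: int, max_size: int, scale: float) -> List[int]:
    """List box sizes from min_size upward, growing by `scale` each time."""
    sizes = []
    size = float(min_size)
    while size <= max_size:
        sizes.append(round(size))
        size *= scale
    return sizes


def count_positions(width: int, height: int, box: int, step: int) -> int:
    """Count how many box positions fit in the image for a given step."""
    if box > width or box > height:
        return 0
    per_row = (width - box) // step + 1
    per_column = (height - box) // step + 1
    return per_row * per_column


sizes = box_sizes(min_size=24, max_size=100, scale=1.5)
coarse = count_positions(100, 100, box=24, step=8)
fine = count_positions(100, 100, box=24, step=4)
```

In this example, halving the step roughly quadruples the number of boxes, which is why the same face ends up covered by many overlapping boxes. In OpenCV's cascade detector, the `scaleFactor` argument of `detectMultiScale` plays a similar role to the `scale` parameter here.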
And when a lot of boxes overlap, that means that this is most likely a face, and then it will detect the face there. That's how you see that green square in Facebook or in any other kind of face detection application that you might have on your phone. Again, they might be using other algorithms, but the ones that are using the computer vision algorithm which Viola and Jones created, that's exactly how they work. Then it will just position the same box on the color image too, and that's exactly what you will see. You won't see the grayscale, you'll see the color image. And so there you go: we found Jade's face.
So that's, in a nutshell, how the Viola-Jones algorithm works. I know it's very basic, very straightforward, but it was important for us to understand how this box travels through the image and what exactly happens, because now we're going to start building on top of that. We'll talk about the training further down. For now, we'll talk about the features. Then we'll talk about some hacks for how this process can be expedited so that it happens much faster and more efficiently.
Now, if you would like to do some additional reading, the best place to start is the original paper by Paul Viola and Michael Jones. It's called "Rapid Object Detection using a Boosted Cascade of Simple Features". It's actually a very simple paper. Even though the name might be complex, the language is very friendly and very easy to read. I highly recommend checking it out. Maybe not at this stage; maybe you want to go through the tutorials first and then read it after you finish this section. But it will be very helpful. They talk in very human language, very easy to understand.
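Going back to the overlapping boxes: one simple way to turn "many boxes overlap here" into a single detection is to group boxes that overlap strongly, keep only groups with enough members, and average each group. The sketch below is a simplified stand-in written for illustration. It is not OpenCV's actual grouping code, and the `min_neighbors` and `iou_threshold` values are assumptions.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (0.0 means no overlap)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def group_detections(boxes: List[Box], min_neighbors: int = 2,
                     iou_threshold: float = 0.5) -> List[Box]:
    """Greedily cluster overlapping boxes and average the large clusters."""
    clusters: List[List[Box]] = []
    for box in boxes:
        for cluster in clusters:
            # Compare against the first box of each cluster (greedy choice).
            if iou(box, cluster[0]) >= iou_threshold:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    merged: List[Box] = []
    for cluster in clusters:
        if len(cluster) >= min_neighbors:  # isolated boxes are discarded
            n = len(cluster)
            merged.append(tuple(round(sum(b[i] for b in cluster) / n) for i in range(4)))
    return merged


candidates = [(10, 10, 20, 20), (12, 10, 20, 20), (11, 12, 20, 20), (70, 70, 20, 20)]
faces = group_detections(candidates)
```

Here the three overlapping boxes collapse into one averaged box, while the lone box at (70, 70) is dropped as a likely false alarm. The resulting coordinates can be drawn directly on the color image, since it has the same dimensions as the grayscale version.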
And here you can actually see an example of some of the features which are used in the Viola-Jones algorithm. We'll be talking about these in the next tutorial, and I look forward to seeing you next time. Until then, enjoy Computer Vision.