Viola-Jones Algorithm

A free video tutorial from Hadelin de Ponteves
AI Entrepreneur

Deep Learning and Computer Vision A-Z™: OpenCV, SSD & GANs

10:58:47 of on-demand video • Updated August 2020

Hello and welcome to the course on computer vision. Today we're talking about the Viola-Jones algorithm. This is the algorithm that lies at the foundation of the OpenCV library, and it is one of the most powerful computer vision algorithms to date. Let's have a look.
This algorithm was developed by two people, Paul Viola and Michael Jones, in 2001. How crazy is that? It's been over 16 years since then, and it's still one of the most powerful algorithms in the world. It's slowly being surpassed by deep learning, but nevertheless, in its simplicity it's so powerful that it is still being used for computer vision. And it's not just for detecting faces in images: it works in real time, so it can detect faces in videos as well. That's how powerful it is.
The Viola-Jones algorithm consists of two stages. The first stage is training and the second stage is detection: training of the algorithm, and then detection of the actual faces in an application. We are going to start off by talking about detection. This might be a bit counterintuitive, because you'd think you want to start with training, but it will be much clearer if we start with detection and understand how it works. Once the algorithm is already trained and everything is in place, we will see how it works in action. That way, when we talk about training, it will make much more sense why certain things are structured in certain ways.
So here we go, let's get started. We've got a photo here, and we're just going to call this person JD to keep them anonymous, but they're totally comfortable with us using this photo. And yeah, JD is actually one of Hadelin's friends. JD is a great example for a photo here because you can see the face, it's a frontal face, and that's exactly what the Viola-Jones algorithm looks for.
It's designed to look for frontal faces, not somebody looking to the side or up or down. It performs best with a frontal face, because that's what it's designed for.
The first step in the Viola-Jones algorithm is that the image is turned into grayscale. For simplicity's sake, it's just easier to work with grayscale images: the results are still astonishing, and there's less data to process. That's why the image is turned to grayscale. Then, once the face is detected, the Viola-Jones algorithm finds the location of the face on the color image. So you won't even notice that it's working in grayscale, but in reality, in the background, the Viola-Jones algorithm is working with a grayscale version of your image.
So what the Viola-Jones algorithm does is it starts looking for the face. It outlines a little box, starts from the top left corner, and moves to the right, step by step, looking for the face. How is it looking for the face? This is what we'll discuss in the further tutorials. For now we're just going to say it's looking for certain features of the face, and by features, for now, we mean that it's looking for eyebrows, eyes, the nose, the lips, the chin, the forehead, the cheeks, and so on. How exactly it can find them, we'll find out further down the track. For now, let's just agree that it's looking for those features.
So in this box it can see an eyebrow. It's looking through the pixels in this box, looking for any of those features, and it detects an eyebrow. Then it thinks, OK, this might be a face. But for it to be a face there have to be two eyes, a nose, and a mouth, so it looks for those other features. And as soon as it looks at this whole box and doesn't detect any eyes at all, it understands for itself that this is not a face. OK, so let's move on.
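The two steps described so far, converting to grayscale and sliding a box from the top left corner to the right, can be sketched in a few lines of plain Python. This is only an illustration: the luminance weights (the common BT.601 choice), the function names, and the tiny test image are assumptions, since the tutorial does not specify any of them.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of brightness values (0-255)."""
    # BT.601 luminance weights: an assumed, commonly used choice.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def sliding_windows(width, height, box, step):
    """Yield (x, y) top-left corners: left to right, then down one row."""
    for y in range(0, height - box + 1, step):
        for x in range(0, width - box + 1, step):
            yield x, y


# Hypothetical 6x4 all-white image, just to exercise the functions.
image = [[(255, 255, 255)] * 6 for _ in range(4)]
gray = to_grayscale(image)
positions = list(sliding_windows(width=6, height=4, box=2, step=2))
print(gray[0][0])
print(positions)
```

Each grayscale pixel holds one number instead of three, which is the "less data to process" point made above. The box positions start at `(0, 0)` and move right before dropping down a row.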
So it moves on. There you go, now it's moved a bit to the right. It again looks through and detects an eyebrow, so it does the same thing: it doesn't detect any eyes, so it's not a face. OK, it keeps moving, and keeps moving. Aha, now it can detect two eyebrows. But again, the problem is there are no eyes, so it's not a face. It keeps moving: one eyebrow, nothing, nothing, nothing. Then it skips down.
OK, so now it goes through the image again. It can see an eyebrow: yes, OK, that's good. Then it looks for an eye. It can see the eye, that's great. Next it looks for the nose. There's no nose, so it's probably not a face, because a face has to have a nose. It keeps going, and keeps going. At this point it might think that this is a nose because, remember, it's a very basic algorithm, and from the features that we'll discuss you'll see that it might actually think this part is a nose in its own right. But even if it thinks that this is a nose, it realizes that there's no other eye and no other eyebrow. So it keeps looking.
Then here it might already think that it has an eye, an eyebrow, another eyebrow, another eye, and a nose. But then it will see that there's no mouth, there are no cheeks, none of these features that are over here at the bottom. There's no chin, if that's what it's looking for. Again, all these things depend on the training of the algorithm, which we'll discuss further down. So it will understand that this is not a face. OK, it keeps going, keeps going, keeps going.
And here you can see that this time it's definitely got an eye, an eyebrow, and a nose, but no mouth and no second eye or second eyebrow. So it keeps going. And then here, this is almost the face, but there's no mouth.
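The walkthrough above boils down to checking features one after another and discarding the box at the first one that is missing. A toy sketch of that idea follows. The feature names and the set-based representation are purely hypothetical stand-ins: the real algorithm works on pixel-level features that come out of training, which later tutorials cover.

```python
# Hypothetical checklist, in the order the walkthrough checks things.
REQUIRED_FEATURES = ["eyebrows", "eyes", "nose", "mouth"]


def looks_like_face(found_features, required=REQUIRED_FEATURES):
    """Return (is_face, checks_done), stopping at the first missing feature."""
    checks_done = 0
    for feature in required:
        checks_done += 1
        if feature not in found_features:
            # Early rejection: no need to look at the remaining features.
            return False, checks_done
    return True, checks_done


print(looks_like_face({"eyebrows"}))                          # box with only an eyebrow
print(looks_like_face({"eyebrows", "eyes"}))                  # eyebrow and eye, no nose
print(looks_like_face({"eyebrows", "eyes", "nose", "mouth"}))  # full frontal face
```

The point of stopping early is that most boxes in an image are not faces, so most of them can be thrown away after one or two checks.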
So again it discards this and keeps going, keeps going, keeps going. Finally, when it gets somewhere here, it can see that it's got both eyebrows, both eyes, a nose, and a mouth. So it highlights this as having a very high potential to be a face; for instance, it marks this box. Then it keeps going again, and in this box it can see another face: two eyebrows, two eyes, a nose, and a mouth. So it highlights that again and keeps going. After this box, as it goes over here, it can no longer see the full eyebrow or the full eyes, so this is no longer a face. It keeps going, moving away from the face. And then finally here: no eyes, no face.
So there you go, that's how it scans this whole image. The size of this box varies, because faces can be small or large; an image might have a face that is this small. So this will happen many times over: there'll be a bigger box, a smaller box, lots of different boxes.
Also, you may have noticed that we used very large steps. Here you can see the horizontal step is quite large, and the vertical step is quite large as well. In reality the steps are smaller and the boxes are more frequent. This is just an illustration for us to get the intuition. In reality this face might not have been detected just twice. If the steps of the box were much smaller, this face might have been detected many more times, something like that. Each box says that there's a high likelihood of a face being there, and when a lot of boxes overlap, that means it is most likely a face, and then the algorithm will detect the face there. That's how you see that green square in Facebook, or in any other kind of face detection application that you might have on your phone.
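The last two ideas, boxes of different sizes and overlapping boxes reinforcing each other, can be sketched as follows. The tutorial does not say how overlapping boxes are combined, so the greedy grouping, the overlap measure, and the parameter names `scale`, `min_neighbors`, and `min_iou` are all assumptions chosen for illustration.

```python
def box_sizes(min_size, max_size, scale=1.5):
    """Yield growing box sizes so both small and large faces are covered."""
    size = min_size
    while size <= max_size:
        yield int(size)
        size *= scale


def iou(a, b):
    """Overlap ratio (intersection over union) of two (x, y, size) boxes."""
    ax, ay, a_size = a
    bx, by, b_size = b
    ix = max(0, min(ax + a_size, bx + b_size) - max(ax, bx))
    iy = max(0, min(ay + a_size, by + b_size) - max(ay, by))
    inter = ix * iy
    return inter / (a_size * a_size + b_size * b_size - inter)


def group_candidates(boxes, min_neighbors=3, min_iou=0.5):
    """Greedily cluster overlapping boxes; keep clusters with enough members."""
    clusters = []
    for box in boxes:
        for cluster in clusters:
            if iou(box, cluster[0]) >= min_iou:
                cluster.append(box)
                break
        else:
            clusters.append([box])
    # Average each large-enough cluster into a single reported box.
    return [tuple(round(sum(b[i] for b in c) / len(c)) for i in range(3))
            for c in clusters if len(c) >= min_neighbors]


# Hypothetical candidates: four boxes around one face, one stray box.
candidates = [(40, 40, 50), (42, 40, 50), (40, 44, 50), (46, 44, 50),
              (200, 10, 50)]
print(list(box_sizes(20, 100)))
print(group_candidates(candidates))
```

The stray box has no overlapping neighbors, so it is dropped, while the four boxes around the same spot collapse into one detection.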
Again, they might be using other algorithms, but the ones that use the computer vision algorithm that Viola and Jones created work exactly like that. Then it will just position the same box on the color image too, and that's exactly what you will see. You see the grayscale, you see the color. And so there you go: we found JD, we found his face.
So that, in a nutshell, is how the Viola-Jones algorithm works. I know it's very basic and very straightforward, but it was important for us to understand how this box travels through the image and what exactly happens, because now we're going to start building on top of that. We're going to talk about the training. Well, before the training we'll talk about the features, and then we'll talk about some hacks for how this process can be expedited so that it happens much faster and more efficiently.
Now, if you would like to do some additional reading, the best place to start is the original paper by Paul Viola and Michael Jones. It's called "Rapid Object Detection using a Boosted Cascade of Simple Features". It's actually a very simple paper, even though the name might sound complex. The language is very friendly and it's very easy to read, so I highly recommend checking it out. Maybe not at this stage; maybe you want to go through the tutorials first and then read it after you've finished the section. But it will be very helpful. They write in very human language that is very easy to understand.
And here you can actually see an example of some of the features which are used in the Viola-Jones algorithm. We'll be talking about these in the next tutorial, and I look forward to seeing you next time. Until then, enjoy computer vision.