Naive Bayes Explained

A free video tutorial from Kirill Eremenko
Data Scientist

Lecture description

Understand the naïve Bayes classifier on an intuitive level, and learn that the naïve Bayes classifier is a probabilistic type of classifier because we first calculate the probabilities and based on probabilities we decide which class to put a new data point in.

Learn more from the full course

Machine Learning A-Z™: Hands-On Python & R In Data Science

Learn to create Machine Learning Algorithms in Python and R from two Data Science experts. Code templates included.

43:48:25 of on-demand video • Updated September 2021

  • Master Machine Learning on Python & R
  • Have a great intuition of many Machine Learning models
  • Make accurate predictions
  • Make powerful analysis
  • Make robust Machine Learning models
  • Create strong added value to your business
  • Use Machine Learning for personal purpose
  • Handle specific topics like Reinforcement Learning, NLP and Deep Learning
  • Handle advanced techniques like Dimensionality Reduction
  • Know which Machine Learning model to choose for each type of problem
  • Build an army of powerful Machine Learning models and know how to combine them to solve any problem
Hello and welcome back to the course on machine learning. This is Kirill Eremenko, and in today's tutorial we're talking about the naïve Bayes classifier. This is a very interesting machine learning algorithm, and today we're going to get to know it on a very intuitive level. In line with the SuperDataScience mission, which is making the complex simple, we're going to break down this complex topic into simple steps and bite-size pieces of information. I've got some very exciting slides prepared ahead, so let's dive straight into it.
All right, so here we've got Bayes' theorem, and this is something we talked about in the previous tutorial, so by now you should be quite comfortable with the concept. How are we going to apply it to create a machine learning algorithm? Well, let's have a look. Here we've got a data set. It has two features, x1 and x2, and there are two categories: category 1, which is red, and category 2, which is green. But instead of working with these abstract terms, we're going to convert them into something that we can understand a bit better, something that's a bit easier to operate with or to talk about. So we're going to call the x2 variable salary, and the x1 variable is going to be age. Basically, we are representing observations, or people who are part of a data set, in terms of their age and salary. As you can see, we have 30 people here on this chart. As for the categories, red we're going to replace with "walks", meaning that person walks to work, and green will be "drives", meaning that person drives to work.
And so now we get to the machine learning challenge that we're going to be solving. What happens if we add a new observation, a new data point, into the set? How do we classify this new data point? As you can tell, this is a supervised machine learning algorithm, because we're classifying something based on previously known classes.
And so the question is: is this person going to be classified as a person who walks to work, or as a person who drives to work? The naïve Bayes algorithm is going to help us solve this challenge.
All right, so how are we going to approach this? We need a plan of attack. It is going to be quite a complex approach, but at the same time we're going to break it down into steps, and they'll all make sense and be very easy to understand. So, our plan of attack: we're going to take Bayes' theorem and apply it twice. The first time, we're going to apply it to find out the probability that this person walks given their features. X over here represents the features of that data point.
So let's go back to the visualization. Here you can see our new data point. That person has a certain age, let's say 25 years old, and they have a salary, let's say $30,000 per year. Those are the features of this observation. Right now we're only working with two variables, age and salary, just for simplicity's sake so we can visualize things, but in reality there could be many, many more features: what industry they work in, how many years of education they have, how long they've had a driver's license, or how far away they live from work. So there could be lots of variables. But right now we're going to be dealing with two, age and salary, and regardless of how many variables you have, we're going to call them features.
So, given the features X, that is, the age of 25 and the salary of $30,000 (and we'll talk in more detail about exactly what we mean by features in just a moment), this part represents the person that we're trying to classify. What is the likelihood for a person with those features?
So we know that we are taking somebody with the features that we have in our new data point: what is the likelihood of them walking? And then you've got the right-hand side. We're going to talk through each one of these terms as we calculate them, but for now let's just give them their names, going from right to left. This one over here is called the prior probability, and we're going to calculate that first because it's the easiest to calculate. The next one is the marginal likelihood, and we're going to calculate that second. The third one is the likelihood. Those are just the names that they have. We're going to calculate that third. And finally, what we're looking for is called the posterior probability, and we're going to calculate that fourth.
All right, so that's our plan of attack for step one. This is all still step one: calculate the probability that somebody walks, given the features X that we see in our new data point. Next we're going to have step two, where we calculate the probability that somebody drives, given those features X that we see in our new data point. Again, here we'll have the prior probability, which we'll calculate first, then the marginal likelihood, then the likelihood, and then we'll get the posterior probability. And finally, we're going to compare the probability that somebody walks given features X versus the probability that somebody drives given features X, and from there we'll decide which class to put that new data point in.
So as you can see, the naïve Bayes classifier is a probabilistic type of classifier, because we're first calculating the probabilities and then, based on those probabilities, assigning a class.
All right, so are you ready to perform these steps? It's going to be lots of fun. We're going to take it nice and easy, nice and slowly, so that we understand everything, and after this you're going to be very comfortable with the naïve Bayes classifier.
Step one. All right, so here we have our visualization.
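For reference, since the slide itself is not reproduced in this transcript, the plan of attack above appears to rest on Bayes' theorem written in the following form. The notation (Walks, X) follows the lecture's wording; the exact layout of the slide is an assumption.

```latex
\[
\underbrace{P(\text{Walks} \mid X)}_{\text{posterior probability}}
  = \frac{\overbrace{P(X \mid \text{Walks})}^{\text{likelihood}}
          \;\cdot\;
          \overbrace{P(\text{Walks})}^{\text{prior probability}}}
         {\underbrace{P(X)}_{\text{marginal likelihood}}}
\]
```

Step two uses the same expression with Drives in place of Walks; the marginal likelihood P(X) is the same in both steps.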
Let's move it to the left a little bit so we can make some space. Now we're going to calculate the first probability in our Bayes' theorem: the probability that somebody walks, just the overall probability. What does that mean? That is the probability that somebody walks without knowing anything about them. So we're just saying we're going to add a new observation to our data set, but we don't know their age and we don't know their salary; we're just going to put it somewhere into our data set. What is the probability that this person we're adding to our data set walks to work? Well, it's a very easy answer, because from here we don't have much choice. The only thing we can do is count the number of red observations, the number of people that actually walk, and divide by the overall number. So the probability that a person walks to work, without any other knowledge, is the number of walkers, which is the red dots, divided by the total number of observations, the red dots and the green dots. The gray dot isn't participating in these calculations. So here we have: the probability that somebody walks is 10 red dots divided by 30 dots overall.
All right, so that was easy. We've calculated the prior probability. Next we're calculating the marginal likelihood, and this is where things get interesting. So how do we calculate the marginal likelihood? Let's have a look. Here's our data set again. The first thing we're going to do is select a radius and draw a circle around our observation, like that. Now, this radius you need to select on your own; you need to decide it for your algorithm. It's going to be like an input parameter of the algorithm. You could select less, like that, or more; it's up to you. Now, what does this radius do? Well, first of all, just to make things easier, we're going to remove our dot for now so that it's not confusing us.
And then we're going to look at all the points that are inside this circle. What we're saying here is that all of the points inside the circle we're going to deem to be similar in terms of features to the point that we had. The point that we had, remember, had an age of, for example, 25 and a salary of $30,000 per year. So now we draw a radius around it, and let's say anybody between the ages of 20 and 30 and with a salary of $25,000 to $35,000 (again, it's not a square, it's a circle), anybody who falls somewhere in that vicinity, is going to be deemed similar to the new data point that we're adding to our data set. So as you can imagine, this radius is actually going to have a big say in the way your algorithm works.
Well, let's say we have this radius and this is how it all played out: we have three red dots and one green dot in there. All right, so now what do we do? How do we calculate the probability of X, and what is the probability of X? Well, the probability of X is the probability of a new point that we add to our data set being similar in features to the point that we actually are adding. So basically it's the probability that any random point we add falls into this circle. P(X) is calculated as the number of similar observations, so the number of observations that we can already see in the circle, 1, 2, 3, 4, divided by the total number of observations, which is 30. So P(X) is 4 divided by 30. Once again, just to reiterate: P(X) tells us the likelihood of any new random point that we add to this data set falling inside the circle. And it is 4 over 30 because, based on prior knowledge, we only have four points inside the circle out of 30.
All right, so that wasn't hard at all either. We've calculated the marginal likelihood.
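The two quantities computed so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinates, the position of the new point, and the radius below are hypothetical (the lecture gives no actual data), chosen so that the counts match the slide (10 walkers, 20 drivers, and 3 red plus 1 green dot inside the circle). The features are assumed to be already on a comparable scale so that a circle makes sense.

```python
from fractions import Fraction
from math import hypot

# Hypothetical, already-scaled (age, salary) coordinates; NOT the lecture's data.
NEW_POINT = (0.0, 0.0)   # the gray point we want to classify
RADIUS = 1.0             # input parameter chosen by the user

# 10 walkers (red): 3 inside the circle, 7 far away.
walkers = [(0.5, 0.0), (0.0, 0.5), (-0.5, 0.0)] + [(3.0 + i, 3.0) for i in range(7)]
# 20 drivers (green): 1 inside the circle, 19 far away.
drivers = [(0.0, -0.5)] + [(-3.0 - i, -3.0) for i in range(19)]


def is_similar(point, center=NEW_POINT, radius=RADIUS):
    """A point counts as 'similar' if it lies inside the circle around the new point."""
    return hypot(point[0] - center[0], point[1] - center[1]) <= radius


total = len(walkers) + len(drivers)

# Prior probability: share of walkers among all observations.
prior_walks = Fraction(len(walkers), total)

# Marginal likelihood: share of all observations that fall inside the circle.
n_similar = sum(is_similar(p) for p in walkers + drivers)
marginal = Fraction(n_similar, total)

print("P(Walks) =", prior_walks)   # 10/30, printed in lowest terms
print("P(X)     =", marginal)      # 4/30, printed in lowest terms
```

Changing `RADIUS` changes which points count as similar, which is why the lecture describes the radius as having a big say in how the algorithm behaves.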
So far we've got this one and we've got this one. Next we're moving on to the likelihood, and this is probably the most complex one: what is the likelihood that somebody who walks exhibits features X? Well, actually, after we've spoken about the marginal likelihood, calculating the likelihood won't be as complex. So let's have a look. There is our chart. What we're going to do is draw the same circle again, and once again we're going to remove the gray point for now and color the circle. Anything that falls inside the circle is deemed to be similar to the point that we're adding.
So the question is: what is the probability that a randomly selected data point from our data set will be similar to the data point that we're adding? Basically, what is the likelihood that a randomly selected data point will be from this circle, given (this vertical pipe means "given") that that person walks to work? The other way to think about this is that we're only working with people who walk to work. So we're only working with the red dots, which represent people who walk to work. Let's forget about the green dots; they're faint now, and we're not even talking about them at all. So the question is: given that we're only working with the red dots, what is the likelihood that a randomly selected data point from the red dots is somebody who exhibits features similar to the point that we are adding to our data set? Basically, what is the likelihood that a randomly selected red dot falls into this gray area, into the circle? That's the question we're asking. And that's also very simple now that we know how all this works.
It's basically the number of similar observations among those who walk, so the number of red dots that actually fall inside this gray circle, that's three, divided by the total number of walkers, the total number of people who walk to work, and that is 3 over 10. There we go. So that's our P(X | Walks), the likelihood of somebody exhibiting features similar to the data point that we're about to add, given that we're only selecting among the red dots. That's 3 over 10, and that was our likelihood.
So now that likelihood is done. If we plug all of that in, we'll get our posterior probability: 3 over 10, times 10 over 30, divided by 4 over 30. If we calculate that, it gives us 0.75. Seventy-five percent is the probability that somebody we put into the place where X is should be classified as a person who walks to work. That was step one. It was a pretty intense ride, and pretty exciting to calculate this value.
Now, step one is done; the next step is step two: do the same thing for the likelihood that somebody with features X should be classified as a person who drives to work. And here I've got a challenge for you: I'm going to challenge you to pause this video, or rewind back, to have the image in front of you and do these calculations yourself, to actually go through the same steps. If you'd like to compare with my calculations, I'm going to put in another tutorial after this one in the course, so you can just go to the next tutorial and compare. Otherwise, I'm going to show you the result now. So the result, going from the right: the prior probability is 20 over 30, the marginal likelihood remains unchanged at 4 over 30, and the likelihood changes to 1 over 20.
So the probability of somebody who exhibits features X being a person who drives to work is 25 percent. That was step two.
Now we're going to do step three. We're going to compare the probability of somebody with features X being a person who walks to work versus the probability of somebody with features X being a person who drives to work. So it's 75 percent versus 25 percent. The first is greater than the second, and therefore it is more likely that the person with features X is a person who walks to work than a person who drives to work. There's still a 25 percent chance that it is a person who drives to work, but the chance that it is a person who walks is greater, 75 percent, and therefore we're going to classify this point as a person who walks to work.
There we go. That is how the naïve Bayes algorithm in machine learning works. I hope you found this tutorial useful. I was pretty excited about and pretty proud of these slides, and hopefully this was a step-by-step, simple explanation of a complex concept. I look forward to seeing you next time. Until then, enjoy your machine learning.
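The three steps of the lecture can be reproduced from the counts alone. The sketch below is a minimal illustration, not the course's code template: the helper name `posterior` and its parameter names are made up for this example, and only the counts (30 people, 10 walkers, 20 drivers, 3 red and 1 green dot inside the circle) come from the lecture.

```python
from fractions import Fraction


def posterior(n_class, n_total, n_similar_in_class, n_similar_total):
    """Posterior = likelihood * prior / marginal, all estimated by counting dots.

    Hypothetical helper for illustration; the names are not from the course.
    """
    prior = Fraction(n_class, n_total)                  # e.g. P(Walks)
    marginal = Fraction(n_similar_total, n_total)       # P(X)
    likelihood = Fraction(n_similar_in_class, n_class)  # e.g. P(X | Walks)
    return likelihood * prior / marginal


# Step 1: P(Walks | X) = (3/10 * 10/30) / (4/30)
p_walks = posterior(n_class=10, n_total=30, n_similar_in_class=3, n_similar_total=4)

# Step 2: P(Drives | X) = (1/20 * 20/30) / (4/30)
p_drives = posterior(n_class=20, n_total=30, n_similar_in_class=1, n_similar_total=4)

# Step 3: compare the two posteriors and assign the more probable class.
label = "Walks" if p_walks > p_drives else "Drives"

print(float(p_walks), float(p_drives), label)  # 0.75 0.25 Walks
```

Because both posteriors share the same denominator P(X), comparing the numerators (likelihood times prior) would give the same classification; the division only matters if the probabilities themselves are needed.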