Naive Bayes Classifier: An example
A free video tutorial from Loony Corn
An ex-Google, Stanford and Flipkart team
We will see how the Naive Bayes classifier can be used with an example.
Learn more from the full course
From 0 to 1: Machine Learning, NLP & Python-Cut to the Chase
A down-to-earth, shy but confident take on machine learning techniques that you can put to work today
19:50:09 of on-demand video • Updated January 2018
In the last class we saw the foundations of Naive Bayes classifiers. Let's go ahead and apply these classifiers to a problem. Let's say you have to classify a basket of fruit into apples and bananas. Let's see how we could build a Naive Bayes classifier to perform this classification.

What could be the relevant features for this classification? We can capture three attributes for each fruit: the length, the breadth and the color. Now, you know that classification is a supervised learning technique, meaning we will have a large amount of training data. So we have information about a large number of fruits which are already classified, based on these three attributes: length, breadth and color. Now we're given a fruit for which we have to decide whether it is an apple or a banana.

This is clearly a classification problem. The fruits are the instances. Apple and banana are the labels, or categories. Since we have only two categories, this is a binary classification problem. Length, breadth and color are the features; each fruit has a feature vector of three components, that is, of length three. The information that we have about a large number of fruits which are already correctly classified is the training data, while the fruit that we have to classify is our problem instance.

So we know that the Naive Bayes classifier is a supervised learning technique, and it requires details about the features of its instances, the apples and bananas. So let's say we have all of these. The length and breadth of apples are normally distributed, with a mean of 5 inches and a standard deviation of 1 inch. Recall what normally distributed means: most of the values are concentrated near the peak of the bell-shaped curve. The mean locates the peak, and the standard deviation is the width, the spread, of the bell curve. So length and breadth are both normally distributed. Now coming to the color of the apple. It is green
30 percent of the time, red 50 percent of the time and yellow 20 percent of the time. These numbers are nothing but the before-the-fact probabilities of the color of apples. They are derived from the training data using standard mathematical techniques. We'll get into that in a bit.

Let's move on to bananas. The length of bananas is normally distributed; the mean is 5 inches while the standard deviation is 1.5 inches. Breadth is again normally distributed, with a mean of 2 inches and a standard deviation of 0.3 inches. Then comes the probability of the color: it is green 50 percent of the time and yellow 50 percent of the time. We would also require the proportion of apples and bananas in our training data. Let's say 55 percent of the fruits are apples and 45 percent are bananas.

Here is our problem instance: we have a green fruit which is 6 inches long and 3.5 inches broad. We have to classify it as either an apple or a banana. How do we go about this? We have Bayes' theorem, and from there we'll find the after-the-fact probability of this fruit being an apple or a banana, given a set of conditions. So we are going to find the after-the-fact probability of this fruit being an apple or a banana, given its specific features: the length, breadth and color. First we find the probability of the fruit being an apple given that the length is 6, the breadth is 3.5 and the color is green. Similarly, we'll also find the probability that the fruit is a banana given the same features: length 6, breadth 3.5 and color green.

Let's just quickly recap how, in the earlier example, we moved from the before-the-fact probability to finding the after-the-fact probability. In the formula, we reversed the after-the-fact probability we were trying to find, multiplied it by the before-the-fact probability of the thing we were trying to find, and divided it by the probability of the evidence occurring on the whole, without any conditions attached. In that example we had only one piece of evidence, while in this example we have three pieces of evidence.
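The recap above is Bayes' rule for a single piece of evidence. As a sketch, with Y standing for the category and E for the evidence (both symbols are notation introduced here, not taken from the lecture), it can be written as:

```latex
\[
P(Y \mid E) = \frac{P(E \mid Y)\, P(Y)}{P(E)}
\]
```

Here P(Y) is the before-the-fact probability, P(E | Y) is the likelihood of the evidence, P(E) is the probability of the evidence on the whole, and P(Y | E) is the after-the-fact probability.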
So we simply keep multiplying the likelihood of each piece of evidence. We are not going to compute a combined, joint probability of these pieces of evidence occurring together, because an important assumption that the Naive Bayes classifier makes is that all features in a feature vector are independent of each other.

So let's go ahead and find the probability that the fruit is an apple given that the length is 6, the breadth is 3.5 and the color is green. First we take the before-the-fact probability, the probability that the fruit is an apple, multiplied by the probability that the length is 6 given the fruit is an apple. That is the likelihood of our first piece of evidence, the length being 6. We multiply by the likelihood of our second piece of evidence, the breadth: the probability that the breadth is 3.5 given the fruit is an apple. And the same for color: the probability that the color is green given the fruit is an apple. All of this is divided by the probability that the length is 6, the breadth is 3.5 and the color is green. This is nothing but the probability that these pieces of evidence occur together in any given fruit.

Similarly, we use the same sort of formula for the probability that the fruit is a banana given that the length is 6, the breadth is 3.5 and the color is green: starting with the before-the-fact probability of the fruit being a banana, multiplied by the probability that the length is 6 given the fruit is a banana, multiplied by the probability that the breadth is 3.5 given the fruit is a banana, and the same for color, the whole thing again divided by the probability that the length is 6, the breadth is 3.5 and the color is green for any given fruit.

Whichever of these two probabilities has the higher value, we classify the fruit in that category. Now notice that the denominators are the same in both cases. So we can cancel the denominators out: because we are only comparing which is higher, we simply need to compute the numerators and choose the larger number.

So let's compute the figures. We know that our training data set has 55 percent apples and 45 percent bananas.
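Written out as a sketch, the two quantities being compared look like this. The letters L, B and C are shorthand introduced here for length, breadth and color; they are not notation from the lecture.

```latex
\[
\begin{aligned}
P(\text{apple} \mid L{=}6,\ B{=}3.5,\ C{=}\text{green})
  &= \frac{P(\text{apple})\, P(L{=}6 \mid \text{apple})\, P(B{=}3.5 \mid \text{apple})\, P(C{=}\text{green} \mid \text{apple})}
          {P(L{=}6,\ B{=}3.5,\ C{=}\text{green})} \\[6pt]
P(\text{banana} \mid L{=}6,\ B{=}3.5,\ C{=}\text{green})
  &= \frac{P(\text{banana})\, P(L{=}6 \mid \text{banana})\, P(B{=}3.5 \mid \text{banana})\, P(C{=}\text{green} \mid \text{banana})}
          {P(L{=}6,\ B{=}3.5,\ C{=}\text{green})}
\end{aligned}
\]
```

Both fractions share the same denominator, which is why only the numerators need to be computed to decide which category wins.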
So the probability that the fruit is an apple is 55 percent, and that the fruit is a banana is 45 percent. For the color, we already know the probabilities: the probability that an apple is green is 30 percent and the probability that a banana is green is 50 percent. Coming to the length and breadth, we will be using the standard probability table, because when the mean and standard deviation are given for a normal distribution, you can easily find the probability of any given variable value from this table. So I looked up the values we want, and I'm just filling in the figures now.

After multiplying all these figures, for the first probability we are trying to find we get 0.005, and for the second one we get a much smaller number, with six zeros after the decimal point. So the after-the-fact probability is higher in the first case, that is, that the fruit is an apple given that the length is 6, the breadth is 3.5 and the color is green. So we have our answer that the fruit is an apple, and we assign that label to our fruit.

Now there is a very important assumption that we are making here. As we mentioned earlier, we did not make use of any joint probability. The probabilities that we calculated, whether before the fact or after the fact, were all independent; that is, all the features were measured independently. For instance, had we used a joint probability, say by looking at the length-to-breadth ratio of 6 to 3.5, we might have concluded that this fruit was a banana instead of an apple. So this is a very important assumption. In fact, the Naive Bayes classifier is called naive precisely because it assumes the features have independent probability distributions.

One more thing you might be wondering is how we arrived at the before-the-fact probabilities in the first place. How do we estimate the means and standard deviations of the lengths and breadths in our training data? They are usually calculated using a famous mathematical technique, maximum likelihood estimation.
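The calculation above can be sketched in Python. This is a minimal illustration, not the course's own code. It assumes the normal density is used as the likelihood for length and breadth, whereas the lecture reads values from a standard table, so the trailing digits may differ. All function names are hypothetical, the yellow share for apples is inferred so that the three apple colors sum to 1, and the sample lists passed to the estimation helpers at the end are made up.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x, used here as the likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Per-class parameters as stated in the lecture (lengths in inches).
PARAMS = {
    "apple": {
        "prior": 0.55,
        "length": (5.0, 1.0),    # (mean, standard deviation)
        "breadth": (5.0, 1.0),
        # Assumption: yellow is inferred so the three colors sum to 1.
        "color": {"green": 0.30, "red": 0.50, "yellow": 0.20},
    },
    "banana": {
        "prior": 0.45,
        "length": (5.0, 1.5),
        "breadth": (2.0, 0.3),
        "color": {"green": 0.50, "yellow": 0.50},
    },
}


def numerator(label, length, breadth, color, params=PARAMS):
    """Prior times the three per-feature likelihoods (features treated as independent)."""
    p = params[label]
    return (
        p["prior"]
        * gaussian_pdf(length, *p["length"])
        * gaussian_pdf(breadth, *p["breadth"])
        * p["color"].get(color, 0.0)
    )


def classify(length, breadth, color, params=PARAMS):
    """Return the label with the larger numerator, plus all the scores."""
    scores = {lbl: numerator(lbl, length, breadth, color, params) for lbl in params}
    return max(scores, key=scores.get), scores


def fit_gaussian(values):
    """Maximum likelihood estimates of mean and standard deviation (divides by n)."""
    return fmean(values), pstdev(values)


def fit_proportions(values):
    """Maximum likelihood estimates of category probabilities: relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


# The problem instance from the lecture: a green fruit, 6 long and 3.5 broad.
label, scores = classify(6.0, 3.5, "green")
print(label, scores)

# Estimating parameters from made-up training measurements.
print(fit_gaussian([4.0, 5.0, 6.0]))
print(fit_proportions(["apple", "apple", "banana"]))
```

With the lecture's parameters, the apple numerator comes out near 0.005 and the banana numerator is on the order of 10^-7, so the sketch also picks apple. The two `fit_` helpers show what maximum likelihood estimation amounts to for this model: sample means, standard deviations that divide by n, and relative frequencies.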