Students will be able to create websites, build applications, and create artificial intelligence learning programs that can recognize handwriting and learn while analyzing data.
This course will help you get a job as a full-stack programmer or artificial intelligence data scientist.
Build over 10 AI data analysis tools
Welcome back. We're now going to get started with classifying data in Python using k-nearest neighbors, also known as KNN, and in this case we're going to be using the data found in the Iris dataset. So let's load Iris from sklearn: we import Iris and load it. We also have a test and a training data set. We've already loaded our test set right here, and here's our training set.

Just make a note that the more often you look at your test data, the less reliable your results become. We're only doing it this way because we're using the Iris data set and this is a training exercise. Normally we're not supposed to be repeatedly looking at our test set, but we are doing that in this exercise, so just be aware of that. I hope it's clear that every time you look at the test set, you're effectively tuning the algorithm to it, and the algorithm behaves as though it has seen the test data before. In a normal situation you would only look at the training data while building the model, and then you would apply that model to the test data. In this circumstance we're making an exception in order to show an example.

Let's run this start-up code here. We're getting a perfect score, and that's because we're testing on our training data. Training KNN amounts to keeping the training data in memory. When a new data point arrives without a label, it needs to be classified, and the point of KNN is to classify that new, currently unlabeled data point.

So let's perform a cross-validation check on our k. Let's import the framework: here's our KNeighborsClassifier, this is the cross_validate method, and we're going to put our parameters in. We have the predictor, the features array, our target array, and the number of folds; we're using 10. We don't want the scores from our training folds, and we specify which scores we want returned: we're asking for accuracy. Once that's loaded, that's how we get our data back.
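Here is a minimal sketch of the steps described above, assuming scikit-learn's `load_iris`, `train_test_split`, `KNeighborsClassifier`, and `cross_validate`. The split size, `random_state`, and the choice of k=5 for cross-validation are illustrative assumptions, not the lecture's exact settings. The helper `predict_one` is hypothetical; it only shows that KNN "training" is memorization and that each prediction compares the new point against every stored point.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set (test_size and random_state are illustrative choices).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# "Training" KNN only stores the training data; fit() does no real learning.
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Scoring on the data we trained on: with k=1 each point is its own nearest
# neighbor, so the score is (near) perfect and says little about new data.
train_acc = knn.score(X_train, y_train)
print(f"accuracy on training data: {train_acc:.3f}")


def predict_one(x_new, X_mem, y_mem, k=5):
    """Hypothetical helper: classify one unlabeled point by brute force."""
    # Compare the new point with every stored point (Euclidean distance).
    dists = np.linalg.norm(X_mem - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k nearest stored points.
    return Counter(y_mem[nearest].tolist()).most_common(1)[0][0]


manual_preds = np.array([predict_one(x, X_train, y_train) for x in X_test])
print(f"from-scratch k=5 test accuracy: {np.mean(manual_preds == y_test):.3f}")

# 10-fold cross-validation on the training set only, returning accuracy
# for each held-out fold and skipping the training-fold scores.
cv_results = cross_validate(
    KNeighborsClassifier(n_neighbors=5),
    X_train,
    y_train,
    cv=10,
    scoring="accuracy",
    return_train_score=False,
)
print(cv_results["test_score"])  # one accuracy value per fold
print(f"mean CV accuracy: {cv_results['test_score'].mean():.3f}")
```

Cross-validating on `X_train` rather than the full data keeps the test set untouched, in line with the warning about repeatedly looking at test data.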
When you look at this data, you can see that every point is being compared to every other point. That's important to know, because KNN winds up using a lot of memory and a lot of computing power when making predictions. And the more features you have, the more KNN is going to struggle; this is known as the curse of dimensionality. So KNN is not exactly the most efficient way to arrive at a prediction, but it can be very, very thorough.

Looking at our cross-validation scores, you can see that we did 10-fold cross-validation, so we have a 10 by 10 set of scores. Previously, when I ran this, I got the highest score when k equaled 8, but now it's giving me a different result, which is fine, because the scores can be a little different every time you run the test. I could update this, but it's going to be around here. So be aware that your cross-validation scores may be slightly different too.

Let's go ahead and run this. Our classifier does well at predicting the setosa species and worst for the virginica species. But when you look at this data, you can see that we actually didn't miss any predictions, which means every prediction was correct. That's why everything is a one. Don't expect that to happen in real life.

Now we're going to plot this out. Let's talk about these plots, because they help us visualize KNN. Each species is marked by a shape, black is the training set, and blue marks correctly labeled points. We can see that every species predicted by the algorithm was the correct species. We can see it for sepal length and sepal width, which is this top plot here, and for petal length and petal width, which is this lower plot. So that's it for our k-nearest neighbors classifier. I believe we'll talk about decision trees in the next lecture. See you there. Thank you.
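The k sweep and per-species evaluation can be sketched as follows. This assumes k runs from 1 to 10 with 10 folds each, which would produce a 10 by 10 grid of scores; the lecture doesn't show its exact range, so treat that as an assumption. It uses scikit-learn's `cross_val_score` and `classification_report`; the split and `random_state` are illustrative, so the best k you get may well differ from 8.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

k_values = list(range(1, 11))  # assumed range of k to try

# One row per k, one column per fold: a 10 x 10 grid of accuracies.
scores = np.array([
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train,
                    cv=10, scoring="accuracy")
    for k in k_values
])
mean_scores = scores.mean(axis=1)
best_k = k_values[int(np.argmax(mean_scores))]
print(f"best k = {best_k}, mean CV accuracy = {mean_scores.max():.3f}")

# Evaluate the chosen k once on the held-out test set, broken down by species.
final = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
y_pred = final.predict(X_test)
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

Choosing k from the training-set cross-validation scores, and only then touching the test set once, avoids the repeated peeking at the test data that the lecture warns about.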