Poly Regression: Data Visualization Tutorial

A free video tutorial from Dr. Ryan Ahmed, Ph.D., MBA
Professor & Best-selling Instructor, 340K+ students
48 courses
355,503 students
Learn more from the full course
Machine Learning Regression Masterclass in Python
Build 8+ Practical Projects and Master Machine Learning Regression Techniques Using Python, Scikit Learn and Keras
10:19:57 of on-demand video • Updated January 2023
Master Python programming and Scikit-learn as applied to machine learning regression
Understand the underlying theory behind simple and multiple linear regression techniques
Apply simple linear regression techniques to predict product sales volume and vehicle fuel economy
Apply multiple linear regression to predict stock prices and university acceptance rates
Cover the basics and underlying theory of polynomial regression
Apply polynomial regression to predict employee salaries and commodity prices
Understand the theory behind logistic regression
Apply logistic regression to predict the probability that a customer will purchase a product on Amazon using customer features
Understand the underlying theory and mathematics behind Artificial Neural Networks
Learn how to train network weights and biases and select the proper transfer functions
Train Artificial Neural Networks (ANNs) using backpropagation and gradient descent methods
Optimize ANN hyperparameters, such as the number of hidden layers and neurons, to enhance network performance
Apply ANNs to predict house prices given features such as area, number of rooms, etc.
Assess the performance of trained machine learning models using KPIs (Key Performance Indicators) such as Mean Absolute Error, Mean Squared Error, Root Mean Squared Error, R-Squared, Adjusted R-Squared, and the F-Test
Understand the underlying theory and intuition behind Lasso and Ridge regression techniques
Sample real-world, practical projects
Hello everyone, and welcome to this lecture. I'm super excited because we're getting closer to actually visualizing the data and training our model using polynomial regression. The first step in this lecture is to visualize the dataset. In the previous lecture we covered how to import our libraries and load the dataset into a Pandas DataFrame. So here we have our salary DataFrame, which consists of two columns: the number of years of experience and the salary. We used head and tail to look at the first few and last few samples. Now let's go ahead and visualize the data using Seaborn.

First, we call sns.jointplot and pass x equal to the years-of-experience column. Make sure the name exactly matches the column in the DataFrame, including the uppercase letters. Then we pass y, which will be our salary column, and we specify the data source, which is our salary DataFrame. Press Shift+Enter to run it, and there we go. It looks great: as the number of years of experience increases, the salary increases as well. You will also notice that a simple linear model won't be a good fit here, which means we need to step up our game and move to polynomial regression instead of simple linear regression.

The next step is sns.lmplot. We pass x equal to the years-of-experience column (again, watch the uppercase letters), y equal to the salary, and data equal to our salary DataFrame. If you recall from previous sections, lmplot plots a quick estimate of the best straight line that can fit the data. As you can see, that straight line is basically terrible; it doesn't fit the data well, which again makes sense, because we need polynomial regression rather than simple linear regression.

Now it's time for a quick challenge, and I'm actually asking you to do two tasks. First, use sns.jointplot again, but instead of plotting years of experience versus salary, plot it the other way around: salary versus years of experience. Second, use pairplot, as we have done before, to plot all the data at once. Please pause the video, and I will see you after the challenge.
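Before you tackle the challenge, here is a minimal sketch of the two plots we just created. It assumes the DataFrame is called salary_df, loaded from a file named salary.csv, and that the columns are named 'YearsExperience' and 'Salary'; these names are assumptions, so match them to whatever your CSV actually uses.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the salary dataset into a DataFrame (file name assumed; adjust to your copy).
salary_df = pd.read_csv('salary.csv')

# Inspect the first and last few samples, as in the previous lecture.
print(salary_df.head())
print(salary_df.tail())

# Scatter plot of salary against years of experience.
# Column names 'YearsExperience' and 'Salary' are assumed; they must match the DataFrame exactly.
sns.jointplot(x='YearsExperience', y='Salary', data=salary_df)

# lmplot overlays the best-fit straight line, which clearly underfits this data.
sns.lmplot(x='YearsExperience', y='Salary', data=salary_df)

plt.show()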
I hope you were able to figure out the challenge yourself. For the first task, copy the jointplot call from above, but instead of having years of experience on the x axis, put the salary on the x axis and the years of experience on the y axis (again, make sure the column names match, including the uppercase letters). Run it, and here we go: salary on one axis, years of experience on the other, and the relationship is clearly still non-linear.

The second challenge was to use pairplot to view all the data. We simply call sns.pairplot and pass in the entire salary DataFrame, and pairplot takes care of everything for us. Shift+Enter, and here you go. I personally prefer pairplot because you don't need to plot each pair of variables separately; it tries all the different combinations for you. Here you can see years of experience against salary, with the curve going up: as the years of experience increase, the salary increases. This is exactly the curve we plotted earlier. And here is salary against years of experience, which is the curve I asked you to plot in the challenge. Again, you don't need to do them separately; just pass the DataFrame to pairplot and you're good to go. Pairplot also shows the distribution of each column: the average is around ten years of experience, and the mean salary is somewhere around $100,000. If you scroll up, we have this information earlier in the notebook: the salary mean is around 111,000, which pretty much matches what we see here.

Let's keep going. In step four we create our training dataset. We set X, which is the input to the model, to the years-of-experience column of the salary DataFrame. Run it and take a look at X: it is the number of years of experience, 2,000 rows by one column. You can confirm that with X.shape, which returns 2000 by 1. Looks perfect. Next we set y to the salary column. Remember, the DataFrame has two columns: years of experience, which is our independent variable and therefore our input X, and salary, which is our dependent variable and therefore our output y. Run it and take a look at y. Looks perfect.
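Here is a sketch of the challenge solution and of step four, again assuming the salary_df DataFrame and the 'YearsExperience'/'Salary' column names from the earlier snippet:

```python
# Challenge solution: swap the axes (column names assumed as above).
sns.jointplot(x='Salary', y='YearsExperience', data=salary_df)

# Pair plot of every pairwise combination plus each column's distribution.
sns.pairplot(salary_df)

# Step 4: build the model input and output.
# Double brackets keep X as a 2-D DataFrame (2000 rows x 1 column),
# which is the shape scikit-learn expects for features.
X = salary_df[['YearsExperience']]
y = salary_df['Salary']

print(X.shape)  # (2000, 1)
print(y.shape)  # (2000,)
```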
The last step is very important. Note that in this polynomial regression example we are not going to divide the data into training and testing sets; we're just trying to get the best-fit model using the entire dataset as training data. This is a very simple example. Later, when we move to more advanced models, we will perform that division into training and testing, but here we use all the data for training. So I set X_train equal to X and y_train equal to y, run it, and that's pretty much it (see the short sketch after the recap below for how this compares to a proper train/test split).

And that's all we have for this section, so let's scroll up and recap what we have done. In this section we visualized the data with Seaborn using jointplot and lmplot, and we realized that a straight line is a poor fit here, which is why we need a polynomial regression model. In a mini challenge we also used jointplot to plot salary versus years of experience, and we used pairplot to visualize all the data in one place. We then created our training dataset, so with X_train and y_train we are ready to train our model. In the next section I'm going to walk you through the first solution, which assumes a linear model, and in the following section I'll show you polynomial regression, which is the best model we'll be able to obtain. That's it. I hope you enjoyed this lecture, and I'll see you in the next lecture.
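As a rough sketch of that last step: here we simply reuse the whole dataset for training, whereas in the later, more advanced examples the lecture mentions, the division would typically be done with scikit-learn's train_test_split. The 80/20 split and random_state below are illustrative assumptions, not values prescribed in this lecture.

```python
# For this simple example, train on the entire dataset.
X_train = X
y_train = y

# In later, more advanced examples, a train/test split would look like this
# (80/20 split and random_state chosen here only for illustration).
from sklearn.model_selection import train_test_split

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
```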