Simple Linear Regression in R - Step 2

Kirill Eremenko
A free video tutorial from Kirill Eremenko
Data Scientist

Lecture description

Fitting a Simple Linear Regression (SLR) model to the training set using the R function ‘lm’.

Machine Learning A-Z™: Hands-On Python & R In Data Science

43:48:15 of on-demand video • Updated January 2021

Hello and welcome to this R tutorial. In the previous tutorial, we prepared our data correctly, so now we are ready to fit our simple linear regression to our dataset without any issues. As usual, we are going to use the simplest way, which is the lm function. We're going to create a new variable, regressor, which is going to be the simple linear regressor itself. Then equals, and that's where we use the lm function. So let's type lm here and press F1 to see the info on this function, especially its arguments.

The first argument we have to input is formula. What is it going to be? Well, it's the dependent variable expressed as a linear combination of the independent variables. Here it's very simple, since we only have one dependent variable and one independent variable: we just need to type formula = Salary, then a tilde (~), and then the independent variable, which is YearsExperience. What does this notation mean? It means that the salary is modeled as a linear function of the years of experience. That's it for the first argument, and that's actually the simple linear regression formula.

Then we need to add a second argument, which is data. We have to tell R on which data we want to train our simple linear regression model, and of course this data is the training set, because the training set is the set on which you build your model. There are some other arguments, but they are optional and we don't really need them here, so we will just use these two: formula and data. The regressor will be ready once we select this line and execute it.
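
A minimal sketch of this step, assuming the training set is a data frame named training_set with the columns Salary and YearsExperience used in the lecture. The ten rows below are made-up numbers for illustration, not the course's dataset. The sketch fits the model with lm and cross-checks the two coefficients against the closed-form least-squares estimates, so you can see what "the simple linear regression formula" computes.

```r
# Hypothetical stand-in for the tutorial's training set: the column names
# follow the lecture, the numbers are invented for illustration.
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.0, 5.1, 6.0, 7.1, 8.2, 9.0, 10.3),
  Salary = c(39000, 43500, 56000, 57000, 66000, 74000, 81000, 92000, 96000, 110000)
)

# Fit Salary as a linear function of YearsExperience, using the training data only
regressor <- lm(formula = Salary ~ YearsExperience, data = training_set)

# Intercept (b0) and slope (b1) of Salary = b0 + b1 * YearsExperience
print(coef(regressor))

# Cross-check with the closed-form ordinary least squares estimates:
# b1 = cov(x, y) / var(x), b0 = mean(y) - b1 * mean(x)
x <- training_set$YearsExperience
y <- training_set$Salary
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
print(c(b0, b1))
```

Both print statements should show the same intercept and slope; lm simply does the least-squares fit for you and keeps everything summary() needs later.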
So let's do this right now: press Ctrl + Enter (Command + Enter on a Mac) to execute. Here we go, the regressor is ready; as you can see, it just appeared in the environment. If you want some info about this regressor, the best way is to go to the console and type summary(regressor), since our regressor is named regressor, then press Enter. You get some very good information about your simple linear model.

Let's scroll up. First, it tells you what the formula is, salary as a function of the number of years of experience, and that the model is built on the training set. Then you have some info about the residuals, which we won't be talking about now. But the really important section is Coefficients, because not only does it give you the values of the coefficients in the simple linear regression equation, it also tells you their statistical significance. Here we observe three stars, which means the YearsExperience independent variable is highly statistically significant. You can have no star, one star, two stars or three stars: no star means no statistical significance, and three stars means high statistical significance. (In R's significance codes, *** means p < 0.001, ** means p < 0.01, * means p < 0.05, and . means p < 0.1.)

So that's a first hint of what is going to happen: just by looking at this, we can already expect a strong linear relationship between the salary and the number of years of experience. The other info here is the p-value, which is another indicator of statistical significance: the lower the p-value, the more significant your independent variable, that is, the stronger the evidence that it has an effect on the dependent variable.
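
To see where the stars and p-values in the Coefficients section come from, here is a sketch on made-up data (the data frame and its numbers are hypothetical; only the column names follow the lecture). It extracts the coefficient table from summary(regressor), recomputes each p-value as a two-sided t-test with n - 2 residual degrees of freedom, and maps the p-values to stars using the cutoffs from R's "Signif. codes" legend.

```r
# Hypothetical training data; column names follow the lecture
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.0, 5.1, 6.0, 7.1, 8.2, 9.0, 10.3),
  Salary = c(39000, 43500, 56000, 57000, 66000, 74000, 81000, 92000, 96000, 110000)
)
regressor <- lm(formula = Salary ~ YearsExperience, data = training_set)
model_summary <- summary(regressor)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(model_summary)
print(coef_table)

# Two-sided p-value: 2 * P(T > |t|), T ~ Student t with the residual df (n - 2 here)
t_values <- coef_table[, "t value"]
p_manual <- 2 * pt(abs(t_values), df = df.residual(regressor), lower.tail = FALSE)
print(p_manual)

# Translate p-values into significance stars with R's cutoffs:
# [0, 0.001) -> "***", [0.001, 0.01) -> "**", [0.01, 0.05) -> "*", [0.05, 0.1) -> ".", else " "
stars <- as.character(cut(coef_table[, "Pr(>|t|)"],
                          breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                          labels = c("***", "**", "*", ".", " "),
                          right = FALSE, include.lowest = TRUE))
print(stars)
```

The recomputed p-values should match the Pr(>|t|) column exactly, which shows that the stars are only a visual shorthand for the p-value thresholds.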
Usually, a good threshold for the p-value is 5 percent: when the p-value is below 5 percent, the independent variable is statistically significant, and when it is above 5 percent, it is less significant. Here you can see the p-value is 1.52e-14, that is, 1.52 times ten to the power of minus 14, which is a very, very small p-value. So this independent variable, years of experience, is highly statistically significant, and there is very strong evidence that it has an effect on the dependent variable, salary. That's very important information, so get into the habit of checking it by typing summary(regressor). This matters especially when you want to try out several potential independent variables: you need to look at their statistical significance to choose them.

Then you have some information about your model as a whole, which we'll be talking about at the end of this part on regression, when we cover ways to evaluate your model. As you can see, you have the Multiple R-squared and the Adjusted R-squared. If you have several models with different sets of independent variables, the adjusted R-squared is the one you should use to choose the best model.

All right, that was just a parenthesis to give you this very important trick in R and to start learning how to evaluate your model. So we are done fitting the simple linear regression to our training set, and in the next tutorial we're going to predict the test set results, to finally see how our simple linear regression behaves on new observations. OK, that's it for this tutorial. I look forward to seeing you in the next one, and until then, enjoy machine learning!
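
A sketch of the global fit statistics mentioned here, again on hypothetical data with the lecture's column names. It computes R-squared as 1 - SS_res / SS_tot and adjusted R-squared as 1 - (1 - R²)(n - 1) / (n - p - 1), where p is the number of independent variables (1 in simple linear regression), and compares both with the values summary() reports. The last lines apply the 5 percent rule of thumb to the slope's p-value.

```r
# Hypothetical training data; column names follow the lecture
training_set <- data.frame(
  YearsExperience = c(1.1, 2.0, 3.2, 4.0, 5.1, 6.0, 7.1, 8.2, 9.0, 10.3),
  Salary = c(39000, 43500, 56000, 57000, 66000, 74000, 81000, 92000, 96000, 110000)
)
regressor <- lm(formula = Salary ~ YearsExperience, data = training_set)
model_summary <- summary(regressor)

y <- training_set$Salary
n <- nrow(training_set)  # number of observations
p <- 1                   # number of independent variables

ss_res <- sum(residuals(regressor)^2)  # residual sum of squares
ss_tot <- sum((y - mean(y))^2)         # total sum of squares around the mean
r2 <- 1 - ss_res / ss_tot
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(manual = r2, summary = model_summary$r.squared))
print(c(manual = adj_r2, summary = model_summary$adj.r.squared))

# 5 percent rule of thumb applied to the slope's p-value
p_slope <- coef(model_summary)["YearsExperience", "Pr(>|t|)"]
print(p_slope < 0.05)
```

The adjusted version penalizes each extra independent variable through the (n - 1) / (n - p - 1) factor, which is why it is the better yardstick when comparing models built on different sets of variables.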