What is Multiple Regression?

Scholarsight Learning
A free video tutorial from Scholarsight Learning
Courses in High Impact Research & Technology

SPSS Masterclass: Learn SPSS From Scratch to Advanced

A complete step by step course to master IBM SPSS Statistics for doing advanced Research, Statistics & Data Analysis

32:08:01 of on-demand video • Updated September 2020

In this lecture, we are going to learn about multiple regression. So, what is multiple regression? Multiple regression, as its name suggests, is a method of regression analysis in which we examine the effect of multiple independent variables on one dependent variable.

So, look at this data set, which I have taken from the SPSS sample data sets that you can locate on your C drive. This is the employee data set, and it holds data on employees arranged by ID, gender, birth date, education, job category, current salary, beginning salary (the salary at the time of joining), job time, previous experience, and whether they belong to a minority community or not.

Now, suppose that in this data set I want to find out what exactly determines the current salary drawn by the employees. It could be their experience. It could also be their salary at the time of joining, because it's logical to assume that employees who were drawing a higher salary at the time of joining will draw a higher salary currently. Apart from this, we also have education. Education is again an important criterion for determining salary: we can presume that highly educated employees draw a higher salary than less educated employees. Similarly, we can guess that the salary drawn will also be affected by the position held by the employees. In our case, we have three categories of employees, clerical, custodial and managers, and we guess that managers would be drawing a higher salary than clerical or custodial employees.
But what if we want to test this assumption, that education, job category (the position of the employee) or the salary at the time of joining the organization are the causative or influencing factors in the current salary drawn by the employee? In that case, we have to run a multiple regression analysis. So, the idea of multiple regression analysis is very clear: when you want to predict one dependent variable, in our case current salary, from many independent variables, such as education, job category, beginning salary, job time and previous experience, you can perform a multiple regression analysis.

When you perform a multiple regression analysis, your variables must be logically selected. For example, do you believe that minority status affects the salary of a person? Well, it may or may not. So, it's interesting to test this. If you find any theoretical justification that the minority affiliation of a person may affect his or her salary, then you can include that variable as well in the multiple regression analysis. Or, if you want to include all the variables in your multiple regression analysis, you can, and SPSS is going to tell you whether each variable exercises a significant influence on the dependent variable or not.

So, the model of any multiple regression is very simple. You have to select a dependent variable. Generally, we denote the dependent variable by the symbol 'y'. Then you have many independent variables, and we can call them 'x1', 'x2', 'x3', up to 'xn'. By applying the multiple regression analysis, we are going to get a coefficient for each of them. So, suppose those coefficients are α1, α2, α3, up to αn; the terms of the model are then α1 x1 + α2 x2 + α3 x3 + ... + αn xn. Now, you are predicting 'y' on the basis of these x variables, from x1 to xn.
When you make this prediction, you are going to make some amount of error, because we cannot always find all the variables that will jointly predict 'y' completely. So, there is bound to be an error term, and we are going to find that out as well. Apart from that, we are going to have a constant in our regression equation. So, that is our typical theoretical regression model.

Now, I have used the symbol alpha (α), but most typically people use beta (β). So, we can rewrite the equation as y = β1 x1 + β2 x2 + β3 x3 + ... + βn xn + e + C, where e is the error term and C is the constant term. As you can guess, these betas are the regression coefficients that we are going to get after the regression analysis.

Now, what was our case? We wanted to predict the current salary of the employees. So, let's take the first variable as the beginning salary of the employees, which gives β1 × beginning salary. Then, we can take our second variable as education, so β2 × education category. Then, as the third variable, we may take experience, and for that we are going to get a third coefficient, so β3 × experience. Then you add your error term (e) and the constant (C). So, we write our multiple regression equation as current salary = β1 × beginning salary + β2 × education category + β3 × experience + e + C. We have built a simple model by taking three variables into account. If you want to take more variables into account, you can, and you can make a lengthy and complex regression model.

So, that makes our regression model clear. In multiple regression analysis, an independent variable can be either a nominal variable or a metric variable. So, the independent variables can be metric or non-metric, but the dependent variable, in multiple regression analysis, linear regression analysis or even hierarchical regression analysis, which we will do later, should always be metric.
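To make the three-variable salary model concrete, here is a minimal sketch in Python with NumPy rather than SPSS. It is only an illustration: the data are synthetic and made up, not the SPSS employee data set, and the "true" constant and betas are hypothetical values chosen so that we can see the fit recover them. Ordinary least squares is used to estimate the constant (C) and the three coefficients.

```python
import numpy as np

# Synthetic, made-up data (an assumption for illustration only;
# this is NOT the SPSS employee data set).
rng = np.random.default_rng(0)
n = 200
beginning_salary = rng.uniform(10_000, 40_000, size=n)
education_years = rng.integers(8, 22, size=n).astype(float)
experience_months = rng.uniform(0, 300, size=n)

# Hypothetical "true" constant, betas and error term used to generate y.
true_constant = 2_000.0
true_betas = np.array([1.8, 500.0, 10.0])
error = rng.normal(0.0, 1_000.0, size=n)

X = np.column_stack([beginning_salary, education_years, experience_months])
current_salary = true_constant + X @ true_betas + error

# A column of ones lets least squares estimate the constant C
# together with beta1, beta2 and beta3.
design = np.column_stack([np.ones(n), X])
estimates, *_ = np.linalg.lstsq(design, current_salary, rcond=None)
constant, betas = estimates[0], estimates[1:]

# Residuals are the estimated error term; R squared is the share of
# variance in current salary explained by the three predictors.
predicted = design @ estimates
residuals = current_salary - predicted
r_squared = 1.0 - residuals.var() / current_salary.var()

print("constant:", constant)
print("betas:", betas)
print("R squared:", r_squared)
```

The estimated betas should land close to the hypothetical values used to generate the data, and the residuals play the role of the error term (e) in the equation.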
So, the dependent variable should always be metric, while the independent variables can be either metric or non-metric.
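The lecture does not say how a non-metric independent variable enters the equation. One common approach, assumed here, is dummy coding: each category except a reference category gets its own 0/1 variable. The sketch below uses six made-up employees and hypothetical salaries, with "clerical" chosen arbitrarily as the reference category.

```python
import numpy as np

# Hypothetical job categories and salaries for six made-up employees.
job_category = np.array(
    ["clerical", "custodial", "manager", "clerical", "manager", "custodial"]
)
current_salary = np.array(
    [28_000.0, 31_000.0, 64_000.0, 30_000.0, 60_000.0, 30_000.0]
)

# "clerical" is the reference category (an arbitrary choice for this
# sketch), so only the other two categories get a 0/1 dummy variable.
levels = ["custodial", "manager"]
dummies = np.column_stack(
    [(job_category == level).astype(float) for level in levels]
)

# Fit salary = C + beta1 * custodial + beta2 * manager by least squares.
design = np.column_stack([np.ones(len(job_category)), dummies])
estimates, *_ = np.linalg.lstsq(design, current_salary, rcond=None)

print("dummy variables:\n", dummies)
print("constant and betas:", estimates)
```

With only dummy variables in the model, the constant equals the mean salary of the reference category, and each beta is the difference between that category's mean salary and the reference mean.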