What is R-squared and how does it help us?

A free video tutorial from 365 Careers
Creating opportunities for Business & Finance students

Lecture description

What makes a good regression? That's what the R-squared tackles - the goodness of fit of a regression equation.

Learn more from the full course

Statistics for Data Science and Business Analysis

Statistics you need in the office: Descriptive & Inferential statistics, Hypothesis testing, Regression analysis

04:48:26 of on-demand video • Updated January 2021

  • Understand the fundamentals of statistics
  • Learn how to work with different types of data
  • How to plot different types of data
  • Calculate the measures of central tendency, asymmetry, and variability
  • Calculate correlation and covariance
  • Distinguish and work with different types of distributions
  • Estimate confidence intervals
  • Perform hypothesis testing
  • Make data-driven decisions
  • Understand the mechanics of regression analysis
  • Carry out regression analysis
  • Use and understand dummy variables
  • Understand the concepts needed for data science even with Python and R!
So far, we decomposed the total variability of the observed data into explained and unexplained. We've also noted that the smaller the regression error, the better the regression. But this is statistics, so there must be at least one widely used measure that describes how powerful a regression is, right? Well, fortunately or unfortunately, depending on your attitude, there is. Let me introduce you to the R-squared.
The R-squared is an intuitive and practical tool when in the right hands. It is equal to the variability explained by the regression divided by the total variability. So what does it mean? It is a relative measure and takes values ranging from 0 to 1. An R-squared of 0 means your regression line explains none of the variability of the data. An R-squared of 1 would mean your model explains the entire variability of the data. Unfortunately, regressions explaining the entire variability are rare. What you will usually observe is values ranging from 0.2 to 0.9.
The immediate question any student is compelled to ask is: what is a good R-squared? When do I know for sure my regression is good enough? I regret to inform you there is no definite answer to that. In fields such as physics and chemistry, scientists are usually looking for regressions with an R-squared between 0.7 and 0.99. However, in social sciences such as economics, finance, and psychology, an R-squared of 0.2, or 20 percent of the variability explained by the model, could be fantastic. It depends on the complexity of the topic and how many variables are believed to be in play. Think about income once more. It may depend on your household income, including your parents and spouse, your education, years of experience, the country you are living in, and the languages you speak, and all of this may still account for less than 50 percent of the variability of income. Your salary is a very complex issue, but you probably know that.
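
The definition above, explained variability divided by total variability, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labels SST (total), SSR (explained) and SSE (unexplained) are naming choices made here, the helper `r_squared` is hypothetical, and the five data points are made up rather than taken from the lecture.

```python
from statistics import mean

def r_squared(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (r2, sst, ssr, sse)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope and intercept of the least-squares line.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variability
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained by the regression
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (error)
    return ssr / sst, sst, ssr, sse

# Made-up numbers for illustration only (assumption: not the course's data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, sst, ssr, sse = r_squared(x, y)
print(f"R-squared = {r2:.3f}")  # prints "R-squared = 0.600"
```

Here the total variability is 6, of which 3.6 is explained and 2.4 is not, so the regression explains 60 percent of the variability in this toy sample.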
Let's check out our SAT-GPA example. We said the SAT score is one of the better determinants of intellectual capacity and capability. The truth is that our regression had an R-squared of 0.406, or in other words, SAT scores explained 41 percent of the variability of the college grades for our sample. An R-squared of 41 percent is neither good nor bad, but since it is far away from 90 percent, we may conclude we are missing some important information. Other determinants must be considered. Variables such as gender, income, and marital status could help us understand the full picture a little better.
OK, should we move on? Wait, wait, what did I say in this section? Don't jump into regressing. Critical thinking is crucial. Before agreeing that a factor is significant, you should try to understand why. So let's quickly justify that claim. First, women are more likely to outperform men in high school, but then in higher education, more men enter academia. There are many biases in play here. Without telling you whether female or male candidates are better, scientific research shows that a gender gap exists in education. Gender is an important input for any regression on the topic. The second factor we pointed out is income. If your household income is low, you are more likely to get a part-time job. Thus you'll have less time for studying and will probably get lower grades. If you've ever been to college, you will surely remember a friend who underperformed for this reason. Third, if you get married and have a child, you'll definitely have lower attendance. Contrary to what most students think when in college, attendance is a significant factor for your GPA. You may think your time is better spent skipping a lecture, but your GPA begs to differ.
All right, after these clarifications, let's find the bottom line: the R-squared measures the goodness of fit of your model.
The more factors you include in your regression, the higher the R-squared. So should we include gender and income in our regression? If this is in line with our research and their inclusion results in a better model, we should do that. However, we'll talk about regressions with more variables later in the course. In this lesson, we built a solid understanding of how the R-squared functions. Excellent. See you next time, when we will explore multiple linear regression.
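
The remark that more factors give a higher R-squared can be checked with a small experiment. In an ordinary least squares fit with an intercept, adding a regressor cannot lower the R-squared, even when the new variable has nothing to do with the outcome. The sketch below assumes NumPy is available, uses synthetic data generated on the spot, and the helper name `r_squared` and the variable names are hypothetical, not the course's code.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    residuals = y - A @ coef
    sse = float(residuals @ residuals)             # unexplained variability
    sst = float(((y - y.mean()) ** 2).sum())       # total variability
    return 1.0 - sse / sst

rng = np.random.default_rng(0)               # fixed seed so the run is repeatable
n = 200
x1 = rng.normal(size=n)                      # a regressor that really drives y
unrelated = rng.normal(size=n)               # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)      # synthetic outcome (assumption)

r2_one = r_squared(x1.reshape(-1, 1), y)
r2_two = r_squared(np.column_stack([x1, unrelated]), y)
print(f"one regressor:  {r2_one:.4f}")
print(f"two regressors: {r2_two:.4f}")
```

The second figure is never below the first, yet the extra variable adds no real insight, which is why a higher R-squared alone is not proof of a better model.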