What is R-squared and how does it help us?

A free video tutorial from 365 Careers

Lecture description

What makes a good regression? That's what R-squared tackles: the goodness of fit of a regression equation.

Learn more from the full course

Statistics for Data Science and Business Analysis

Statistics you need in the office: Descriptive & Inferential statistics, Hypothesis testing, Regression analysis

04:48:23 of on-demand video • Updated April 2020

  • Understand the fundamentals of statistics
  • Learn how to work with different types of data
  • Plot different types of data
  • Calculate the measures of central tendency, asymmetry, and variability
  • Calculate correlation and covariance
  • Distinguish and work with different types of distributions
  • Estimate confidence intervals
  • Perform hypothesis testing
  • Make data-driven decisions
  • Understand the mechanics of regression analysis
  • Carry out regression analysis
  • Use and understand dummy variables
  • Understand the concepts needed for data science even with Python and R!

English [Auto]

So far, we have decomposed the total variability of the observed data into explained and unexplained. We've also noted that the smaller the regression error, the better the regression. But this is statistics, so there must be at least one widely used measure that describes how powerful the regression is, right? Well, fortunately or unfortunately, depending on your attitude, there is. Let me introduce you to the R-squared.

The R-squared is an intuitive and practical tool when in the right hands. It is equal to the variability explained by the regression divided by the total variability. So what does it mean? It is a relative measure and takes values ranging from 0 to 1. An R-squared of 0 means your regression line explains none of the variability of the data. An R-squared of 1 would mean your model explains the entire variability of the data. Unfortunately, regressions explaining the entire variability are rare. What you will usually observe is values ranging from 0.2 to 0.9.

The immediate question any student is compelled to ask is: what is a good R-squared? When do I know for sure my regression is good enough? I regret to inform you there is no definite answer to that. In fields such as physics and chemistry, scientists are usually looking for regressions with an R-squared between 0.7 and 0.99. However, in social sciences such as economics, finance, and psychology, an R-squared of 0.2, or 20 percent of the variability explained by the model, could be fantastic. It depends on the complexity of the topic and how many variables are believed to be in play.

Think about income once more. It may depend on your household income (including your parents and spouse), your education, years of experience, the country you are living in, and the languages you speak, and all of this may still account for less than 50 percent of the variability of income. Your salary is a very complex issue, but you probably know that.

Let's check out our SAT-GPA example. We said the SAT score is one of the better determinants of intellectual capacity and capability. The truth is that our regression had an R-squared of 0.406, or in other words, SAT scores explained 41 percent of the variability of the college grades for our sample. An R-squared of 41 percent is neither good nor bad. But since it is far away from 90 percent, we may conclude we are missing some important information. Other determinants must be considered. Variables such as gender, income, and marital status could help us understand the full picture a little better.

OK, should we move on? Wait, wait. What did I say in this section? Don't jump into regressing. Critical thinking is crucial. Before agreeing that a factor is significant, you should try to understand why. So let's quickly justify that claim.

First, women are more likely to outperform men in high school, but then in higher education, more men enter academia. There are many biases in place here. Without telling you if female or male candidates are better, scientific research shows that a gender gap exists in education. Gender is an important input for any regression on the topic.

The second factor we pointed out is income. If your household income is low, you are more likely to get a part-time job, thus have less time for studying, and probably get lower grades. If you've ever been to college, you will surely remember a friend who underperformed for this reason.

Third, if you get married and have a child, you'll definitely have lower attendance. Contrary to what most students think when in college, attendance is a significant factor for your GPA. You may think your time is better spent when skipping a lecture, but your GPA begs to differ.

Right. After these clarifications, let's find the bottom line. The R-squared measures the goodness of fit of your model. The more factors you include in your regression, the higher the R-squared. So should we include gender and income in our regression?
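
The claim that R-squared rises as factors are added can be checked on synthetic data. The sketch below is only an illustration and not part of the lecture: the data are randomly generated, the helper names `solve` and `r_squared` are made up for this example, and the fit is ordinary least squares with an intercept, solved through the normal equations. It computes R-squared as 1 minus unexplained over total variability, which for this kind of fit equals explained over total.

```python
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column (partial pivoting).
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total
    return 1 - sse / sst


random.seed(0)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2.0 + 1.5 * a + random.gauss(0, 1) for a in x1]

r2_one = r_squared([x1], y)
r2_two = r_squared([x1, noise], y)
print(f"one regressor:   {r2_one:.4f}")
print(f"plus pure noise: {r2_two:.4f}")
```

In this sketch the second R-squared should come out at least as large as the first, even though the added column is pure noise. That is why a higher R-squared alone does not justify adding a variable.
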
If including gender and income is in line with our research and results in a better model, we should do that. However, we'll talk about regressions with more variables later in the course.

In this lesson, we built a solid understanding of how R-squared functions. Excellent. See you next time, when we will explore multiple linear regression.
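
The definition used in this lesson, variability explained by the regression divided by total variability, can be written out for a simple linear regression. What follows is a minimal sketch under stated assumptions: the SAT and GPA numbers are invented for illustration and are not the course's dataset, and the function name `r_squared` is hypothetical.

```python
# Hypothetical (made-up) SAT scores and GPAs; NOT the course's dataset.
sat = [1714, 1664, 1760, 1685, 1693, 1670, 1764, 1764, 1792, 1850]
gpa = [2.40, 2.52, 2.54, 2.74, 2.83, 2.91, 3.00, 3.00, 3.01, 3.01]


def r_squared(x, y):
    """Fit y = b0 + b1 * x by least squares; return (R-squared, SST, SSR, SSE)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Ordinary least squares slope and intercept.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    # Total variability, and the explained and unexplained parts.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssr / sst, sst, ssr, sse


r2, sst, ssr, sse = r_squared(sat, gpa)
print(f"R-squared = {r2:.3f}")
```

Because the total variability splits into the explained and unexplained parts, the same value can also be obtained as `1 - sse / sst`.
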