What is R-squared and how does it help us?

A free video tutorial from 365 Careers
Creating opportunities for Data Science and Finance students

Lecture description

What makes a good regression? That's what the R-squared tackles - the goodness of fit of a regression equation.

Learn more from the full course

Statistics for Data Science and Business Analysis

Statistics you need in the office: Descriptive & Inferential statistics, Hypothesis testing, Regression analysis

04:48:26 of on-demand video • Updated February 2024

Lecturer: So far, we have decomposed the total variability of the observed data into explained and unexplained variability. We've also noted that the smaller the regression error, the better the regression. But this is statistics, so there must be at least one widely used measure that describes how powerful a regression is, right? Well, fortunately or unfortunately, depending on your attitude, there is. Let me introduce you to the R-squared.

The R-squared is an intuitive and practical tool, when in the right hands. It is equal to the variability explained by the regression divided by the total variability. So what does it mean? It is a relative measure and takes values ranging from 0 to 1. An R-squared of 0 means your regression line explains none of the variability of the data. An R-squared of 1 would mean your model explains the entire variability of the data. Unfortunately, regressions explaining the entire variability are rare. What you will usually observe is values ranging from 0.2 to 0.9.

The immediate question any student is compelled to ask is, what is a good R-squared? When do I know for sure my regression is good enough? I regret to inform you there is no definite answer to that. In fields such as physics and chemistry, scientists are usually looking for regressions with an R-squared between 0.7 and 0.99. However, in social sciences such as economics, finance and psychology, an R-squared of 0.2, meaning 20% of the variability is explained by the model, could be fantastic. It depends on the complexity of the topic and how many variables are believed to be in play.

Think about income once more. It may depend on your household income, including that of your parents and spouse, your education, years of experience, the country you are living in, and the languages you speak, and all of this may still account for less than 50% of the variability of income. Your salary is a very complex issue, but you probably know that.

Let's check out our SAT, GPA example.
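Before the SAT and GPA example, here is a minimal sketch of the definition given above: R-squared as the variability explained by the regression divided by the total variability. It assumes a simple linear regression with an intercept, fitted with NumPy. The function name `r_squared` and the numbers are made up for illustration; they are not the course's data.

```python
import numpy as np

def r_squared(x, y):
    """R-squared of a simple linear regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Least-squares line; polyfit returns the highest power first.
    slope, intercept = np.polyfit(x, y, 1)
    y_hat = intercept + slope * x
    sst = np.sum((y - y.mean()) ** 2)      # total variability
    ssr = np.sum((y_hat - y.mean()) ** 2)  # variability explained by the regression
    return ssr / sst

# Made-up numbers for illustration only (not the SAT/GPA sample).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(r_squared(x, y), 3))
```

A perfectly linear relationship gives 1, and a `y` with no linear relationship to `x` gives 0, matching the two extremes described in the lecture.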
We said the SAT score is one of the better determinants of intellectual capacity and capability. The truth is that our regression had an R-squared of 0.406, or in other words, SAT scores explained 41% of the variability of the college grades for our sample. An R-squared of 41% is neither good nor bad, but since it is far away from 90%, we may conclude we are missing some important information. Other determinants must be considered. Variables such as gender, income, and marital status could help us understand the full picture a little better.

Okay, should we move on? Wait, wait, what did I say in this section? Don't jump into regressing. Critical thinking is crucial. Before agreeing that a factor is significant, you should try to understand why. So let's quickly justify that claim.

First, women are more likely to outperform men in high school, but then in higher education, more men enter academia. There are many biases in place here. Without telling you if female or male candidates are better, scientific research shows that a gender gap exists in education. Gender is an important input for any regression on the topic.

The second factor we pointed out is income. If your household income is low, you are more likely to get a part-time job. Thus, you'll have less time for studying and probably get lower grades. If you've ever been to college, you will surely remember a friend who underperformed for this reason.

Third, if you get married and have a child, you'll very likely have lower attendance. Contrary to what most students think when in college, attendance is a significant factor for your GPA. You may think your time is better spent when skipping a lecture, but your GPA begs to differ.

Alright, after these clarifications let's find the bottom line. The R-squared measures the goodness of fit of your model. The more factors you include in your regression, the higher the R-squared tends to be; adding a variable can never lower it. So should we include gender and income in our regression?
If this is in line with our research and their inclusion results in a better model, we should do that. However, we'll talk about regressions with more variables later in the course. In this lesson, we built a solid understanding of how the R-squared works. Excellent. See you next time, when we will explore multiple linear regression.
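The lecture's remark that R-squared rises as factors are added can be checked numerically. The sketch below assumes ordinary least squares with an intercept and uses synthetic data: `x1` drives `y`, while `noise_feature` is unrelated to `y` by construction. All names and numbers are hypothetical, and R-squared is computed here as 1 minus unexplained variability over total variability, which for a regression with an intercept equals explained over total.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)                  # unexplained variability
    sst = np.sum((y - y.mean()) ** 2)             # total variability
    return 1.0 - sse / sst

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
noise_feature = rng.normal(size=n)  # unrelated to y by construction
y = 2.0 * x1 + rng.normal(size=n)

r2_one = r_squared(x1.reshape(-1, 1), y)
r2_two = r_squared(np.column_stack([x1, noise_feature]), y)
print(f"one regressor:  {r2_one:.4f}")
print(f"two regressors: {r2_two:.4f}")
```

The second value is never smaller than the first, even though the extra variable is pure noise. That is why a higher R-squared alone does not show that an added factor belongs in the model, and why the lecture asks for a justification of each variable first.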