
Explore the theory and application of linear regression with Stata, focusing on concepts, reading outputs, and working with practical datasets, plus an accompanying e-book and slides.
Explore simple linear regression with GPA as the dependent variable and attendance as the independent variable, using a scatter plot and best fit line y = a x + b.
Explain how r-squared measures the proportion of GPA variation explained by attendance, showing that a good fit yields an r-squared near 1, while a poor fit yields an r-squared near 0.
Explore the p-value and statistical significance in linear regression, using a tea-guessing analogy to show how a p-value below 0.05 leads to rejecting luck as the explanation and treating the slope as statistically significant.
Use a linear regression model to predict GPA from attendance, assess accuracy with r-squared and p-value, and interpret residuals as actual minus predicted GPA.
Explore how multiple linear regression uses attendance and study to predict GPA, interpret coefficients, and compare effects, showing attendance has a larger impact than study.
Explore how p-values indicate statistical significance in a multiple linear regression, with each coefficient's p-value below 0.05 signaling significance and justifying including attendance and study in the model.
Assess fit and residuals in a regression predicting GPA from attendance and study in Stata, noting r-squared and p-values; interpret residuals as observed minus predicted and aim for small residuals.
Learn how to encode a binary variable, like gender, as 0/1 for linear regression, interpret coefficients, and compare reference categories in a multiple regression.
Learn how to code categorical variables in linear regression using dummy variables, with a four-group major example, where coefficients compare each group to the reference (business).
After fitting the linear regression model, assess fit and assumptions by comparing predicted to observed GPA and visualize fitted versus observed values to see alignment near the y=x line.
Examine the normality of residuals in linear regression by plotting a histogram and overlaying a normal curve, and assess independence and constant variance to validate model assumptions.
Test independence by plotting residuals against fitted values or independent variables to check for patterns. A random scatter indicates independence; a pattern such as a U shape signals dependence.
Identify and diagnose outliers in linear regression using diagnostics to assess model fit, understand influential observations, and compare regression lines with and without outliers.
Explore how to apply linear regression with Stata, read output, and perform diagnostics and assumption checks using intuitive commands.
Explore simple linear regression with Stata by loading a dataset, plotting a scatter plot of GPA against attendance, and interpreting the regression output: GPA equals 1.22 times attendance minus 13.2.
Assess model fit by comparing simple and multiple regression using adjusted r-squared and p-values, then predict GPA and examine residuals with histograms and q-q plots for normality.
Explore how a quadratic term for English improves GPA regression in Stata, using either English squared or an English-by-English interaction, indicating a nonlinear relationship with significant results (p<0.05).
Learn how to assess model fit and assumptions in a multiple linear regression predicting GPA from attendance and gender, using predicted vs observed plots, adjusted r-squared, and p-values.
Assess independence of residuals and constant variance in linear regression by plotting residuals against fitted values, using a zero line, and applying Stata's robust standard errors when needed.
Run a regression in Stata, then use the vif command to compute a variance inflation factor (VIF) for each independent variable. All VIFs are below 10, suggesting that multicollinearity is not a concern.
Identify and diagnose outliers in simple and multiple linear regression using diagnostic plots, such as added-variable (av) plots and leverage-versus-residual plots, and assess their influence on model results.
Identify and assess influential observations using Cook's distance and dfits in Stata; create new influence variables, predict fits, and visualize influential points to decide whether to exclude problematic cases.
Explore selection algorithms in Stata: forward selection and backward elimination identify best predictors for GPA. Stepwise with p<0.05 selects English and attendance as the final model, aligning with dataset insights.
Learn to visualize regression results in Stata using the margins and marginsplot commands after fitting a model, and customize graphs that clearly summarize each independent variable's effect on the dependent variable.
Explore how to visualize effects in multiple linear regression using Stata by applying the margins and marginsplot commands to predict GPA across attendance values from 40 to 100.
Explore a multiple linear regression with attendance and a gender indicator, using margins and marginsplot to compare GPA by gender, showing that females score higher than males across attendance levels.
Explore a dataset with more than 800 observations to apply multiple linear regression in Stata, including scatter plots, predicting GPA and examining residuals, comparing models by adjusted r-squared, and testing assumptions.
Explore a large Stata dataset to model GPA with linear regression, examining variables like college, credits, gender, attendance, English, siblings, income, and work, and test model fit and assumptions.
Review the core theory, learn to analyze datasets and statistics, and complete a solo project in linear regression using Stata.
An e-book and a set of slides are included in this course. The course is divided into two parts. In the first part, students are introduced to the theory behind linear regression. The theory is explained in an intuitive way, and no math is involved other than a few equations that use addition and subtraction. The purpose of this part is for students to understand what linear regression is and when it is used. Students will learn the differences between simple and multiple linear regression, how to read linear regression output, and how to test model accuracy and assumptions. Students will also learn how to include different types of variables in the model, such as categorical variables and quadratic terms. All of this theory is explained in the slides, which are made available to students, as well as in the e-book, which is freely available to students who enroll in the course.
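To make the first part's core ideas concrete, here is a minimal sketch of a simple linear regression written in plain Python rather than Stata. It fits the line y = a x + b by least squares, computes r-squared, and computes residuals as actual minus predicted GPA. The attendance and GPA values, and the helper names fit_line and r_squared, are made up for illustration and do not come from the course dataset.

```python
# Illustrative sketch only: the numbers below are invented, not course data.
attendance = [50, 60, 70, 80, 90]   # hypothetical attendance (%)
gpa = [55, 62, 71, 78, 89]          # hypothetical GPA values


def fit_line(x, y):
    """Return the least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(y, predicted):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


a, b = fit_line(attendance, gpa)
predicted = [a * x + b for x in attendance]

# Residual = actual GPA minus predicted GPA, one per student.
residuals = [yi - pi for yi, pi in zip(gpa, predicted)]
r2 = r_squared(gpa, predicted)

print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
print(f"r-squared = {r2:.3f}")
print("residuals:", [round(r, 2) for r in residuals])
```

An r-squared close to 1 together with small residuals corresponds to the "good fit" case described in the lectures. Stata reports the same quantities in its regression output, along with p-values that this sketch does not compute.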
In the second part of the course, students will learn how to apply what they learned using Stata. In this part, students will use Stata to fit multiple regression models, produce graphs that describe model fit and assumptions, and use variable-specific commands that make the output more readable. This part assumes only very basic knowledge of Stata.
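One of the diagnostics covered in the second part, the VIF check for multicollinearity, reduces to a short calculation when a model has exactly two predictors: each predictor's VIF equals 1 / (1 - r²), where r is the correlation between the two predictors. The sketch below is plain Python, not Stata syntax, and is not how Stata computes it internally. The attendance and study values and the function names are hypothetical.

```python
# Illustrative sketch only: the predictor values below are invented.
attendance = [50, 60, 70, 80, 90, 100]  # hypothetical attendance (%)
study = [8, 5, 9, 6, 10, 7]             # hypothetical weekly study hours


def r_squared_between(x, z):
    """R-squared from regressing x on z (the squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    szz = sum((zi - mean_z) ** 2 for zi in z)
    return sxz ** 2 / (sxx * szz)


def vif_two_predictors(x, z):
    """VIF shared by both predictors in a two-predictor model."""
    return 1.0 / (1.0 - r_squared_between(x, z))


vif = vif_two_predictors(attendance, study)
print(f"VIF = {vif:.2f}")

# Rule of thumb used in the lectures: a VIF below 10 is not a concern.
print("multicollinearity concern:", vif >= 10)
```

With more than two predictors, each VIF comes from regressing that predictor on all the others, which is the calculation Stata performs after a regression.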