Linear regression is the primary workhorse in statistics and data science. Its high degree of flexibility allows it to model very different problems. We will review the theory and concentrate on R applications using real-world data (R is free statistical software used heavily in industry and academia). We will see how to build a real model, how to interpret it, and the computational details behind it. The goal is to give the student the computational knowledge needed to work in industry and to do applied research using linear modelling techniques. Some basic knowledge of statistics and R is recommended, but not necessary.

The course complexity increases as it progresses. We first review basic R and statistics concepts, then move into the linear model, explaining the computational, mathematical, and R methods available. We then cover much more advanced models, dealing with multilevel hierarchical models, and finally concentrate on nonlinear regression. We also draw on several of the latest R packages and recent research. We focus on typical business situations you will face as a data scientist or statistical analyst, and we provide many of the typical questions you will face when interviewing for a job.

The course has lots of code examples, real datasets, quizzes, and video. The video duration is 4 hours, but the user is expected to spend at least 5 extra hours working on the examples, data, and code provided. After completing this course, the user is expected to be fully proficient with these techniques in an industry/business context. All code and data are available on GitHub.
Section 1: Introduction  

Lecture 1  02:25  
Quick intro and brief overview: what you will learn, and what you should know before taking this course.

Lecture 2  00:04  
Use the attached link resource for all the code/data used in this course. 

Lecture 3  06:11  
A more complete overview of this course. 

Lecture 4  01:37  
Advantages of R. Why is it the main statistical software nowadays? What are the advantages and disadvantages?

Lecture 5  07:36  
Basic concepts in R. Installing packages. Vectors. Matrices. Working with dataframes and dates. Basic mathematical operations 

Lecture 6  19:41  
Working with read.csv(). How to load CSV files. We will review the basic data-processing techniques we will use in this course.

Section 2: Linear regression: Ordinary Least Squares  
Lecture 7  20:00  
A quick overview of what we are doing when using OLS and the lm() function in R. Projection matrices. Residuals. Geometrical interpretation. Formulas for coefficients.

Lecture 8  17:49  
Running our first example in R. Using the lm() function. How to interpret the coefficients, p-values, ANOVA, and F statistics.

Lecture 9  20:47  
The equivalence between maximum likelihood (ML) and OLS. Why are these estimates equal? We review an example done via the optim() function in R, minimizing the sum of squares numerically.

Quiz 1 
Linear regression assumptions

6 questions  
Lecture 10  17:32  
How to interpret p-values in the context of linear regression.

Lecture 11  06:55  
When are our p-values contaminated? How can we avoid this?

Lecture 12  19:49  
A much more complex example of OLS. 

Lecture 13  19:20  
How to choose the best model? Why do we want models with few variables? How can we use the stepAIC() and AIC() functions? What happens when we remove variables from our model?

Quiz 2  10 questions  
From the datasets folder, open the CO2Emissions.csv data. This data was obtained from the World Bank. The objective is to predict what the CO2 emissions in India would be. The data is from 1961

Quiz 3 
Choosing the best model

5 questions  
Lecture 14  14:09  
We need our residuals to verify our OLS assumptions. How can we check normality, homoscedasticity, and the absence of autocorrelation? How to read the qqplot() and some normality tests.

Lecture 15  19:17  
The relationship between leverage, outliers, and influence. How to use the Cook's distance (Cook's D) statistic? How can we read the last chart that lm() produces?

Lecture 16  05:05  
When do we need models with log-transformed variables? Log-log models.

Quiz 4 
Residuals

3 questions  
Lecture 17  13:58  
Models with lots of variables might end up fitting not the true response, but "noise". We use the DAAG package to compute the cross-validation mean square error.

Lecture 18  04:38  
Using the predict() function in R. The difference between confidence intervals and prediction intervals, and between the variances of both predictions.

Lecture 19  18:46  
The consequences of multicollinearity. How can we detect it, and what are the options to deal with it? VIFs and condition indexes.

Lecture 20  17:10  
The problem of non-constant variance. How can we identify it using the R plots? Using robust sandwich matrices via the sandwich package.

Lecture 21  11:22  
Detecting autocorrelation. Using the robust HAC matrix from the sandwich package. The acf() function.

Quiz 5 
Linear Regression

8 questions  
Lecture 22  18:44  
Monte Carlo in Excel. Monte Carlo simulation in R creating our own function. Creating synthetic datasets 

Quiz 6 
Monte Carlo

3 questions  
Section 3: Linear regression: Mixed Effects Regression  
Lecture 23  19:58  
An introduction to mixed models. What are the conceptual differences between mixed models and fixed effects models? Simulating datasets with random effects via Monte Carlo in R.

Lecture 24  14:02  
The possible definitions of random effects: A) the effects we don't care about; B) the effects we can treat as coming from an infinite population; C) the effects not estimated by least squares; D) the unobserved effects that change through time.

Lecture 25  15:58  
Simulating random effects for the intercept and the independent variables. The different slopes per group, and how to interpret them.

Lecture 26  19:39  
We create our own function for maximizing the log-likelihood, and we compare this to lmer().

Quiz 7 
Mixed effects linear regression

7 questions  
Lecture 27  19:30  
How to analyse the residuals from an lmer() object? Using the plots that R produces 

Quiz 8 
Random vs Fixed effects

2 questions  
Lecture 28  19:27  
Nested effects, crossed effects. The different operators we can use in lmer(). The different ways of defining the random effects 

Lecture 29  19:59  
The problem of multiple comparisons. Using the lmerTest package. Checking for significant differences. Comparing different levels of our categorical variables.

Section 4: Robust linear regression  
Lecture 30  10:56  
Why do outliers bring problems? What is an outlier? How can we detect them? The rlm() and lmrob() functions 
I have 7+ years of experience as a statistical programmer in industry. Expert in programming, statistics, data science, and statistical algorithms. I have wide experience in many programming languages. Regular contributor to the R community, with 3 published packages. I am also an expert SAS programmer. Contributor to scientific statistical journals. Latest publication in the Journal of Statistical Software.