Linear regression in R for Data Scientists
3.8 (5 ratings)
104 students enrolled

Learn the most important technique in Analytics with lots of business examples. From basic to advanced.
Created by Francisco Juretig
Last updated 1/2016
  • 7 hours on-demand video
  • 1 Article
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Model basic and complex real-world problems using linear regression
  • Understand when models are performing poorly and correct them
  • Design complex models for hierarchical data
  • How to properly prepare the data for linear regression
  • When linear regression is not sufficient
  • Understand how to interpret the results and translate them to actionable insights
Requirements
  • Ideally some basic statistics and R, though neither is strictly necessary
  • Some previous experience manipulating Excel files

When you buy any of my courses, I also give you free coupons for the rest of my courses. Just send me a message after enrolling. Pay for one course, get 5!

Linear regression is the primary workhorse in statistics and data science. Its high degree of flexibility allows it to model very different problems. We will review the theory and concentrate on R applications using real-world data (R is free statistical software used heavily in industry and academia). We will learn how to build a real model, how to interpret it, and the computational details behind it. The goal is to give the student the computational knowledge necessary to work in industry and do applied research using linear modelling techniques. Some basic knowledge of statistics and R is recommended, but not necessary.

The course complexity increases as it progresses: we review basic R and statistics concepts, then transition into the linear model, explaining the computational, mathematical and R methods available. We then move on to much more advanced models, dealing with multilevel hierarchical models, and finally concentrate on nonlinear regression. We also leverage several of the latest R packages and the latest research.

We focus on typical business situations you will face as a data scientist or statistical analyst, and we provide many of the typical questions you will face when interviewing for a job. The course has lots of code examples, real datasets, quizzes, and video. The video duration is 4 hours, but the user is expected to spend at least 5 extra hours working on the examples, data, and code provided. After completing this course, the user is expected to be fully proficient with these techniques in an industry/business context. All code and data are available on GitHub.

Who is the target audience?
  • People pursuing a career in Data Science
  • Statisticians needing more practical/computational experience
  • Data modellers
  • People pursuing a career in practical Machine Learning
Curriculum For This Course
30 Lectures
6 Lectures 37:34

Quick intro. Brief overview. What you will learn, and what you should learn before taking this course

Preview 02:25

Use the attached link resource for all the code/data used in this course.

Getting the data/code for this course

A more complete overview of this course.

What is linear regression, and what is this course about?

Advantages of R. Why is it the main statistical software nowadays? What are the advantages and disadvantages?

Why R?

Basic concepts in R. Installing packages. Vectors. Matrices. Working with dataframes and dates. Basic mathematical operations

Setting up R. Understanding the basics

Working with read.csv(). How to load csv files. We will review the basic data-processing techniques we will use in this course

Preparing the data in R
Linear regression: Ordinary Least Squares
16 Lectures 04:05:21

A quick overview of what we are doing when using OLS and the lm() function in R. Projection matrices. Residuals. Geometrical interpretation. Formulas for coefficients.

Mathematical preliminaries (OPTIONAL)
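
As a rough illustration of these preliminaries, the sketch below builds the OLS coefficients and the projection ("hat") matrix by hand and compares them with lm(). The built-in mtcars data and the chosen variables are assumptions made for the example; they are not the course's dataset.

```r
# Design matrix with an intercept column (mtcars is an illustrative choice)
X <- cbind(1, mtcars$wt, mtcars$hp)
y <- mtcars$mpg

# Normal equations: beta_hat = (X'X)^{-1} X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Projection matrix H = X (X'X)^{-1} X' maps y onto the column space of X
H <- X %*% solve(crossprod(X)) %*% t(X)
fitted_vals <- H %*% y
resid_vals  <- y - fitted_vals

# The same coefficients from lm()
lm_coef <- coef(lm(mpg ~ wt + hp, data = mtcars))
print(cbind(by_hand = as.numeric(beta_hat), lm = unname(lm_coef)))
```

Geometrically, the residuals are orthogonal to every column of X, and the trace of H equals the number of estimated coefficients.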

Running our first example in R. Using the lm() function. How to interpret the coefficients, p-values, ANOVA, and F statistics

A first example in R
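
A minimal sketch of what such a first example could look like. It assumes the built-in mtcars data rather than the course's own files, and only shows where the coefficients, p-values, F statistic and ANOVA table come from.

```r
# Fuel efficiency explained by weight and horsepower (illustrative model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimates, standard errors, t values and p-values, one row per coefficient
coef_table <- summary(fit)$coefficients
print(coef_table)

# Overall F statistic with its degrees of freedom
f_stat <- summary(fit)$fstatistic
print(f_stat)

# Sequential ANOVA table
print(anova(fit))
```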

The equivalence between doing ML and OLS. Why are these estimates equal? We review an example done via the optim() function in R. Minimizing the sum of squares numerically

Preview 20:47
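
The equivalence can be sketched on simulated data as follows. The data-generating values, the starting points and the log-scale parameterisation of sigma are assumptions chosen for the illustration, not the lecture's actual code.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

# Sum of squared residuals as a function of (intercept, slope)
ssr <- function(beta) sum((y - beta[1] - beta[2] * x)^2)

# Negative Gaussian log-likelihood; sigma is kept positive via exp()
negloglik <- function(par) {
  mu <- par[1] + par[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(par[3]), log = TRUE))
}

ols_fit <- coef(lm(y ~ x))
ssr_fit <- optim(c(0, 0), ssr, method = "BFGS")$par
ml_fit  <- optim(c(0, 0, 0), negloglik, method = "BFGS")$par[1:2]

print(rbind(lm = ols_fit, optim_ssr = ssr_fit, optim_ml = ml_fit))
```

Under Gaussian errors the log-likelihood depends on the coefficients only through the sum of squares, so all three rows should agree up to numerical tolerance.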

Linear regression assumptions
6 questions

How to interpret p-values in the context of linear regression.


When are our p-values contaminated? How can we avoid this?

P-value hacking

A much more complex example of OLS.

A more realistic example

How to choose the best model? Why do we want models with few variables? How can we use the stepAIC() function and the AIC() function? What happens when we remove variables from our model?

Model Selection
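
A small sketch of AIC-based selection on simulated data, where only one of three candidate predictors matters. Note that it uses base R's step() so it runs without extra packages; stepAIC() from the MASS package follows the same idea. All variable names are made up for the example.

```r
set.seed(2)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)   # only x1 drives the response

full  <- lm(y ~ x1 + x2 + x3, data = d)
small <- lm(y ~ x1, data = d)

# Lower AIC is better: fit is rewarded, extra parameters are penalised
print(AIC(full, small))

# Backward elimination guided by AIC (trace = 0 silences the step log)
chosen <- step(full, direction = "backward", trace = 0)
print(formula(chosen))
```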

From the datasets folder, open the CO2-Emissions.csv data. This data was obtained from the World Bank. The objective is to predict what the CO2 emissions in India would be. The data is from 1961

Practical Quiz - Explaining CO2 Emissions
10 questions

Choosing the best model
5 questions

We need our residuals to verify our OLS assumptions. How can we check normality, homoscedasticity, and non-autocorrelation? How to read the qqplot() and some normality tests.

Residuals and plots

The relationship between leverage, outliers, and influence. How do we use the CookD statistic? How can we read the last chart that lm() produces?

Influence plots and outlier detection
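
To make leverage and influence concrete, here is a hedged sketch that plants one unusual observation in simulated data and recovers it with Cook's distance. The planted point and its values are assumptions for illustration only.

```r
set.seed(3)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

# Plant one high-leverage point that sits far from the true line
x[50] <- 6
y[50] <- -10

fit <- lm(y ~ x)
cd  <- cooks.distance(fit)   # influence of each observation on the fit
lev <- hatvalues(fit)        # leverage: diagonal of the hat matrix

most_influential <- which.max(cd)
print(most_influential)
print(round(c(cooks_d  = cd[[most_influential]],
              leverage = lev[[most_influential]]), 3))

# plot(fit, which = 5) draws residuals vs leverage with Cook's distance contours
```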

When do we need models with variables in logs()? Log-log models.

Log transformations - Price Elasticities
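
In a log-log model the slope can be read directly as an elasticity. The sketch below simulates demand with an assumed price elasticity of -1.5 and checks that the regression recovers it; the numbers are invented for the example.

```r
set.seed(4)
n <- 300
price <- runif(n, 1, 10)

# Simulated demand with constant elasticity -1.5 (an assumed value)
quantity <- exp(5 - 1.5 * log(price) + rnorm(n, sd = 0.2))

fit <- lm(log(quantity) ~ log(price))

# A 1% price increase is associated with roughly this % change in quantity
elasticity <- coef(fit)[["log(price)"]]
print(elasticity)
```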

3 questions

Models with lots of variables might end up adjusting not to the true response, but to "noise". We use the DAAG package for cross-validation mean square error.


Using the predict() function in R. The difference between confidence intervals and prediction intervals. The difference between the variances of both predictions
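
A brief sketch of the two interval types, assuming the built-in mtcars data and a hypothetical new car weighing 3 (thousand lbs). The point prediction is the same in both cases; only the interval width differs.

```r
fit <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3)   # hypothetical new observation

# Uncertainty about the mean response at wt = 3
conf_int <- predict(fit, newdata = new_car, interval = "confidence", level = 0.95)

# Uncertainty about a single new response at wt = 3 (adds the error variance)
pred_int <- predict(fit, newdata = new_car, interval = "prediction", level = 0.95)

print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

The prediction interval is wider because its variance is the variance of the estimated mean plus the residual variance.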


The consequences of multicollinearity. How can we detect it, and what are the options to deal with it? VIFs, and condition indexes
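
One possible way to compute VIFs by hand in base R, on simulated predictors where x2 is nearly a copy of x1. The data and names are assumptions; packaged VIF functions should give the same numbers for a model like this.

```r
set.seed(5)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear with x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 1))

# Condition number of the standardised predictor matrix
cond_number <- kappa(scale(as.matrix(X)), exact = TRUE)
print(cond_number)
```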


The problem of non-constant variance. How can we identify it using the R plots? Using robust sandwich matrices via the sandwich package

Heteroscedasticity, and how to solve it
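
The sandwich idea can be sketched by hand in base R. The example below builds the simplest (HC0) robust covariance matrix on simulated data whose error spread grows with x; the simulation settings are assumptions, and the sandwich package offers this and more refined variants.

```r
set.seed(6)
n <- 500
x <- runif(n, 1, 5)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error sd grows with x

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

# Sandwich: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)        # each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

print(cbind(classical  = sqrt(diag(vcov(fit))),
            robust_HC0 = sqrt(diag(vcov_hc0))))

# With the sandwich package installed, sandwich::vcovHC(fit, type = "HC0")
# should return the same matrix.
```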

Detecting autocorrelation. Using the robust HAC matrix from the sandwich package. The acf() function

Autocorrelation, and how to solve it
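
A short detection sketch, assuming simulated AR(1) errors with coefficient 0.7. It only shows the residual autocorrelation check in base R; the HAC correction itself lives in the sandwich package.

```r
set.seed(10)
n <- 300
x <- rnorm(n)
e <- as.numeric(arima.sim(list(ar = 0.7), n = n))   # autocorrelated errors
y <- 1 + 2 * x + e

fit <- lm(y ~ x)

# Autocorrelation function of the residuals (plot = FALSE returns the numbers only)
r <- acf(residuals(fit), plot = FALSE)
lag1 <- r$acf[2]          # element 1 is lag 0, element 2 is lag 1
print(lag1)

# With the sandwich package installed, sandwich::vcovHAC(fit) would supply a
# covariance matrix robust to heteroscedasticity and autocorrelation.
```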

Linear Regression
8 questions

Monte Carlo in Excel. Monte Carlo simulation in R creating our own function. Creating synthetic datasets

Preview 18:44
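
A minimal sketch of a Monte Carlo experiment built around our own function. The helper name simulate_slope and its default parameter values are hypothetical, chosen only to show the pattern of generating many synthetic datasets and collecting an estimate from each.

```r
# Hypothetical helper: simulate one synthetic dataset and return the fitted slope
simulate_slope <- function(n, beta0 = 1, beta1 = 2, sigma = 1) {
  x <- rnorm(n)
  y <- beta0 + beta1 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))[["x"]]
}

set.seed(7)
slopes <- replicate(1000, simulate_slope(n = 50))

# The estimates should centre on the true slope; their spread is the sampling error
print(c(mean = mean(slopes), sd = sd(slopes)))
```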

Monte Carlo
3 questions
Linear regression: Mixed Effects Regression
7 Lectures 02:08:33

An introduction to mixed models. What are the conceptual differences between mixed models, and fixed effects models. Simulating datasets with random effects via Monte Carlo in R.

Hierarchical Models and linear regression

The possible definitions of random effects: A) the effects we don't care about B) the effects we can treat as coming from an infinite population C) the effects not estimated by least squares D) the unobserved effects that change through time

Random effects - A philosophical discussion

Simulating random effects for the intercept and the independent variables. The different slopes per group, and what is the interpretation

Preview 15:58
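
The simulation described here might look roughly like the sketch below, where each group gets its own intercept and slope drawn around common fixed effects. Group counts, variances and variable names are all assumptions for the illustration.

```r
set.seed(8)
n_groups <- 20
n_per    <- 30
group    <- rep(seq_len(n_groups), each = n_per)

# Group-level deviations around the fixed effects (intercept 1, slope 2)
u0 <- rnorm(n_groups, sd = 1.0)   # random intercepts
u1 <- rnorm(n_groups, sd = 0.5)   # random slopes

x <- rnorm(n_groups * n_per)
y <- (1 + u0[group]) + (2 + u1[group]) * x + rnorm(n_groups * n_per, sd = 0.5)
d <- data.frame(y, x, group = factor(group))

# Separate OLS slope per group, to see how the slopes vary
group_slopes <- sapply(split(d, d$group),
                       function(g) coef(lm(y ~ x, data = g))[["x"]])
print(round(range(group_slopes), 2))

# With lme4 installed, the matching mixed model would be
# lme4::lmer(y ~ x + (1 + x | group), data = d)
```

Each group's slope is the common slope plus that group's deviation, which is the interpretation of a random slope.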

We create our own function for maximizing the log-likelihood, and we compare this to lmer().

Preview 19:39

Mixed effects linear regression
7 questions

How to analyse the residuals from an lmer() object? Using the plots that R produces

Model results and residuals

Random vs Fixed effects
2 questions

Nested effects, crossed effects. The different operators we can use in lmer(). The different ways of defining the random effects

Advanced random effects modelling and nested effects

The problem of multiple comparisons. Using the lmerTest package. Checking for significant differences. Comparing different levels of our categorical variables.

Multiple Comparisons
Robust linear regression
1 Lecture 10:56

Why do outliers bring problems? What is an outlier? How can we detect them? The rlm() and lmrob() functions

The rlm() and the lmRob() functions
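
A hedged sketch of why robust regression helps, using rlm() from the MASS package on simulated data with a few contaminated responses. The contamination scheme is an assumption for the example; lmRob() is not shown.

```r
library(MASS)  # provides rlm()

set.seed(9)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.5)
y[1:5] <- y[1:5] + 30            # contaminate five responses

ols    <- coef(lm(y ~ x))
robust <- coef(rlm(y ~ x))       # Huber M-estimator by default

# OLS is pulled towards the outliers; the robust fit stays closer to (1, 2)
print(rbind(ols, robust))
```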
About the Instructor
Francisco Juretig
3.9 Average rating
154 Reviews
1,348 Students
8 Courses

I have 7+ years of experience as a statistical programmer in industry. Expert in programming, statistics, data science, and statistical algorithms. I have wide experience in many programming languages. Regular contributor to the R community, with 3 published packages. I am also an expert SAS programmer. Contributor to scientific statistical journals. Latest publication in the Journal of Statistical Software.