Linear regression in R for Data Scientists
Learn the most important technique in Analytics with lots of business examples. From basic to advanced.
3.8 (4 ratings)
80 students enrolled
Last updated 1/2016
English
Current price: $10 Original price: $20 Discount: 50% off
30-Day Money-Back Guarantee
Includes:
  • 7 hours on-demand video
  • 1 Article
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
Model basic and complex real-world problems using linear regression
Understand when models are performing poorly and correct them
Design complex models for hierarchical data
Properly prepare the data for linear regression
Recognize when linear regression is not sufficient
Interpret the results and translate them into actionable insights
Requirements
  • Ideally some basic knowledge of statistics and R, though neither is strictly necessary
  • Some previous experience manipulating Excel files
Description

When you buy any of my courses, I also give you free coupons for the rest of my courses. Just send me a message after enrolling. Pay for one course, get five!

Linear regression is the primary workhorse of statistics and data science. Its high degree of flexibility allows it to model very different problems. We will review the theory, and we will concentrate on R applications using real-world data (R is a free statistical software package used heavily in industry and academia). We will learn how to build a real model, how to interpret it, and the computational and technical details behind it. The goal is to give the student the computational knowledge necessary to work in industry, and to do applied research, using linear modelling techniques. Some basic knowledge of statistics and R is recommended, but not necessary.

The course's complexity increases as it progresses: we review basic R and statistics concepts, then transition into the linear model, explaining the computational, mathematical and R methods available. We then move into much more advanced models, dealing with multilevel hierarchical models, and we finally concentrate on robust and nonlinear extensions. We also leverage several of the latest R packages and the latest research.

We focus on typical business situations you will face as a data scientist or statistical analyst, and we include many of the questions you will typically face when interviewing for a job. The course has lots of code examples, real datasets, quizzes, and video. The video content runs about 7 hours, and the user is expected to spend at least 5 extra hours working on the examples, data, and code provided. After completing this course, the user is expected to be fully proficient with these techniques in an industry/business context. All code and data are available on GitHub.

Who is the target audience?
  • People pursuing a career in Data Science
  • Statisticians needing more practical/computational experience
  • Data modellers
  • People pursuing a career in practical Machine Learning
Curriculum For This Course
30 Lectures · 07:02:24
Introduction
6 Lectures 37:34

Quick intro. Brief overview. What you will learn, and what you should learn before taking this course

Preview 02:25

Use the attached link resource for all the code/data used in this course.

Getting the data/code for this course
00:04

A more complete overview of this course.

What is linear regression, and what is this course about?
06:11

Why is R one of the main statistical software packages nowadays? What are its advantages and disadvantages?

Why R?
01:37

Basic concepts in R. Installing packages. Vectors. Matrices. Working with dataframes and dates. Basic mathematical operations

Setting up R. Understanding the basics
07:36
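
A minimal sketch of the kind of basics covered here; the objects and the lme4 package below are only illustrative choices, not files from the course:

    install.packages("lme4")                 # installing a package (run once)
    library(lme4)                            # loading it
    v <- c(1, 2, 3)                          # a numeric vector
    m <- matrix(1:6, nrow = 2)               # a 2 x 3 matrix
    df <- data.frame(id = 1:3, value = v)    # a data frame
    df$date <- as.Date(c("2016-01-01", "2016-02-01", "2016-03-01"))   # dates
    mean(df$value); t(m) %*% m               # basic mathematical operations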

Working with read.csv(). How to load CSV files. We will review the basic data-processing techniques we will use in this course.

Preparing the data in R
19:41
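
As an illustration only (the file name and columns below are placeholders; the course's actual data is in the linked GitHub repository), loading and lightly cleaning a CSV might look like:

    sales <- read.csv("sales.csv", stringsAsFactors = FALSE)   # hypothetical file
    str(sales)                               # column types
    head(sales)                              # first rows
    sales <- na.omit(sales)                  # drop rows with missing values
    sales$region <- factor(sales$region)     # treat a grouping column as a factor
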
Linear regression: Ordinary Least Squares
16 Lectures 04:05:21

A quick overview of what we are doing when using OLS and the lm() function in R. Projection matrices. Residuals. Geometrical interpretation. Formulas for the coefficients.

Mathematical preliminaries (OPTIONAL)
20:00
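
A small sketch, using the built-in mtcars data rather than the course's datasets, of how the OLS formulas map to R: the coefficients are (X'X)^(-1) X'y, and the hat matrix projects y onto the column space of X:

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    X <- model.matrix(fit)                          # design matrix, including the intercept
    y <- mtcars$mpg
    beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y    # (X'X)^(-1) X'y
    cbind(beta_hat, coef(fit))                      # same numbers as lm()
    H <- X %*% solve(t(X) %*% X) %*% t(X)           # projection ("hat") matrix
    resid_manual <- y - H %*% y                     # residuals = (I - H) y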

Running our first example in R. Using the lm() function. How to interpret the coefficients, p-values, ANOVA table, and F statistic.

A first example in R
17:49
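
A minimal example of the calls involved, again on mtcars rather than the course data:

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    summary(fit)     # coefficients, standard errors, p-values, R-squared, F statistic
    anova(fit)       # ANOVA table
    confint(fit)     # confidence intervals for the coefficients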

The equivalence between maximum likelihood (ML) estimation and OLS. Why are these estimates equal? We review an example done via the optim() function in R, minimizing the sum of squares numerically.

Preview 20:47
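
A sketch of the idea with optim(), using one predictor from mtcars as a stand-in for the course's example: minimizing the residual sum of squares numerically recovers (essentially) the OLS estimates.

    y <- mtcars$mpg
    x <- mtcars$wt
    rss <- function(b) sum((y - b[1] - b[2] * x)^2)   # residual sum of squares
    opt <- optim(c(0, 0), rss)                        # numerical minimization (Nelder-Mead)
    opt$par                                           # numerical estimates
    coef(lm(y ~ x))                                   # essentially the same values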

Linear regression assumptions
6 questions

How to interpret p-values in the context of linear regression.

P-values
17:32

When are our p-values contaminated? How can we avoid this?

P-value hacking
06:55

A much more complex example of OLS.

A more realistic example
19:49

How do we choose the best model? Why do we want models with few variables? How can we use the stepAIC() and AIC() functions? What happens when we remove variables from our model?

Model Selection
19:20
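
An illustrative sketch with MASS::stepAIC() on mtcars (not the course's dataset):

    library(MASS)
    full <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)
    best <- stepAIC(full, direction = "both", trace = FALSE)   # stepwise search by AIC
    AIC(full); AIC(best)       # lower AIC is preferred
    summary(best)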

From the datasets folder, open the CO2-Emissions.csv data. This data was obtained from the World Bank. The objective is to predict CO2 emissions in India. The data starts in 1961.

Practical Quiz - Explaining CO2 Emissions
10 questions

Choosing the best model
5 questions

We need the residuals to verify our OLS assumptions. How can we check normality, homoscedasticity, and the absence of autocorrelation? How to read the Q-Q plot and some normality tests.

Residuals and plots
14:09
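
A sketch of the standard residual checks, assuming an lm() fit called fit (mtcars is used only as a placeholder):

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    par(mfrow = c(2, 2)); plot(fit); par(mfrow = c(1, 1))   # the four standard diagnostic plots
    qqnorm(resid(fit)); qqline(resid(fit))                   # normal Q-Q plot of the residuals
    shapiro.test(resid(fit))                                 # one possible normality test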

The relationship between leverage, outliers, and influence. How do we use Cook's distance? How can we read the last diagnostic chart that R produces for an lm() model?

Influence plots and outlier detection
19:17
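
A sketch of the usual influence diagnostics (placeholder data):

    fit <- lm(mpg ~ wt + hp, data = mtcars)
    cd <- cooks.distance(fit)                  # Cook's distance per observation
    head(sort(cd, decreasing = TRUE))          # the most influential points
    hatvalues(fit)                             # leverage
    plot(fit, which = 5)                       # residuals vs leverage, the last lm() diagnostic chart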

When do we need models with variables in logs? Log-log models.

Log transformations - Price Elasticities
05:05
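
A minimal log-log sketch; mtcars is only a stand-in for a price/quantity dataset, but the slope is read the same way:

    fit_ll <- lm(log(mpg) ~ log(wt), data = mtcars)   # log-log specification
    coef(fit_ll)["log(wt)"]    # read as an elasticity: % change in y for a 1% change in x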

Residuals
3 questions

Models with lots of variables might end up fitting not the true response but "noise". We use the DAAG package to compute the cross-validated mean squared error.

Overfitting
13:58
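
The course uses the DAAG package for this; as a rough, package-free sketch of the same idea, a k-fold cross-validated MSE can be computed by hand:

    set.seed(1)
    k <- 5
    folds <- sample(rep(1:k, length.out = nrow(mtcars)))
    mse <- sapply(1:k, function(i) {
      fit  <- lm(mpg ~ wt + hp, data = mtcars[folds != i, ])   # train on k-1 folds
      pred <- predict(fit, newdata = mtcars[folds == i, ])     # predict the held-out fold
      mean((mtcars$mpg[folds == i] - pred)^2)
    })
    mean(mse)    # cross-validated mean squared error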

Using the predict() function in R. The difference between confidence intervals and prediction intervals, and the difference between the variances of the two types of prediction.

Prediction
04:38
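
A minimal sketch of the two interval types (placeholder data):

    fit <- lm(mpg ~ wt, data = mtcars)
    new <- data.frame(wt = c(2.5, 3.5))
    predict(fit, new, interval = "confidence")   # uncertainty about the mean response
    predict(fit, new, interval = "prediction")   # wider: also includes the error variance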

The consequences of multicollinearity. How can we detect it, and what are the options for dealing with it? VIFs and condition indexes.

Multicollinearity
18:46
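
A sketch of the usual checks, assuming the car package for vif() (placeholder data; the cut-offs mentioned are only rules of thumb):

    library(car)
    fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
    vif(fit)                    # variance inflation factors; values far above ~5-10 are a warning sign
    kappa(model.matrix(fit))    # condition number of the design matrix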

The problem of non-constant variance. How can we identify it using R's diagnostic plots? Using robust sandwich covariance matrices via the sandwich package.

Heteroscedasticity, and how to solve it
17:10
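
A sketch using the sandwich and lmtest packages (placeholder data):

    library(sandwich); library(lmtest)
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    bptest(fit)                                       # Breusch-Pagan test for heteroscedasticity
    coeftest(fit, vcov = vcovHC(fit, type = "HC1"))   # robust (sandwich) standard errors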

Detecting autocorrelation. Using the robust HAC matrix from the sandwich package. The acf() function.

Autocorrelation, and how to solve it
11:22
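
A sketch of the calls (mtcars has no natural time ordering, so this only shows the mechanics; with time-series data the same functions apply):

    library(sandwich); library(lmtest)
    fit <- lm(mpg ~ wt + hp, data = mtcars)
    acf(resid(fit))                        # autocorrelation function of the residuals
    dwtest(fit)                            # Durbin-Watson test
    coeftest(fit, vcov = vcovHAC(fit))     # HAC (autocorrelation-consistent) standard errors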

Linear Regression
8 questions

Monte Carlo in Excel. Monte Carlo simulation in R, creating our own function. Creating synthetic datasets.

Preview 18:44
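
A minimal Monte Carlo sketch in base R, with a made-up data-generating process (intercept 1, slope 2) rather than the course's example:

    set.seed(123)
    n_sims <- 1000; n <- 100
    slope_hat <- replicate(n_sims, {
      x <- rnorm(n)
      y <- 1 + 2 * x + rnorm(n)        # synthetic data with a known true slope of 2
      coef(lm(y ~ x))[2]
    })
    mean(slope_hat); sd(slope_hat)     # centred near 2, with its sampling variability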

Monte Carlo
3 questions
Linear regression: Mixed Effects Regression
7 Lectures 02:08:33

An introduction to mixed models. What are the conceptual differences between mixed models and fixed effects models? Simulating datasets with random effects via Monte Carlo in R.

Hierarchical Models and linear regression
19:58
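
A sketch of the simulation idea with lme4 (all names and parameter values below are illustrative, not the course's):

    library(lme4)
    set.seed(42)
    g <- 20; n_per <- 30
    group <- factor(rep(1:g, each = n_per))
    u <- rnorm(g, sd = 2)                        # one random intercept per group
    x <- rnorm(g * n_per)
    y <- 1 + 0.5 * x + u[group] + rnorm(g * n_per)
    m <- lmer(y ~ x + (1 | group))               # random-intercept mixed model
    summary(m)                                   # fixed effects plus variance components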

The possible definitions of random effects: A) the effects we don't care about; B) the effects we can treat as coming from an infinite population; C) the effects not estimated by least squares; D) the unobserved effects that change through time.

Random effects - A philosophical discussion
14:02

Simulating random effects for the intercept and the independent variables. The different slopes per group, and how to interpret them.

Preview 15:58
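
A sketch of a random-slope simulation and fit (illustrative names and values):

    library(lme4)
    set.seed(7)
    group <- factor(rep(1:15, each = 40))
    x <- rnorm(600)
    b <- rnorm(15, mean = 0.5, sd = 0.3)      # a different slope for each group
    y <- 1 + b[group] * x + rnorm(600)
    m <- lmer(y ~ x + (1 + x | group))        # random intercept and random slope
    coef(m)$group                             # fitted per-group intercepts and slopes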

We create our own function for maximizing the log-likelihood, and we compare this to lmer().

Preview 19:39

Mixed effects linear regression
7 questions

How to analyse the residuals from an lmer() object? Using the plots that R produces

Model results and residuals
19:30
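
A sketch using lme4's built-in sleepstudy data rather than the course's files:

    library(lme4)
    m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    plot(m)                                  # fitted values vs residuals
    qqnorm(resid(m)); qqline(resid(m))       # normality of the residuals
    qqnorm(ranef(m)$Subject[, 1])            # a rough check of the random intercepts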

Random vs Fixed effects
2 questions

Nested effects, crossed effects. The different operators we can use in lmer(). The different ways of defining the random effects

Advanced random effects modelling and nested effects
19:27
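
A sketch of the lmer() formula syntax for these designs, on simulated data with illustrative names (here the class labels are unique across schools, so the nested and crossed specifications coincide):

    library(lme4)
    set.seed(3)
    school <- factor(rep(1:10, each = 20))
    class  <- factor(rep(1:40, each = 5))                        # 4 classes per school
    x <- rnorm(200)
    y <- 2 + x + rnorm(10)[school] + rnorm(40, sd = 0.5)[class] + rnorm(200)
    m_nested  <- lmer(y ~ x + (1 | school/class))                # class nested within school
    m_crossed <- lmer(y ~ x + (1 | school) + (1 | class))        # crossed random effects
    m_uncorr  <- lmer(y ~ x + (1 + x || school))                 # uncorrelated intercept and slope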

The problem of multiple comparisons. Using the lmerTest package. Checking for significant differences. Comparing different levels of our categorical variables.

Multiple Comparisons
19:59
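
A sketch with the lmerTest package on lme4's sleepstudy data (the course's own examples compare levels of categorical predictors; here the point is only the extra tests the package adds):

    library(lmerTest)        # provides lmer() with p-values added
    m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
    summary(m)               # t-tests with Satterthwaite degrees of freedom
    anova(m)                 # F-tests for the fixed effects
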
Robust linear regression
1 Lecture 10:56

Why do outliers cause problems? What is an outlier? How can we detect them? The rlm() and lmRob() functions.

The rlm() and the lmRob() functions
10:56
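
A sketch on the classic stackloss data (not a course dataset), comparing OLS with MASS::rlm(); lmRob() from the robust package, or lmrob() from robustbase, can be swapped in similarly:

    library(MASS)
    fit_ols <- lm(stack.loss ~ ., data = stackloss)     # OLS, sensitive to outliers
    fit_rlm <- rlm(stack.loss ~ ., data = stackloss)    # M-estimation, downweights outliers
    cbind(OLS = coef(fit_ols), Robust = coef(fit_rlm))
    # robustbase::lmrob(stack.loss ~ ., data = stackloss)   # an MM-type alternative
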
About the Instructor
Francisco Juretig
4.6 Average rating
22 Reviews
331 Students
6 Courses

I have worked for 7+ years as a statistical programmer in the industry. Expert in programming, statistics, data science, and statistical algorithms. I have wide experience in many programming languages. Regular contributor to the R community, with 3 published packages. I am also an expert SAS programmer. Contributor to scientific statistical journals. Latest publication in the Journal of Statistical Software.