Advanced analysis of outliers in R and Matlab
4.5 (19 ratings)
Course Ratings are calculated from individual students' ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
6,874 students enrolled

Learn robust data analysis with R and Matlab, the key in Data Mining, Statistics, and Machine Learning.
Last updated 6/2020
English
English [Auto]
This course includes
  • 9 hours on-demand video
  • 14 articles
  • 15 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • โœ All concepts related to Outliers and Robust Statistics
  • ๐Ÿ’ป Practical examples in R and Matlab, step by step
  • ๐Ÿ“š The Free Book of Outliers with tips and tricks
  • ๐Ÿค“ What methods you should use in practice
  • ๐ŸŒŽ Real dataset examples
  • ๐Ÿงง Valid official certificate
  • ๐Ÿ™Œ The course is updated every month
  • ๐ŸŽ Gift when finished
Requirements
  • Basic statistical knowledge.
Description

Robust data analysis and outlier detection are crucial in Statistics, Data Analysis, Data Mining, Machine Learning, Artificial Intelligence, Pattern Recognition, Classification, Principal Components, Regression, Big Data, and any other data-related field.


With the course you will receive the free book about outliers, with specific tips and tricks and a summary of all the robust methods for detecting them, which will help you obtain accurate results and awesome data analysis.


Researchers, students, data analysts, and almost anyone dealing with real data should be aware of the problem of outliers (atypical data), and should know how to deal with it and which robust methods to use. The vast majority of Machine Learning algorithms are capable of detecting characteristics common to most of the data, but they are often confused by atypical data or even ignore it. Such data should not be ignored in settings where people's safety is at stake, such as the analysis of medical data, the Internet of Things (IoT), or risk and security in companies.


  • What would happen if a virus spread throughout the world because we ignored anomalous data? We would have a pandemic like COVID-19, which we could have acted upon beforehand if the outlier signals detected by neural networks had not been ignored.


  • What would happen if we ignored any signal from a Smart City system? We could miss a gas leak.


  • What would happen if, by ignoring an alarm, we missed a meteorite heading towards the Earth? We would have to call Bruce Willis to save us from Armageddon.


With this course you will become an expert in robust data analysis and in the detection and treatment of atypical data, both by learning the theoretical concepts and by having at your disposal practical implementations of the algorithms in two different languages, so that you can choose the one that best suits you: R (with RStudio) and Matlab.


You will also have access to a question-and-answer community shared by all the students, where you can ask anything you want about the analysis of outliers.


The example code is available in the open GitHub repository for you to download and use.


In addition, there are two sections of basic concepts that will help you recall some of the notions needed to understand the outlier detection methods.


With this course you will understand and know how to deal with one of today's most important topics in academia, in industry, and in data analysis and machine learning. The examples will help you see the importance of outlier analysis and will serve as a guide for carrying out these analyses yourself.

Who this course is for:
  • Data scientists.
  • Data analysts.
  • Students.
  • Researchers.
  • Engineers.
Course content
74 lectures 09:12:37
Introduction
6 lectures 13:02
Pre-requirements
02:44
Software: R and Matlab
01:07
Code repository on Github
00:18
Evaluation
02:23
Examples
7 lectures 19:12
Introduction
00:08

In this class we will introduce the problem of outliers in our data and see its importance with an example. It is an essential issue in Data Mining, Data Analysis, Pattern Recognition, and Machine Learning, and we have to understand the problem and the basics of the methods for dealing with it before we use software like R or Matlab.

Preview 04:50

Lesson where we see the Matlab code.

Matlab: Body and brain weight data of mammals
05:06

Lesson where we see the R code.

R: Data of brain and body weight
02:42

We will learn what an outlier (or atypical data point) really is, how outliers can arise, and, with a simple example, how their presence can affect the statistical analysis.

What is an outlier?
04:48

Lesson where we see the Matlab code.

Matlab: Simple example of outliers
00:53

Lesson where we see the R code.

R: Simple example of outliers
00:45
Basic concepts I
6 lectures 01:00:01
Introduction
00:13

Let's review the concept of sample and population, and introduce the notion of random variable. We will see some simple examples.

Sample and Population
09:31

Let's see what the distribution of a random variable is, and understand it using an example.

Distribution of a random variable
23:05

We will get to know the Normal distribution, the most widely used and best-known distribution in Statistics.

Normal distribution
10:14

We will introduce the student-t and the chi-square distributions that arise from the Normal.

Student-t and chi-square distributions
04:42

We are going to see what a sample estimator is, what an estimate is, and the properties that estimators need to have in order to provide good estimates of the unknown population parameters. We will see the best-known sample estimators.

Estimators
12:16
Univariate space
13 lectures 01:31:59
Introduction
00:06

Before starting to see the methods to detect outliers in the univariate space, we must know important concepts that we will use throughout the course, such as the central tendency estimators: mean and median.

Mean vs Median
06:30
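
The contrast between the two central tendency estimators can be sketched in a few lines. This is only an illustrative Python sketch, not the course's R or Matlab code, and the sample values are hypothetical.

```python
from statistics import mean, median

# Hypothetical sample: five plausible measurements plus one gross error.
clean = [10.0, 11.0, 12.0, 13.0, 14.0]
contaminated = clean + [1000.0]

mean_clean = mean(clean)
mean_contaminated = mean(contaminated)
median_clean = median(clean)
median_contaminated = median(contaminated)

# Show how each estimator reacts to the single outlier.
print(f"mean:   {mean_clean:.2f} -> {mean_contaminated:.2f}")
print(f"median: {median_clean:.2f} -> {median_contaminated:.2f}")
```

With this sample, a single outlier moves the mean from 12 to roughly 176.7, while the median only moves from 12 to 12.5.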

We will see the spread (or variation) estimators: Range and Standard deviation and their respective robust versions: Interquartile range and MAD.

Range vs IR and STD vs MAD
14:07
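
As a rough illustration of why the interquartile range and the MAD are called robust, the sketch below compares them with the range and the standard deviation on a hypothetical sample. It is written in Python rather than the course's R or Matlab. The helper names are made up for this example, and the quartiles use the default definition of Python's `statistics.quantiles`, which may differ slightly from other software.

```python
from statistics import median, quantiles, stdev

def value_range(values):
    """Classical spread: maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])

# Hypothetical data: the integers 1..19, then the same data plus one outlier.
clean = [float(v) for v in range(1, 20)]
contaminated = clean + [1000.0]

for name, estimator in [("range", value_range), ("std", stdev),
                        ("IQR", iqr), ("MAD", mad)]:
    print(f"{name:>5}: {estimator(clean):8.2f} -> {estimator(contaminated):8.2f}")
```

In this example the range and the standard deviation change by orders of magnitude, whereas the IQR and the MAD barely move.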

We will see the estimators of shape: the classical skewness and the Medcouple.

Skewness vs Medcouple
11:36

Lesson where we see the R code for the examples with the sample estimators and their robust versions.

R: Estimators and robust versions
05:09

We will learn the basics of the SD method, a classic method for detecting outliers in a random variable.

Method SD
08:46

We will learn the basics of the Z score method, another classic approach to detecting outliers in a random variable.

Z score
05:34
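
A minimal sketch of the Z score idea, in Python rather than the course's R or Matlab: each value is standardized with the sample mean and standard deviation and flagged when its absolute score exceeds a cut-off. The cut-off of 3 is a common convention assumed here, and the function names and data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize each value with the sample mean and standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Hypothetical data: the integers 1..19 plus one gross error.
data = [float(v) for v in range(1, 20)] + [1000.0]
flagged = z_score_outliers(data)
print(flagged)
```

Because the mean and the standard deviation are themselves affected by the outlier, this rule can miss outliers in small or heavily contaminated samples.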

We will learn the basics of the Tukey boxplot method, another classic approach to detecting outliers in a random variable.

Tukey boxplot
12:39
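
The boxplot rule can be sketched as follows: values outside the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR are flagged. This Python sketch is an illustration only. The data and function names are hypothetical, and the quartiles follow the default definition of `statistics.quantiles`, so R or Matlab may give slightly different fences.

```python
from statistics import quantiles

def tukey_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one suspiciously large value.
data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 19, 45]
lower, upper = tukey_fences(data)
flagged = tukey_outliers(data)
print(lower, upper, flagged)
```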

We will learn the basics of the MADe method, a robust method for detecting outliers in a random variable.

MADe
03:18

We will learn the basics of the modified Z-score method, a robust method for detecting outliers in a random variable.

Modified Z score
03:30
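
A sketch of the modified Z score in its commonly cited form, 0.6745·(x - median)/MAD with a cut-off of 3.5. The course's own version may differ, and the data and function names are hypothetical. The classical Z score is included so the two can be compared on a small sample containing two outliers.

```python
from statistics import mean, median, stdev

def classical_z_scores(values):
    """Z scores based on the (non-robust) mean and standard deviation."""
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]

def modified_z_scores(values):
    """Z scores based on the median and the MAD, scaled by 0.6745."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z score is undefined")
    return [0.6745 * (v - center) / mad for v in values]

# Hypothetical sample of ten values, two of which are clearly atypical.
data = [10, 11, 12, 13, 14, 12, 11, 13, 100, 110]

classical_flags = [v for v, z in zip(data, classical_z_scores(data)) if abs(z) > 3.0]
robust_flags = [v for v, z in zip(data, modified_z_scores(data)) if abs(z) > 3.5]
print(classical_flags, robust_flags)
```

In this sample the two outliers inflate the standard deviation enough that the classical rule flags nothing, while the robust rule flags both.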

We will learn the basics of the adjusted boxplot method, a robust method for detecting outliers in a random variable.

Adjusted boxplot
05:10

Lesson where we see the Matlab and R code for the outlier detection methods in the univariate space and their robust versions.

Matlab + R: Methods for outlier detection (univariate)
08:42

We will draw some conclusions about the methods for detecting outliers in the univariate space, and summarize everything we saw in this section.

Summary
06:52
Basic concepts II
7 lectures 01:02:19
Introduction
00:13

We are going to look at some necessary linear algebra concepts, which are the definition of a vector, a matrix, the transpose of a vector or a matrix, the identity matrix, the inverse of a matrix, the product between a vector and a matrix, the product between two matrices.

Linear algebra
11:03

We will see how a multivariate random variable is defined and we will study some examples.

Multivariate random variable
04:13

We will define the joint distribution and density function of a multidimensional random vector, as well as marginal distributions and densities.

Joint and marginal distributions
04:55

We will see the concept of covariance, correlation and independence between two random variables of a random vector, linked to the notion of joint distribution.

Independence, covariance and correlation
08:48

We will see some R functions that allow us to draw the distribution of a bivariate Normal, and see how changing the parameters affects the function.

R: Bivariate Normal
13:07
Multivariate space
24 lectures 04:18:25
Introduction
00:07

We will see how outliers are defined in a multivariate space, and why the methods of the univariate space cannot be used to detect them in the multivariate case.

Multivariate space
05:51

We will see an example of real data in the multivariate space, specifically simple linear regression, where the presence of outliers influences the results.

Matlab: Example
13:11

We will learn the concept of location in the multivariate space and what its estimators are.

Location estimators
14:20

We will see in Matlab how to calculate multivariate location estimators and a graphical example.

Matlab: Multivariate location estimators
04:17

We will see in R how to calculate multivariate location estimators and a graphical example.

R: Multivariate location estimators
06:15
R exercise
1 question

We will learn the idea of dispersion in the multivariate space and what its estimators are.

Dispersion estimators
33:52

We will see the multivariate dispersion estimators with an example in R.

R: Multivariate dispersion estimators
04:23

We will learn about the Euclidean distance, which allows us to sort the data in a space of more than one dimension.

Euclidean distance
14:34

We will learn about the Mahalanobis distance, which allows us to sort the data in a space of more than one dimension while taking into account the relationship between the variables. The classic version is sensitive to outliers, so the robust version of the distance should be used.

Mahalanobis distance
17:04
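
To illustrate how the Mahalanobis distance takes the relationship between variables into account, here is a small Python sketch for the bivariate case only. The function names, center, and covariance matrix are hypothetical. This is the classical form, and a robust version would plug in robust location and dispersion estimates (for example from the MCD), which are not computed here.

```python
def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance (x - m)^T S^{-1} (x - m) for a 2x2 covariance S."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def euclidean_sq(point, center):
    """Squared Euclidean distance, which ignores the covariance structure."""
    return (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2

# Hypothetical center and covariance with positive correlation (0.5).
center = (0.0, 0.0)
cov = [[1.0, 0.5], [0.5, 1.0]]

along = (1.0, 1.0)     # lies along the direction of the correlation
against = (1.0, -1.0)  # lies against the direction of the correlation

print(euclidean_sq(along, center), euclidean_sq(against, center))
print(mahalanobis_sq(along, center, cov), mahalanobis_sq(against, center, cov))
```

Both points are equally far from the center in the Euclidean sense, but the point lying against the correlation has the larger Mahalanobis distance.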

We will learn how to calculate the Mahalanobis distance in R.

R: Mahalanobis distance
09:50

We will study the MCD method for the calculation of robust location and dispersion estimators.

MCD
16:58

How to obtain the MCD estimator in Matlab.

Matlab: MCD
05:43

How to obtain the MCD estimator in R.

R: MCD
03:20

We will see another way to detect outliers with the robust Mahalanobis distance based on the MCD, considering another cut-off value, the adjusted quantile.

Adjusted MCD
21:20
R package "mvoutlier"
00:11

We will study some real data, the Kola project, and use the robust Mahalanobis distance based on the MCD with the adjusted quantile.

Example: Kola project
13:53

We will study some real data, the Kola project, and use the robust Mahalanobis distance based on the MCD with the adjusted quantile, in R.

R: Kola project and the adjusted MCD
09:51

We will learn another robust estimation method, this one based on projections.

Stahel-Donoho
13:00

We will see the code for the Stahel-Donoho estimator in R.

R: Stahel-Donoho
08:52

We will see the Kurtosis method, based on projections of the data relative to the kurtosis coefficient value.

Kurtosis
04:40

We will see in Matlab the application of the learned multivariate outlier detection methods.

Matlab: Outlier detection methods (multivariate)
18:49

We will see in R the application of the learned multivariate outlier detection methods.

R: Outlier detection methods (multivariate)
10:11

We will summarize what we have seen in this section and come to some conclusions.

Conclusions: multivariate space
07:53
Linear regression
6 lectures 46:32
Introduction
00:05

Let's see what the linear regression problem consists of, how it can be expressed in matrix form, and what the assumptions of the model are.

Linear regression
19:45

We will see the classic method of estimating the parameters of the multidimensional linear regression model, and we will see with an example that this method is not robust to outliers.

Classic regression: Ordinary least squares
11:59
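
The lack of robustness of ordinary least squares can be sketched with a simple linear regression on hypothetical data. This is a Python illustration with made-up names, not the course's example: a single corrupted response is enough to change the fitted line substantially.

```python
def ols_line(xs, ys):
    """Closed-form least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [float(x) for x in range(10)]
ys_clean = [2.0 * x + 1.0 for x in xs]

# Same data, but the last response is replaced by a gross error.
ys_contaminated = ys_clean[:-1] + [100.0]

slope_clean, intercept_clean = ols_line(xs, ys_clean)
slope_cont, intercept_cont = ols_line(xs, ys_contaminated)
print(f"clean:        slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"contaminated: slope={slope_cont:.2f}, intercept={intercept_cont:.2f}")
```

Here one outlying point out of ten more than triples the estimated slope, which is the kind of behavior robust regression methods are designed to avoid.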

We will learn about the robust methods for regression analysis and the types of outliers that we can find.

Robust regression methods: LAD, LMS and LTS
04:49

We will see the code for the robust methods to estimate the regression model with two datasets, in Matlab.

Matlab: Robust regression
04:00

We will see the code for the robust methods to estimate the regression model with two datasets, in R.

R: Robust regression
05:54
BONUS
4 lectures 01:02
Difference between R and Matlab
00:03
Scientific articles and code packages in R and Matlab
00:05
More supplementary material
00:04
Here is your gift
00:49