Reading data with dates: Classes for customized dates
The apply family of functions
Generating random numbers
Density and cumulative distribution function
sqldf - Part1
sqldf - Part2
The lm() function: Part1
The lm() function: Part2
Normality, residuals and transformations
Linear mixed effects models: Part1
Linear mixed effects models: Part2
Logistic regression - Part 2
Logistic regression - Part 3
Logistic regression - Part 4
Poisson regression: Part1
Poisson regression: Part2
Poisson regression: Part3
How does it work? Relevant parameters - Part2
Using XGBoost for regression
Cross validation in XGBoost: the xgb.cv function
Selecting PCA and projecting the data
Preprocessing data: Part1
Preprocessing data: Part2
Extracting meaningful sound features
Some R programming experience is ideal, but not strictly necessary
Some general knowledge of statistics is mandatory, for example what a density function is and what random variables are
This course explores a wide array of modern statistical, machine learning, and data science techniques in R, one of the most widely used tools among data scientists. In particular:
Using R's statistical functions for drawing random numbers, calculating densities, histograms, etc.
Supervised ML problems using the caret package
Data processing using sqldf, caret, etc.
Unsupervised techniques such as PCA, DBSCAN, K-means
Calling deep learning models in Keras (Python) from R
Use the powerful XGBoost method for both regression and classification
Creating interesting plots, such as geo-heatmaps and interactive plots
Tune hyperparameters for several ML methods using caret
Do linear regression in R, build log-log models, and run ANOVA
Estimate mixed effects models to explicitly model the covariances between observations
Train outlier-robust models using robust regression and quantile regression
Identify outliers and novel observations
Estimate ARIMA (time series) models to predict temporal variables
Most of the examples presented in this course use real datasets collected from web sources such as Kaggle and the US Census Bureau. All the lectures can be downloaded and come with the corresponding material. The teaching approach is to briefly introduce each technique and then focus on the computational aspect. Mathematical formulas are avoided as much as possible, so as to concentrate on practical implementation.
This course covers most of what you would need to work as a data scientist, or compete in Kaggle competitions. It is assumed that you already have some exposure to data science / statistics.
Who this course is for:
Students aiming to do serious data science in R, with some knowledge about statistics
I have 7+ years of experience as a statistical programmer in industry. Expert in programming, statistics, data science, and statistical algorithms. I have wide experience in many programming languages. Regular contributor to the R community, with 3 published packages. I am also an expert SAS programmer. Contributor to scientific statistical journals. Latest publication in the Journal of Statistical Software.