Practical Data Science: Reducing High Dimensional Data in R
4.2 (63 ratings)
1,263 students enrolled

In this R course, we'll see how PCA can reduce a 5000+ variable data set to 10 variables and barely lose accuracy!
Created by Manuel Amunategui
Last updated 4/2017
English
30-Day Money-Back Guarantee
Includes:
  • 2.5 hours on-demand video
  • 5 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Understand various ways of reducing wide data sets
  • Understand Principal Component Analysis (PCA)
  • Control, tune and measure the effects of PCA
  • Use GBM modeling to measure the effectiveness of PCA
  • Reduce dimensionality with classic GBM & GLMNET variable selection
  • Use ensembling techniques to find the most stable variables
Requirements
  • Some understanding of, and interest in, the R programming language
Description

In this R course, we'll see how PCA can reduce a 5000+ variable data set down to 10 variables and barely lose accuracy! We'll look at different ways of measuring PCA's effectiveness and at other ways of reducing wide data sets (those with many features/variables). We'll also look at the advantages and disadvantages of the different ways of reducing data.

Who is the target audience?
  • Some understanding of, and interest in, the R programming language
  • Interest in reducing large data sets
Curriculum For This Course
11 Lectures
02:25:00
High Dimensionality Data
2 Lectures 07:38

Quick overview of what will be covered in this class

Preview 01:03

This is an optional video explaining where to find the binaries for R and RStudio needed to follow this course.

Preview 06:35
Principal Component Analysis (PCA)
2 Lectures 18:53
Overview of different methods for reducing dimensionality, and of how PCA can drastically reduce the number of features while losing only some accuracy.
Preview 07:52

Overview of the two PCA functions (prcomp & princomp) from the {stats} package included in the base R install.

prcomp & princomp
11:01
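
The snippet below is a rough sketch of how the two base-R PCA functions compare. It is not the course's own code. It runs both functions on the built-in USArrests data set (an illustrative choice) and checks that, on standardized data, they report the same proportion of variance per component. For wide data there is one practical difference. princomp() eigen-decomposes the covariance or correlation matrix and requires more rows (units) than variables. prcomp() uses a singular value decomposition and has no such restriction.

```r
# Sketch: compare stats::prcomp and stats::princomp on a small built-in data set.
# USArrests (50 rows, 4 numeric columns) is an illustrative choice, not course data.
data(USArrests)

# prcomp: singular value decomposition of the centered and scaled data
pc1 <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# princomp: eigen decomposition; cor = TRUE uses the correlation matrix
pc2 <- princomp(USArrests, cor = TRUE)

# Proportion of total variance explained by each component
var_prcomp   <- pc1$sdev^2 / sum(pc1$sdev^2)
var_princomp <- pc2$sdev^2 / sum(pc2$sdev^2)
print(round(var_prcomp, 3))

# On standardized data both report the same variance proportions;
# the signs of individual loadings may still differ between the two.
stopifnot(isTRUE(all.equal(unname(var_prcomp), unname(var_princomp))))
```

With several thousand columns and fewer rows, only prcomp() would run. That fits a 5000+ variable data set.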
Reducing Dimensionality With PCA and GBM Modeling
2 Lectures 36:24

Here we will:

  • Download a large data set from the UC Irvine Machine Learning Repository
  • Use GBM (Generalized Boosted Models) to model the raw data and predict an outcome
  • Apply prcomp to the data, model it using different numbers of PCA components, and compare accuracy
prcomp and GBM
19:31

prcomp and GBM, part 2
16:53
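
The snippet below is a minimal sketch of the "apply prcomp, keep a chosen number of components" step. It rests on stated assumptions. Simulated data stands in for the UC Irvine data set. k = 10 is an illustrative choice. The GBM modeling step is left out. The sketch fits the rotation on training rows only. It then uses predict() to project held-out rows, so training and test data share the same component space.

```r
# Sketch: reduce to k principal components and project held-out rows.
# Assumption: simulated Gaussian data stands in for a real wide data set.
set.seed(1)
n <- 200; p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("V", seq_len(p))

train_idx <- sample(seq_len(n), size = 150)
X_train <- X[train_idx, ]
X_test  <- X[-train_idx, ]

# Fit PCA on training rows only, so test rows do not influence the rotation
pca <- prcomp(X_train, center = TRUE, scale. = TRUE)

k <- 10  # number of components to keep (illustrative)
train_reduced <- pca$x[, 1:k]
# predict() reapplies the training centering, scaling and rotation to new rows
test_reduced <- predict(pca, newdata = X_test)[, 1:k]

# Cumulative share of variance retained by the first k components
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
cat("Variance retained with", k, "components:", round(cum_var[k], 3), "\n")

dim(train_reduced)  # 150 rows, 10 columns
dim(test_reduced)   # 50 rows, 10 columns
```

The reduced matrices can then be passed to whichever model is being compared, in place of the raw columns. The simulated columns are independent noise, so the retained variance here will be modest. Correlated real-world features usually concentrate more variance in the leading components.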
Reducing Dimensionality With Variable Selection
3 Lectures 48:21

Here we use GBM to reduce the number of variables while preserving feature names (reduction instead of compression). Note: the caret code in the attached PDF has been updated to reflect caret's latest object model.

Variable selection using GBM (Generalized Boosted Models)
17:17

Variable selection using GBM, part 2
17:21

The same approach as before, but using GLMNET (with a few twists). Note: the caret code in the attached PDF has been updated to reflect caret's latest object model.

Variable selection using GLMNET
13:43
Reducing Dimensionality With Ensemble Modeling
2 Lectures 33:44

A brief look at the Minimum Redundancy Maximum Relevance (mRMRe) package for quickly reducing very wide data sets. Note: the caret code in the attached PDF has been updated to reflect caret's latest object model.

Variable selection using Minimum Redundancy Maximum Relevance (mRMRe)
17:46

This lecture is marked 'optional' because fscaret may be difficult for some to install and is fairly complex to troubleshoot, so I cannot help you with issues related to it.

Optional: Variable selection using fscaret
15:58
About the Instructor
Manuel Amunategui
4.4 Average rating
398 Reviews
3,105 Students
4 Courses
Data Scientist & Quantitative Developer

I am a data scientist in the healthcare industry. I have been applying machine learning and predictive analytics to improve patients' lives for the past 3 years. Prior to that, I was a developer on a trading desk on Wall Street for 6 years. On the personal side, I love data science competitions and hackathons. People often ask me how one can break into this field, to which I reply: 'join an online competition!'