Data Mining with R: Go from Beginner to Advanced!

Learn to use R software for data analysis and visualization, and to perform dozens of popular data mining techniques.

Created by Geoffrey Hubona, Ph.D.

Last updated 8/2020

English

What you'll learn

Use R software for data import and export, data exploration and visualization, and for data analysis tasks, including performing a comprehensive set of data mining operations.
Effectively use a number of popular, contemporary data mining methods and techniques in demand by industry including: (1) Decision, classification and regression trees (CART); (2) Random forests; (3) Linear and logistic regression; and (4) Various cluster analysis techniques.
Apply the dozens of included "hands-on" cases and examples using real data and R scripts to new and unique data analysis and data mining problems.

Course content

9 sections • 80 lectures • 11h 54m total length

Who should take and what will you get from this course? (8:49)
Installing R and RStudio (4:03)
Orientation to Data Types and Structures Section (3:33)
Materials for Data Types and Structures (1:09)
Vectors: The Basic Default Data Structure in R (11:45)
Matrices, Lists and Dataframes: Other Important R Data Structures (10:25)
Manipulating Vectors in R (7:29)
Naming Vectors in R (6:35)
Creating Matrices in R (5:12)
Creating Lists in R (9:43)
Creating Lists in R (continued) (11:25)
Creating Dataframes in R (2:45)

Orientation to Data and File Input and Output (1:34)
Materials for Data and File Input and Output (1:14)
Reading in Data using scan() Function (9:23)
Reading in Data with scan() Function (continued) (15:57)
Using readline() Function to Prompt User for Input (1:45)
Reading in Files with read.table() and read.csv() Functions (14:35)
Writing R Session Files to Disk (Outputting Data) (7:52)
Data Input and Output Exercise (2:23)

Solution to Data Input and Output Exercise from Section 2 (1 of 2) (10:33)
Explore interactive data input with scan() and the wrappers read.table() and read.csv(), including keyboard entry, console focus, and specifying headers and delimiters when reading files.
Solution to Data Input and Output Exercise from Section 2 (2 of 2) (12:11)
Materials for Visualizing your Data Section 3 (1:00)
Preprocessing and Visualizing Birth Data (9:23)
Master preprocessing and visualization in R by cleaning the workspace, managing memory with gc(), and exploring large birth data to plot births by day of week and delivery type.
Preprocessing and Visualizing Birth Data (part 2) (14:51)
Learn to preprocess birth data and visualize it with lattice graphics in R, using bar charts, histograms, density plots, and conditioned panels by plurality and delivery type.
Preprocessing and Visualizing Birth Data (part 3) (14:54)
Preprocess and visualize birth data using box plots, violin plots, and level plots to reveal distributions, Apgar scores, and gestation relationships.
Visualizing Alumni Donations (12:38)
Visualizing Alumni Donations (part 2) (8:22)
Visualizing Alumni Donations (part 3) (12:09)
Visualizing Alumni Donations (part 4) (7:03)
Visualizing (Getting to Know) your Data Section Exercise (1:59)

Solution to Visualizing Virginia Deaths Exercise (15:53)
Introduction to Decision Trees and Random Forests (7:18)
Training Decision Trees with party Package (9:10)
Training Decision Trees with party Package (part 2) (12:32)
Bodyfat Decision Tree example with Package rpart (13:01)
Bodyfat Decision Tree example with Package rpart (part 2) (8:25)
Build and prune a body fat decision tree with the rpart package, using the minsplit and cp controls, and evaluate pruning via the CP table.
Bagging and Random Forests with Section Exercise (14:36)

Begin Decision Tree and Random Forests Exercise Solution (9:15)
Random Forests Exercise Bagging Segment Solution (9:19)
Random Forests Exercise Solution (part 3) (12:36)
Materials for Regression and GLMs Section (1:05)
Apply regression and generalized linear modeling to the heart data set through detailed script-based examples, with exercises in PDF form to complete after reviewing the lessons and videos.
Begin Regression Example (10:38)
Continue Regression Example (10:48)
Finish Regression Example (6:47)
Begin Regression and GLM Slides (8:19)
Finish Generalized Linear Modeling Slides (8:13)
Heart Data Binomial GLM Example (12:59)
Model heart data as a proportion using a binomial glm in R, with counts of heart attacks and non-attacks, and compare models via residual deviance, diagnostic plots, and chi-square tests.
Epidemic Data Poisson GLM Example (4:38)
Explore Poisson GLM modeling of AIDS case counts over time, diagnose fit with plots, and improve the model by adding a quadratic term, then compare via deviance and ANOVA.
Regression and GLMs Exercises (1:08)

Materials and End-of-Section-6 Exercise (1:28)
Regression and GLM Exercises Solutions (part 1) (10:49)
Regression and GLM Exercises Solutions (part 2) (11:27)
Regression and GLM Exercises Solutions (part 3) (10:04)
K-Means Iris Flower Example (12:44)
K-Means Exoplanets Example (20:04)
Explore k-means clustering on exoplanet data, using mass, periodicity, and eccentricity, visualized in a three-dimensional scatterplot with range normalization to form and compare three candidate clusters.
K-Medoids Iris Flower Re-Analysis Example (8:02)
Hierarchical Clustering Iris Flower Example (7:18)
Hierarchical Clustering Pottery Example (11:16)

Materials for Density-Based and Hierarchical Agglomerative Clustering Section (1:35)
Explore density-based and hierarchical agglomerative clustering materials for section 7, including Becher 2013, k-means cluster analysis, TAM residuals, MDA multivariate analysis, the fpc package, scripts, and slides.
Density-Based and Agglomerative Clustering Introduction and Previous Exercise (11:48)
Density-Based Clustering Example (13:04)
Explore density-based clustering with dbscan in R using the fpc package, focusing on reachability distance and min points, identifying noise, and visualizing results on the iris dataset.
Body Measurements and Agglomerative Hierarchical Clustering Example (13:23)
Continue Body Measurements Agglomerative Clustering Example (16:41)
Clustering Jet Fighters Example (16:52)

Materials and End-of-Section-8 Exercise (1:08)
Explore section 8's final cluster analysis examples, tackle the residual analysis exercise, and compare standardization by standard deviation versus by range using crime rate data.
K-Means Clustering Explained in Detail (6:25)
Clustering Crime Rates Example (10:09)
Clustering Crime Rates Example (part 2) (13:26)
Gastroenterologist Questionnaire Model-Based Clustering Example (14:23)
Graphical Approaches to Cluster Analysis Examples (9:17)
Detecting Outliers (9:09)
Detecting Outliers (part 2) (11:34)

Requirements

Download and install no-cost R software (complete, easy-to-follow instructions are provided).
Download and install no-cost RStudio IDE software (complete, easy-to-follow instructions are provided).

Description

This is a "hands-on" business analytics (or data analytics) course that teaches how to use the popular, no-cost R software to perform dozens of data mining tasks using real data and data mining cases. It teaches critical data analysis, data mining, and predictive analytics skills, including data exploration and data visualization, using one of the most popular business analytics software suites in industry and government today.

The course is structured as a series of dozens of demonstrations of how to perform classification and predictive data mining tasks, including building classification trees, building and training decision trees, using random forests, linear modeling, regression, generalized linear modeling, logistic regression, and many different cluster analysis techniques. It also teaches "best practices" for using R software: how to install R and RStudio, the characteristics of the basic data types and structures in R, and how to input data into an R session from the keyboard, from user prompts, or by importing files stored on a computer's hard drive.

All software, slides, data, and R scripts used in the dozens of case-based demonstration video lessons are included in the course materials, so students can "take them home" and apply them to their own data analysis and mining cases. Each course section also has "hands-on" exercises to reinforce the learning process.

The target audience includes undergraduate and graduate students seeking to acquire employable data analytics skills, as well as practicing predictive analytics professionals seeking to expand their repertoire of data analysis and data mining knowledge and capabilities.

Who this course is for:

Anyone who wants to learn more about performing data analysis using a variety of popular, contemporary data mining techniques.
Data mining beginners and professionals who wish to enhance their data mining knowledge and skill levels.
Individuals seeking to gain more proficiency using the popular R and RStudio software suites.
Undergraduate students seeking to acquire in-demand analytics skills to enhance employment opportunities.
Graduate students seeking to acquire a wider repertoire of analytics skills for research data analysis tasks.

Course content

Data Types and Structures in R • 12 lectures • 1hr 23min
Data and File Input and Output • 8 lectures • 55min
Visualizing (Getting to Know) your Data • 11 lectures • 1hr 45min
Decision Trees and Random Forests • 7 lectures • 1hr 21min
Linear Modeling (Regression) and Generalized Linear Modeling (GLMs) • 12 lectures • 1hr 36min
K-Means, K-Medoids, and Hierarchical Cluster Analysis Approaches • 9 lectures • 1hr 33min
Density-Based and Agglomerative Hierarchical Clustering • 6 lectures • 1hr 13min
More Cluster Analysis Examples, Graphics, and Detecting Outliers • 8 lectures • 1hr 16min
K-Means TAM Residuals Cluster Analysis Software Case example • 7 lectures • 53min
