R Machine Learning solutions

Build powerful predictive models in R
2.6 (4 ratings)
61 students enrolled
Created by Packt Publishing
Last updated 6/2017
English
Curiosity Sale
Current price: $10 Original price: $100 Discount: 90% off
30-Day Money-Back Guarantee
Includes:
  • 8.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Create and inspect the transaction dataset, performing association analysis with the Apriori algorithm
  • Visualize patterns and associations using a range of graphs and find frequent itemsets using the Eclat algorithm
  • Compare the differences between each regression method to discover how they solve problems
  • Predict possible churn users with the classification approach
  • Implement the clustering method to segment customer data
  • Compress images with the dimension reduction method
  • Incorporate R and Hadoop to solve machine learning problems on big data
Requirements
  • Although programming with R is not a prerequisite, it would be helpful. A background in linear algebra and statistics is expected.
  • This easy-to-follow guide is full of hands-on examples of data analysis with R. Each topic is fully explained beginning with the core concepts, followed by step-by-step, practical examples and concluding with detailed explanations of each concept used.
Description

R is a statistical programming language that provides impressive tools for analyzing data and creating high-level graphics. This video course will take you from the very basics of R to creating insightful machine learning models with R. You will start by setting up the environment and then perform data ETL in R.

Data exploration examples are provided that demonstrate how powerful data visualization and machine learning are at discovering hidden relationships. You will then dive into important machine learning topics, including data classification, regression, clustering, association rule mining, and dimensionality reduction.

About The Author

Yu-Wei Chiu (David Chiu) is the founder of LargitData, a startup company that mainly focuses on providing big data and machine learning products. He previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a start-up entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and apply data mining techniques for data analysis. Yu-Wei is also a professional lecturer who has delivered lectures on big data and machine learning in R and Python, and given tech talks at a variety of conferences.

In 2015, Yu-Wei wrote Machine Learning with R Cookbook (Packt Publishing). In 2013, he reviewed Bioinformatics with R Cookbook (Packt Publishing).


Who is the target audience?
  • This video is for anyone who wants to enter the world of machine learning and is looking for a guide that is easy to follow.
Curriculum For This Course
124 Lectures
08:19:36
Getting Started with R
9 Lectures 42:22

This video gives you brief information about the course.

Preview 04:38

R must first be installed on your system before you can work with it.

Downloading and Installing R
06:10

RStudio makes the process of development with R easier. 

Downloading and Installing RStudio
03:10

R packages are an essential part of R, as they are used in nearly all our programs. Let's learn how to install and load them.

Installing and Loading Packages
05:46
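
As a quick illustration of the idea (a minimal sketch, not necessarily the course's exact commands; ggplot2 is just an example package choice):

    # Install a package from CRAN (one-time step; ggplot2 is an example)
    install.packages("ggplot2")
    # Load the installed package into the current session
    library(ggplot2)
    # List what is currently attached to the search path
    search()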

To work with data, you must first know how to get data into and out of R. You will learn that here.

Reading and Writing Data
05:54
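
A minimal sketch of the usual base-R approach (the file names here are hypothetical):

    # Read a comma-separated file into a data frame
    df <- read.csv("input.csv", header = TRUE, stringsAsFactors = FALSE)
    # Inspect the first few rows
    head(df)
    # Write the data frame back out, dropping row numbers
    write.csv(df, "output.csv", row.names = FALSE)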

Data manipulation is time consuming, so it is best done with the help of built-in R functions.

Using R to Manipulate Data
05:46

R is widely used for statistical applications. Hence, it is necessary to learn about the built-in statistical functions of R.

Applying Basic Statistics
04:47

To communicate information effectively and make data easier to comprehend, we need graphical representations. You will learn to plot figures in this section.

Visualizing Data
03:33

Because built-in datasets have their limitations, it is good practice to get data from external repositories. You will be able to do just that after this video.

Getting a Dataset for Machine Learning
02:38
Data Exploration with RMS Titanic
8 Lectures 32:31

Reading a dataset is the first and foremost step in data exploration. We need to learn how to do that.

Preview 08:36

In R, since nominal, ordinal, interval, and ratio variables are treated differently in statistical modeling, we have to convert a nominal variable from a character into a factor.

Converting Types on Character Variables
03:05
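
For example, converting a character vector into a factor with base R (a sketch, not the course's Titanic code):

    # A nominal variable stored as character
    sex <- c("male", "female", "female", "male")
    # Convert it to a factor so models treat it as categorical
    sex <- as.factor(sex)
    class(sex)    # "factor"
    levels(sex)   # "female" "male"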

Missing values affect the inferences drawn from a dataset. Thus, it is important to detect them.

Detecting Missing Values
03:18

After detecting missing values, we need to impute them as their absence may affect the conclusion. 

Imputing Missing Values
04:30
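
A minimal sketch of one common strategy, mean imputation (the course may use a different method):

    x <- c(23, 31, NA, 45, NA, 27)
    sum(is.na(x))                          # count the missing values
    x[is.na(x)] <- mean(x, na.rm = TRUE)   # replace NAs with the mean
    x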

After imputing the missing values, you should perform an exploratory analysis to summarize the data characteristics. 

Exploring and Visualizing Data
04:24

The exploratory analysis helps users gain insights into how single or multiple variables may affect the survival rate. However, it does not determine what combinations may generate a prediction model. We need to use a decision tree for that. 

Predicting Passenger Survival with a Decision Tree
03:58

After constructing the prediction model, it is important to validate how the model performs while predicting the labels. 

Validating the Power of Prediction with a Confusion Matrix
02:08
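
In base R, a confusion matrix can be built with table(); a sketch with hypothetical prediction and ground-truth vectors:

    predicted <- factor(c("yes", "no", "yes", "yes", "no"))
    actual    <- factor(c("yes", "no", "no",  "yes", "no"))
    cm <- table(Predicted = predicted, Actual = actual)
    cm
    sum(diag(cm)) / sum(cm)   # overall accuracy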

Another way of measuring performance is the ROC curve. 

Assessing performance with the ROC curve
02:32
R and Statistics
12 Lectures 47:59

When datasets are huge, we can estimate the characteristics of the entire dataset from a sample of the data. Hence, data sampling is essential.

Preview 03:30

Probability distributions and statistics are interdependent: to justify statistical conclusions, we need probability.

Operating a Probability Distribution in R
05:41

Univariate statistics deals with a single variable and hence is very simple. 

Working with Univariate Descriptive Statistics in R
05:09

To analyze relations among two or more variables, we perform correlation and multivariate analysis.

Performing Correlations and Multivariate Analysis
03:01

Assessing the relation between dependent and independent variables is carried out through linear regression. 

Operating Linear Regression and Multivariate Analysis
03:25

To validate that the experiment results are significant, hypothesis testing is done. 

Conducting an Exact Binomial Test
03:48

To compare means of two different groups, one- and two-sample t-tests are conducted. 

Performing Student's t-test
03:13
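
A sketch using R's built-in sleep dataset (an example choice, not necessarily the data used in the course):

    # Two-sample t-test: does extra sleep differ between the two drug groups?
    t.test(extra ~ group, data = sleep)
    # One-sample t-test against a hypothesized mean of 0
    t.test(sleep$extra, mu = 0)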

Comparing a sample with a reference probability distribution, or comparing the cumulative distributions of two datasets, calls for a Kolmogorov-Smirnov test.

Performing the Kolmogorov-Smirnov Test
04:43

The Wilcoxon test is a non-parametric test of a null hypothesis.

Understanding the Wilcoxon Rank Sum and Signed Rank Test
02:04

To compare the distributions of a categorical variable across two groups, Pearson's chi-squared test is used.

Working with Pearson's Chi-Squared Test
05:08

To examine the relation between categorical independent variables and a continuous dependent variable, ANOVA is used. When there is a single independent variable, one-way ANOVA is used.

Conducting a One-Way ANOVA
04:15
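
A sketch on the built-in chickwts dataset (an assumption for illustration; the course may use different data):

    # One-way ANOVA: does chick weight depend on feed type?
    fit <- aov(weight ~ feed, data = chickwts)
    summary(fit)    # F statistic and p-value
    TukeyHSD(fit)   # pairwise comparisons between feed groups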

When there are two categorical variables to be compared, two-way ANOVA is used.

Performing a Two-Way ANOVA
04:02
Understanding Regression Analysis
13 Lectures 42:13

Linear regression is the simplest model in regression analysis and can be used when there is a single predictor variable.

Preview 04:53

To obtain summarized information of a fitted model, we need to learn how to summarize linear model fits. 

Summarizing Linear Model Fits
05:20

It would be really convenient for us if we could predict unknown values. You can do that using linear regression. 

Using Linear Regression to Predict Unknown Values
02:51
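
A minimal sketch with the built-in cars dataset (speed vs. stopping distance), standing in for whatever data the course uses:

    fit <- lm(dist ~ speed, data = cars)
    summary(fit)
    # Predict stopping distances for unseen speeds
    newdata <- data.frame(speed = c(15, 20, 25))
    predict(fit, newdata = newdata, interval = "prediction")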

To check if the fitted model adequately represents the data, we perform diagnostics. 

Generating a Diagnostic Plot of a Fitted Model
03:57

In the case of a non-linear relationship between predictor and response variables, a polynomial regression model is formed. We need to fit the model. This video will enable you to do that.

Fitting a Polynomial Regression Model with lm
02:16

An outlier will pull the regression line away from the slope suggested by the bulk of the data. To avoid that, we fit a robust linear regression model.

Fitting a Robust Linear Regression Model with rlm
02:15

We will perform linear regression on a real-life example, the SLID dataset. 

Studying a case of linear regression on SLID data
06:38

GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. 

Applying the Gaussian Model for Generalized Linear Regression
02:11

GLMs allow response variables with error distributions other than the normal distribution. We apply the Poisson model to see how that is done.

Applying the Poisson model for Generalized Linear Regression
01:33

When the response variable is binary, we apply the binomial model.

Applying the Binomial Model for Generalized Linear Regression
02:02

GAM has the ability to deal with non-linear relationships between dependent and independent variables. We learn to fit a regression using GAM. 

Fitting a Generalized Additive Model to Data
03:13

Visualizing a GAM helps us understand it better.

Visualizing a Generalized Additive Model
01:26

You can also run diagnostics on a GAM to analyze how well it fits.

Diagnosing a Generalized Additive Model
03:38
Classification – Tree, Lazy, and Probabilistic
11 Lectures 41:30

Training and testing datasets are both essential for building a classification model. 

Preview 03:44

A partitioning tree works on the basis of split conditions, starting from the root node and continuing down to the terminal nodes.

Building a Classification Model with Recursive Partitioning Trees
06:10
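
A sketch using the rpart package on the built-in iris data (the course works with its own datasets):

    library(rpart)
    # Grow a classification tree with recursive partitioning
    fit <- rpart(Species ~ ., data = iris, method = "class")
    printcp(fit)                           # complexity parameter table
    pred <- predict(fit, iris, type = "class")
    table(pred, iris$Species)              # training confusion matrix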

Plotting the classification tree will make analyzing the data easier. You will learn to do this now. 

Visualizing a Recursive Partitioning Tree
03:03

Before making a prediction, it is essential to compute the prediction performance of the model. 

Measuring the Prediction Performance of a Recursive Partitioning Tree
02:48

A tree can contain branches that are not essential for classification. In order to remove these parts, we have to prune the tree.

Pruning a Recursive Partitioning Tree
02:37

Conditional inference trees improve on traditional classification trees by using statistical significance tests to select split variables.

Building a Classification Model with a Conditional Inference Tree
01:56

Visualizing a conditional inference tree will make it easier to extract and analyze data from the dataset. 

Visualizing a Conditional Inference Tree
02:38

Like the prediction performance of a traditional classification tree, we can also evaluate the performance of a conditional inference tree.

Measuring the Prediction Performance of a Conditional Inference Tree
02:10

The k-nearest neighbor classifier is a non-parametric, lazy learning method, so it combines the advantages of both types of method.

Classifying Data with the K-Nearest Neighbor Classifier
05:31
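
A sketch with knn() from the class package on iris, using a hypothetical random train/test split:

    library(class)
    set.seed(1)
    idx  <- sample(nrow(iris), 100)    # random training indices
    pred <- knn(train = iris[idx, 1:4],
                test  = iris[-idx, 1:4],
                cl    = iris$Species[idx], k = 3)
    table(pred, iris$Species[-idx])    # test-set confusion matrix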

Classification with logistic regression is based on one or more features. It is more robust and makes fewer assumptions than the traditional classification model.

Classifying Data with Logistic Regression
04:37

The Naïve Bayes classifier is based on applying Bayes' theorem with a strong independence assumption.

Classifying data with the Naïve Bayes Classifier
06:16
Neural Network and SVM
10 Lectures 34:10

Support vector machines are good at classification because they can capture complex relations between data points and provide both linear and non-linear classifications.

Preview 05:57

To trade off training errors against margins, we use the cost parameter; the SVM classifier is affected by this cost.

Choosing the Cost of an SVM
02:56

To visualize the SVM fit, we can use the plot function. 

Visualizing an SVM Fit
03:33

We can use a trained SVM model to predict labels for new data.

Predicting Labels Based on a Model Trained by an SVM
03:48

Depending on the desired output, you may need to try different combinations of gamma and cost to train different SVMs. This is called tuning.

Tuning an SVM
02:47
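
With the e1071 package, tuning is commonly done with tune.svm(); a sketch on iris with an assumed parameter grid:

    library(e1071)
    tuned <- tune.svm(Species ~ ., data = iris,
                      gamma = 10^(-2:0), cost = 10^(0:2))
    tuned$best.parameters   # gamma/cost pair with the lowest CV error
    summary(tuned)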

A neural network is used in classification, clustering and prediction. Its efficiency depends on how well you train it. Let's learn to do that. 

Training a Neural Network with neuralnet
04:07
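
A minimal sketch with the neuralnet package on a hypothetical two-class subset of iris (neuralnet expects a numeric response; the hidden-layer size is an assumption):

    library(neuralnet)
    # Binary task: virginica vs. versicolor (setosa dropped for illustration)
    d <- iris[iris$Species != "setosa", ]
    d$virginica <- as.numeric(d$Species == "virginica")
    net <- neuralnet(virginica ~ Sepal.Length + Sepal.Width +
                       Petal.Length + Petal.Width,
                     data = d, hidden = 3, linear.output = FALSE)
    plot(net)   # draw the trained network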

We can visualize the network trained by neuralnet to inspect its structure and weights.

Visualizing a Neural Network Trained by neuralnet
02:21

Similar to other classification models, we can predict labels using neural networks and also validate performance using a confusion matrix.

Predicting Labels based on a Model Trained by neuralnet
03:07

The nnet package provides the functionality to train feed-forward neural networks with backpropagation.

Training a Neural Network with nnet
02:45

As we have already trained the neural network using nnet, we can use the model to predict labels. 

Predicting labels based on a model trained by nnet
02:49
Model Evaluation
12 Lectures 38:19

K-fold cross-validation is a common technique for estimating the performance of a classifier, as it mitigates the problem of over-fitting. In this video we will illustrate how to perform a k-fold cross-validation.

Preview 03:42

In this video, we will illustrate how to use tune.svm to perform 10-fold cross-validation and obtain the optimum classification model. 

Performing Cross Validation with the e1071 Package
03:22

In this video we will demonstrate how to perform k-fold cross validation using the caret package. 

Performing Cross Validation with the caret Package
02:59
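
A sketch of the typical caret workflow (method = "rpart" and the iris data are example choices):

    library(caret)
    ctrl <- trainControl(method = "cv", number = 10)   # 10-fold CV
    fit  <- train(Species ~ ., data = iris,
                  method = "rpart", trControl = ctrl)
    fit   # resampled accuracy across the folds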

This video will show you how to rank the variable importance with the caret package. 

Ranking the Variable Importance with the caret Package
02:21

In this video, we will illustrate how to use rminer to obtain the variable importance of a fitted model. 

Ranking the Variable Importance with the rminer Package
02:30

In this video we will show how to find highly correlated features using the caret package.

Finding Highly Correlated Features with the caret Package
02:13

In this video, we will demonstrate how to use the caret package to perform feature selection. 

Selecting Features Using the caret Package
04:58

To measure the performance of a regression model, we can calculate the distance between the predicted output and the actual output as a quantifier of the model's performance. In this video we will illustrate how to compute these measurements from a built regression model.

Measuring the Performance of the Regression Model
03:57

In this video we will demonstrate how to retrieve a confusion matrix using the caret package. 

Measuring Prediction Performance with a Confusion Matrix
02:07

In this video, we will demonstrate how to illustrate an ROC curve and calculate the AUC to measure the performance of a classification model. 

Measuring Prediction Performance Using ROCR
02:46
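
The usual ROCR pattern, sketched with simulated scores and labels standing in for a real model's output (all values here are hypothetical):

    library(ROCR)
    set.seed(7)
    labels <- sample(0:1, 200, replace = TRUE)
    prob   <- ifelse(labels == 1, rnorm(200, 0.7, 0.2), rnorm(200, 0.4, 0.2))
    pred <- prediction(prob, labels)
    perf <- performance(pred, "tpr", "fpr")
    plot(perf)                               # ROC curve
    performance(pred, "auc")@y.values[[1]]   # area under the curve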

In this video we will use the function provided by the caret package to compare different algorithm-trained models on the same dataset. 

Comparing ROC Curves Using the caret Package
03:43

In this video we will see how to measure performance differences between fitted models with the caret package. 

Measuring Performance Differences between Models with the caret Package
03:41
Ensemble Learning
9 Lectures 44:32

The adabag package implements both boosting and bagging methods. For the bagging method, the package first generates multiple versions of classifiers, and then obtains an aggregated classifier. Let's learn the bagging method from adabag to generate a classification model. 

Preview 07:53

To assess the prediction power of a classifier, you can run a cross validation method to test the robustness of the classification model. This video will show how to use bagging.cv to perform cross validation with the bagging method. 

Performing Cross Validation with the Bagging Method
01:56

Boosting starts with a simple or weak classifier and gradually improves it by reweighting the misclassified samples. Thus, the new classifier can learn from previous classifiers. One can use the boosting method to perform ensemble learning. Let's see how to use the boosting method to classify the telecom churn dataset. 

Classifying Data with the Boosting Method
06:04

Similar to the bagging function, adabag provides a cross validation function for the boosting method, named boosting.cv. In this video, we will learn how to perform cross-validation using boosting.cv. 

Performing Cross Validation with the Boosting Method
02:06

Gradient boosting creates a new base learner that maximally correlates with the negative gradient of the loss function. One may apply this method on either regression or classification problems. But first, we need to learn how to use gbm. 

Classifying Data with Gradient Boosting
07:09

A margin is a measure of certainty of a classification. It calculates the difference between the support of a correct class and the maximum support of an incorrect class. This video will show us how to calculate the margins of the generated classifiers. 

Calculating the Margins of a Classifier
05:30

The adabag package provides the errorevol function for a user to estimate the ensemble method errors in accordance with the number of iterations. Let's explore how to use errorevol to show the evolution of errors of each ensemble classifier. 

Calculating the Error Evolution of the Ensemble Method
02:18

Random forest grows multiple decision trees, each of which outputs its own prediction. The forest then uses a voting mechanism to select the most-voted class as the prediction result. In this video, we illustrate how to classify data using the randomForest package.

Classifying Data with Random Forest
07:01
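
A sketch with the randomForest package on iris (ntree = 500 is an assumed setting):

    library(randomForest)
    set.seed(42)
    fit <- randomForest(Species ~ ., data = iris,
                        ntree = 500, importance = TRUE)
    print(fit)        # OOB error estimate and confusion matrix
    importance(fit)   # variable importance measures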

At the beginning of this section, we discussed why we use ensemble learning and how it can improve the prediction performance. Let's now validate whether the ensemble model performs better than a single decision tree by comparing the performance of each method. 

Estimating the Prediction Errors of Different Classifiers
04:35
Clustering
11 Lectures 48:29

Hierarchical clustering adopts either an agglomerative or a divisive method to build a hierarchy of clusters. This video shows us how to cluster data with the help of hierarchical clustering. 

Preview 08:40

In this video we demonstrate how to use the cutree function to separate the data into a given number of clusters. 

Cutting Trees into Clusters
03:30

In this video, we will demonstrate how to perform k-means clustering on the customer dataset. 

Clustering Data with the k-Means Method
04:10
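
A sketch of k-means on the numeric columns of iris (centers = 3 is an assumption for illustration, not the course's customer data):

    set.seed(22)
    km <- kmeans(iris[, 1:4], centers = 3)
    km$centers                        # cluster centroids
    table(km$cluster, iris$Species)   # compare clusters with known labels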

We will now illustrate how to create a bivariate cluster plot. 

Drawing a Bivariate Cluster Plot
03:32

In this video we will see how to compare different clustering methods using cluster.stat from the fpc package. 

Comparing Clustering Methods
04:15

In this video we will see how to compute silhouette information. 

Extracting Silhouette Information from Clustering
02:40

In this video we will discuss how to find the optimum number of clusters for the k-means clustering method. 

Obtaining the Optimum Number of Clusters for k-Means
02:48

In this video, we will demonstrate how to use DBSCAN to perform density-based clustering. 

Clustering Data with the Density-Based Method
06:42

In this video, we will demonstrate how to use the model-based method to determine the most likely number of clusters. 

Clustering Data with the Model-Based Method
04:37

A dissimilarity matrix can be used as a measurement for the quality of a cluster. In this video, we will discuss some techniques that are useful to visualize a dissimilarity matrix. 

Visualizing a Dissimilarity Matrix
03:23

In this video, we will demonstrate how clustering methods differ with regard to data with known clusters. 

Validating Clusters Externally
04:12
Association Analysis and Sequence Mining
8 Lectures 31:17

Before mining association rules, you need to transform the data into transactions. This video will show how to transform a list, matrix, or data frame into transactions.

Preview 03:35

The arules package uses its own transactions class to store transaction data. As such, we must use the generic functions provided by arules to display transactions and association rules. Let's see how to display transactions and association rules via various functions in the arules package.

Displaying Transactions and Associations
02:14

Association mining is a technique that can discover interesting relationships hidden in transaction datasets. This approach first finds all frequent itemsets and then generates strong association rules from those itemsets. In this video, we see how to perform association analysis using the Apriori algorithm.

Mining Associations with the Apriori Rule
07:24
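
A sketch with the arules package and its bundled Groceries data (the support and confidence thresholds are assumptions):

    library(arules)
    data(Groceries)
    rules <- apriori(Groceries,
                     parameter = list(supp = 0.001, conf = 0.5))
    # Show the five rules with the highest lift
    inspect(head(sort(rules, by = "lift"), 5))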

Among the generated rules, we sometimes find repeated or redundant rules (for example, one rule is a subset of another). Let's explore how to prune (or remove) repeated or redundant rules.

Pruning Redundant Rules
02:26

Besides listing rules as text, you can visualize association rules, making it easier to spot relationships between itemsets. In this video, we will learn how to use the arulesViz package to visualize the association rules.

Visualizing Association Rules
05:06

The Apriori algorithm performs a breadth-first search to scan the database, so support counting becomes time consuming. Alternatively, if the database fits into memory, you can use the Eclat algorithm, which performs a depth-first search to count supports. Let's see how to use the Eclat algorithm.

Mining Frequent Itemsets with Eclat
03:36
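
The corresponding Eclat call in arules, sketched with an assumed support threshold on the same bundled Groceries data:

    library(arules)
    data(Groceries)
    itemsets <- eclat(Groceries, parameter = list(supp = 0.05))
    # Show the five most frequent itemsets
    inspect(head(sort(itemsets, by = "support"), 5))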

In addition to mining interesting associations within the transaction database, we can mine interesting sequential patterns using transactions with temporal information. This video demonstrates how to create transactions with temporal information.

Creating Transactions with Temporal Information
02:41

In contrast to association mining, sequential pattern mining looks for patterns shared among transactions in which a set of itemsets occurs sequentially. One of the most famous frequent sequential pattern mining algorithms is the Sequential Pattern Discovery using Equivalence classes (SPADE) algorithm. Let's see how to use SPADE to mine frequent sequential patterns.

Mining Frequent Sequential Patterns with cSPADE
04:15
2 More Sections
About the Instructor
Packt Publishing
3.9 Average rating
7,336 Reviews
52,327 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live - and how to put them to work.

With an extensive library of content - more than 4,000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource for becoming a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.