# Data Analysis and Machine Learning with R

- 4.5 hours on-demand video
- 1 downloadable resource
- Full lifetime access
- Access on mobile and TV

- Certificate of Completion

- Handle missing values and duplicates
- Learn to scale and standardize values
- Learn to apply classification techniques and regression techniques
- Work with advanced algorithms and techniques to enable efficient machine learning using the R programming language.
- Explore concepts such as the random forest algorithm.
- Work with support vector machine and examine and plot the results.
- Find out how to use the K-Nearest Neighbor for data projection.
- Work with a variety of real-world algorithms that suit your problem.

This video gives an overview of the entire course.

In fixed-width formatted files, each column has a fixed width; if a data element does not fill its allotted column width, it is padded with spaces to make up the specified width. During data analysis, you will also create several R objects.

Download and store the student-fwf.txt file

Specify the width

Load the data from R files and libraries
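As a minimal sketch of the fixed-width steps above, base R's `read.fwf` takes the column widths directly. The layout of student-fwf.txt is not given here, so the three sample rows, the widths, and the column names below are made up for illustration.

```r
# Write a tiny fixed-width file to a temporary location (made-up sample rows).
fwf_path <- tempfile(fileext = ".txt")
writeLines(c("101Ann  85",
             "102Bob  92",
             "103Carol78"), fwf_path)

# Assumed layout: id (3 chars), name (5 chars, space-padded), score (2 chars).
students <- read.fwf(fwf_path,
                     widths = c(3, 5, 2),
                     col.names = c("id", "name", "score"),
                     strip.white = TRUE,        # drop the padding spaces
                     stringsAsFactors = FALSE)
print(students)
```

The `widths` vector must add up to the line length; a wrong width silently shifts every later column.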

When we have abundant data, we sometimes want to eliminate the cases that have missing values for one or more variables. When you discard every case that has a missing value, however, you lose the useful information that the non-missing values in that case convey.

Download the data file

Get a data frame that has only the cases with no missing values

Read data and replace the missing values
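The two approaches above can be sketched in base R as follows. The data frame is invented for illustration, and mean imputation is only one possible replacement strategy, chosen here as an assumption.

```r
# Invented example data with a missing value in three different rows.
df <- data.frame(age    = c(25, NA, 40, 31),
                 income = c(50000, 62000, NA, 48000),
                 city   = c("A", "B", "C", NA),
                 stringsAsFactors = FALSE)

# Option 1: keep only the cases with no missing values.
complete_only <- df[complete.cases(df), ]   # na.omit(df) keeps the same rows

# Option 2: keep every case and replace missing ages with the mean age.
imputed <- df
imputed$age[is.na(imputed$age)] <- mean(imputed$age, na.rm = TRUE)

print(complete_only)
print(imputed)
```

Here option 1 keeps a single row out of four, which shows how quickly listwise deletion can discard information.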

In situations where we have categorical variables (factors) but need to use them in analytical methods that require numbers, we need to create dummy variables.

Read the data-conversion.csv file

Create dummies for all factors in the data frame

Choose the variable to create dummies for
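A rough sketch of dummy-variable creation using base R's `model.matrix`, which is a stand-in here and not necessarily the function used in the video. The contents of data-conversion.csv are not shown on this page, so the small data frame below is made up.

```r
# Made-up data: one numeric variable and one factor with three levels.
cars <- data.frame(mpg  = c(21, 33, 18, 27),
                   fuel = factor(c("gas", "diesel", "gas", "hybrid")))

# "- 1" drops the intercept so every factor level gets its own 0/1 column.
fuel_dummies <- model.matrix(~ fuel - 1, data = cars)

# Combine the numeric column with the new dummy columns.
cars_numeric <- cbind(cars["mpg"], fuel_dummies)
print(cars_numeric)
```

For regression models you would normally keep the intercept and drop one level to avoid perfect collinearity; the full set of columns suits distance-based methods such as KNN.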

In this video, we summarize the data using the summary function.

Read the data

Get the summary statistics

The lattice package produces Trellis plots to capture multivariate relationships in the data. Also, ggplot2 graphs are built iteratively, starting with the most basic plot.

Load the lattice package

Draw a boxplot and a scatter plot

For ggplot2, draw an initial plot and add layers

Seeing how the model performs on the training data itself is useful, but you should never use that as an objective measure of its quality.

Create and display a two-way table

Display raw numbers as proportions

Get row-wise percentages rounded to one decimal place
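The three table steps above can be sketched with base R's `table` and `prop.table`. The eight observations are invented for illustration.

```r
# Invented data: two categorical variables for eight cases.
df <- data.frame(
  gender = c("F", "M", "F", "M", "F", "M", "F", "F"),
  passed = c("yes", "no", "yes", "yes", "no", "no", "yes", "yes"))

# Two-way table of raw counts.
tab <- table(df$gender, df$passed)
print(tab)

# Raw numbers as proportions of the grand total.
props <- prop.table(tab)
print(props)

# Row-wise percentages rounded to one decimal place (margin = 1 means rows).
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)
```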

This video shows you how you can use the rpart package to build classification trees and the rpart.plot package to generate nice-looking tree diagrams.

Create data partitions

Generate a diagram of the tree

Generate the error/classification-confusion matrix

The MASS package contains the lda function for classification using linear discriminant function analysis.

Load the package and read the data

Convert the outcome variable class to a factor

Partition the data and build the Linear Discriminant Function model

R has several libraries that implement boosting where we combine many relatively inaccurate models to get a much more accurate model. The ada package provides boosting functionality on top of classification trees.

Load the package and read the data

Convert the outcome variable class to a factor

Generate predictions on the validation partition

In this video, we look at the use of the knn.reg function to build the model and then the process of predicting with the model as well. We also show some additional convenience mechanisms to make the process easier.

Load the dummies, FNN, scales, and caret packages

Generate dummies for the categorical variable

Create three partitions and build model for several values of K

The nnet package contains functionality to build neural network models for classification as well as prediction. In this recipe, we cover the steps to build a neural network regression model using nnet.

Find the range of the response variable

Build the model

Plot the network and compute the RMS error on the training data

The R implementation of some techniques, such as classification and regression trees, performs cross-validation out of the box to aid in model selection and to avoid overfitting.

Read the data

Show line numbers for discussion

Run k-fold cross-validation with k=5, then run leave-one-out cross-validation
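To make the two cross-validation variants concrete, here is a hand-rolled sketch in base R. The helper name `kfold_rmse`, the built-in `mtcars` data, and the linear model `mpg ~ wt + hp` are all assumptions for illustration, not the data or model used in the video.

```r
# Hypothetical helper: k-fold cross-validated RMSE for a fixed linear model.
kfold_rmse <- function(data, k) {
  # Randomly assign each row to one of k folds of near-equal size.
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sq_err <- numeric(nrow(data))
  for (i in seq_len(k)) {
    held_out <- folds == i
    # Fit on the other k - 1 folds, predict on the held-out fold.
    fit  <- lm(mpg ~ wt + hp, data = data[!held_out, ])
    pred <- predict(fit, newdata = data[held_out, ])
    sq_err[held_out] <- (data$mpg[held_out] - pred)^2
  }
  sqrt(mean(sq_err))
}

set.seed(42)
rmse_5fold <- kfold_rmse(mtcars, k = 5)
# Leave-one-out is the special case where k equals the number of rows.
rmse_loocv <- kfold_rmse(mtcars, k = nrow(mtcars))
print(c(five_fold = rmse_5fold, loocv = rmse_loocv))
```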

The standard R package stats provides the function for K-means clustering. We also use the cluster package to plot the results of our cluster analysis.

Define a convenience function to standardize the relevant variables

Use the convenience function to standardize the variables of interest

Perform K-means clustering for a given value of K
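A minimal sketch of the standardize-then-cluster steps above, using `kmeans` from the stats package. The `standardize` helper, the `mtcars` data, the chosen variables, and K = 3 are illustrative assumptions.

```r
# Convenience function: rescale a numeric vector to mean 0 and sd 1.
standardize <- function(x) (x - mean(x)) / sd(x)

# Standardize only the variables of interest (assumed choice).
vars   <- c("mpg", "hp", "wt")
scaled <- as.data.frame(lapply(mtcars[vars], standardize))

# K-means for a given K; nstart reruns from several random starts
# and keeps the best solution.
set.seed(123)
fit <- kmeans(scaled, centers = 3, nstart = 20)

print(table(fit$cluster))   # cluster sizes
print(fit$centers)          # cluster centers in standardized units
```

Standardizing first keeps variables measured on large scales, such as horsepower, from dominating the distance calculation.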

The hclust function in the package stats helps us perform hierarchical clustering.

Define a convenience function to standardize the relevant variables

Use the convenience function to standardize the variables of interest

Compute the distance matrix to provide as input to the hclust function
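The hierarchical-clustering steps can be sketched like this. For brevity the sketch uses base R's `scale` in place of a custom standardizing function, and the `mtcars` variables, the Ward linkage, and the cut into three groups are assumptions.

```r
# Standardize the variables of interest (scale() centers and scales columns).
scaled <- scale(mtcars[, c("mpg", "hp", "wt")])

# hclust needs a distance matrix, not the raw data.
d <- dist(scaled, method = "euclidean")

# Agglomerative clustering with Ward's method (one of several linkages).
hc <- hclust(d, method = "ward.D2")

# Cut the tree into three groups and inspect their sizes.
groups <- cutree(hc, k = 3)
print(table(groups))

# plot(hc) would draw the dendrogram in an interactive session.
```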

The stats package offers the prcomp function to perform PCA. This recipe shows you how to perform PCA using these capabilities.

View the correlation matrix to check whether some variables are highly correlated

Examine the rotations for the principal components generated

Visualize the importance of the components through a scree plot or a barplot
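As an illustration of the PCA steps above, the following sketch runs `prcomp` on a few `mtcars` columns. The dataset and the choice of columns are assumptions.

```r
# Numeric variables to analyze (assumed choice).
num <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Highly correlated variables suggest PCA can reduce dimensionality.
print(round(cor(num), 2))

# Center and scale so each variable contributes on an equal footing.
pca <- prcomp(num, center = TRUE, scale. = TRUE)

# Rotations (loadings): how each variable contributes to each component.
print(round(pca$rotation, 2))

# Proportion of variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# screeplot(pca, type = "lines") or barplot(var_explained) would
# visualize component importance in an interactive session.
```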

This video provides an overview of the entire course.

In this video, we will explore the model results more closely.

Create a single feature random forest and examine one of the decision trees

Create a more complicated random forest and examine the decision tree

Examine the feature importance (information gain) results from the more complicated random forest

- Prior basic R programming knowledge, data frames, and some basic knowledge in statistics are assumed.

Data analysis has recently emerged as a very important focus for a huge range of organizations and businesses. Machine Learning explores the study and construction of algorithms that can learn from and make predictions on data. R makes detailed data analysis easier, making advanced data exploration and insight accessible to anyone interested in learning it. The R language is widely used among statisticians and data miners to develop statistical software and data analysis.

This comprehensive 2-in-1 course follows a recipe-based approach to exploring advanced algorithm and visualization concepts so that you can get the most out of your data through real-world examples. To begin with, you’ll apply analysis techniques and learn to handle missing values and duplicates. You’ll also learn to apply classification and regression techniques. Moving further, you’ll work with advanced algorithms and techniques to enable efficient Machine Learning using the R programming language. Finally, you’ll work with a variety of real-world algorithms such as decision trees and support vector machines.

Towards the end of this course, you'll explore advanced algorithm and visualization concepts to get the most out of your data through real-world examples.

**Contents and Overview**

This training program includes 2 complete courses, carefully chosen to give you the most comprehensive training possible.

The first course, *R Data Analysis Solutions - Machine Learning Techniques*, covers analysis techniques to get the most out of your data. It shows you ways to use R to generate professional analysis reports and provides examples of important analysis and machine-learning tasks that you can try out with the associated, readily available data. You will learn to carry out different tasks on the data to prepare it for use. By the end of this course, you will be able to apply different analysis techniques, apply classification and regression, and reduce data.

The second course, *Machine Learning using Advanced Algorithms and Visualization in R*, covers advanced algorithms and additional visualization. In this course, you will work through various examples of advanced algorithms and focus a bit more on some visualization options. We’ll start by showing you how to use a random forest to predict what type of insurance a patient has based on their treatment, and you will get an overview of how to use random forests and decision trees and examine the model. Then, we’ll walk you through the next example on letter recognition, where you will train a program to recognize letters using a support vector machine, examine the results, and plot a confusion matrix. After that, you will look into the next example on soil classification from satellite data using K-Nearest Neighbor, where you will predict what neighborhood a house is in based on other data about it. Finally, you’ll dive into the last example of predicting a movie genre based on its title, where you will use the tm package and learn some techniques for working with text data.

**About the Authors**

**Viswa Viswanathan** is an associate professor of Computing and Decision Sciences at the Stillman School of Business at Seton Hall University. After completing his Ph.D. in Artificial Intelligence, Viswa spent a decade in academia and then switched to a leadership position in the software industry for a decade. During this period, he worked for Infosys, Igate, and Starbase. He embraced academia once again in 2001. Viswa has taught extensively in diverse fields, including operations research, computer science, software engineering, management information systems, and enterprise systems. In addition to teaching at the university, Viswa has conducted training programs for industry professionals. He has written several peer-reviewed research publications in journals such as Operations Research, IEEE Software, Computers and Industrial Engineering, and International Journal of Artificial Intelligence in Education. He has authored a book entitled Data Analytics with R: A Hands-on Approach.

**Shanthi Viswanathan** is an experienced technologist who has delivered technology management and enterprise architecture consultations to many enterprise customers. She has worked for Infosys Technologies, Oracle Corporation, and Accenture. As a consultant, Shanthi has helped several large organizations, such as Canon, Cisco, Celgene, Amway, Time Warner Cable, and GE, among others, in areas such as data architecture and analytics, master data management, service-oriented architecture, business process management, and modeling. When she is not in front of her Mac, Shanthi spends time hiking in the suburbs of NY/NJ, working in the garden, and teaching yoga. Shanthi would like to thank her husband, Viswa, for all the great discussions on numerous topics during their hikes together and for exposing her to R and Java. She would also like to thank her sons, Nitin and Siddarth, for getting her into the data analytics world.

**Tim Hoolihan** currently works at DialogTech, a marketing analytics company focused on conversations. He is the Senior Director of Data Science there. Prior to that, he was CTO at Level Seven, a regional consulting company in the US Midwest. He is the organizer of the Cleveland R User Group. In his job, he uses deep neural networks to help automate a lot of conversation classification problems. In addition, he works on some side projects researching other areas of Artificial Intelligence and Machine Learning. Outside Data Science, he is interested in mathematical computation in general; he is a lifelong math learner and really enjoys applying it wherever he can. Recently, he has been spending time in financial analysis and game development. He also knows a variety of languages: R, Python, Ruby, PHP, C/C++, and so on. Previously, he worked in web application and mobile development.

- This course is perfect for data scientists and professional developers who want to learn analytical techniques from scratch and understand how the R programming environment and packages can be used for developing Machine Learning systems.