R for Data Science Solutions
3.3 (5 ratings)
94 students enrolled

Over 100 hands-on tasks to help you effectively solve real-world data problems using the most popular R packages and techniques
Created by Packt Publishing
Last updated 12/2016
  • 5.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Get to know the functional characteristics of R language
  • Extract, transform, and load data from heterogeneous sources
  • Understand how R makes it easy to tackle probability and statistics problems
  • Get simple R instructions to quickly organize and manipulate large datasets
  • Create professional data visualizations and interactive reports
  • Predict user purchase behavior by adopting a classification approach
  • Implement data mining techniques to discover items that are frequently purchased together
  • Group similar text documents by using various clustering methods
  • This collection of independent videos offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

R is both data analysis software and a programming language. Data scientists, statisticians, and analysts use R for statistical analysis, data visualization, and predictive modeling. R is open source and can be integrated with other applications and systems. Compared to other data analysis platforms, R offers an extensive set of data analysis packages. R's excellent data visualization features also make problems in the data easier to spot and resolve.

The first section in this course deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the ‘dplyr’ and ‘data.table’ packages to efficiently process larger data structures. We also focus on ‘ggplot2’ and show you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the “ggvis” package. Later sections cover time series analysis and provide detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this course, you will understand how to resolve issues and will be able to comfortably offer solutions to problems encountered while performing data analysis.

About The Author

Yu-Wei Chiu (David Chiu) is the founder of LargitData, a startup company that mainly focuses on providing big data and machine learning products. He previously worked for Trend Micro as a software engineer, where he was responsible for building big data platforms for business intelligence and customer relationship management systems. In addition to being a startup entrepreneur and data scientist, he specializes in using Spark and Hadoop to process big data and in applying data mining techniques for data analysis. Yu-Wei is also a professional lecturer who has delivered lectures on big data and machine learning in R and Python and has given tech talks at a variety of conferences.

In 2015, Yu-Wei wrote Machine Learning with R Cookbook, Packt Publishing. In 2013, Yu-Wei reviewed Bioinformatics with R Cookbook, Packt Publishing. 

Who is the target audience?
  • This course is for budding data scientists, analysts, and those who are familiar with the basic operations of R.
Curriculum For This Course
115 Lectures
Functions in R
9 Lectures 30:08

R has many built-in functions, and users can also define their own functions for specific purposes. Once you have created a function, it is important to understand how to pass arguments to it. Let’s explore how to create an R function and pass arguments to it.

Preview 06:25
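As a minimal sketch of the argument passing described above (not the course's own code), the hypothetical function add_scaled() below shows positional arguments, named arguments, and a default value.

```r
# Hypothetical helper used only to illustrate argument passing
add_scaled <- function(x, y, scale = 1) {
  # `scale` has a default value, so callers may omit it
  (x + y) * scale
}

add_scaled(2, 3)              # positional arguments, default scale: 5
add_scaled(2, 3, scale = 10)  # named argument overrides the default: 50
add_scaled(y = 3, x = 2)      # named arguments may appear in any order: 5
```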

R stores and manages variables in environments. Each time a function is called, R creates a new environment in which the function's body is evaluated. Let’s see how the environment of each function works.

Understanding Environments
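A small base-R sketch of the behavior described for this lecture: the hypothetical function make_env_probe() returns its own execution environment, which shows that every call gets a fresh environment enclosed by the environment where the function was defined.

```r
# Hypothetical function that simply returns its own execution environment
make_env_probe <- function() {
  environment()   # inside a function, environment() is the current call's environment
}

e1 <- make_env_probe()
e2 <- make_env_probe()
identical(e1, e2)   # FALSE: every call creates a fresh environment

# Each call's environment is enclosed by the environment where the function was defined
identical(parent.env(e1), environment(make_env_probe))   # TRUE
```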

Lexical scoping determines how a value binds to a free variable in a function. R inherited this key feature from Scheme, a functional programming language. This video will show us how lexical scoping works in R.

Working with Lexical Scoping
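To illustrate lexical scoping, here is a sketch with made-up function names: a free variable is looked up in the environment where the function was defined, not in the environment it was called from.

```r
x <- "global"

f <- function() {
  x <- "inside f"
  g <- function() x   # `x` is a free variable in g
  g()
}

h <- function() {
  x <- "inside h"
  f()   # calling f from h does not change what g sees
}

h()   # "inside f": g looks up `x` where it was defined

k <- function() x   # defined at top level, so its free `x` is the global one
caller <- function() {
  x <- "local to caller"
  k()
}
caller()   # "global"
```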

In previous videos, we illustrated how to create a named function. Functions without a name (anonymous functions) and closures, which are functions that capture the environment in which they were created, can be a bit trickier. Let’s see how to use them within a standard function.

Understanding Closure
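A common way to see a closure at work is a counter factory. The sketch below uses the hypothetical name make_counter(); each returned function keeps its own copy of count in the environment it captured. The last line passes an anonymous function directly to sapply().

```r
make_counter <- function() {
  count <- 0
  function() {
    # `<<-` updates `count` in the enclosing environment captured by the closure
    count <<- count + 1
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()
counter_a(); counter_a()   # 1, then 2
counter_b()                # 1: each closure keeps its own `count`

# An anonymous function passed straight to sapply()
sapply(1:3, function(i) i^2)   # 1 4 9
```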

R evaluates function arguments lazily: an argument is evaluated only when it is first needed, and an argument that is never used is never computed, which can save computation time. Let’s take a look at how lazy evaluation works.

Performing Lazy Evaluation
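Lazy evaluation is easy to observe by passing an expression that would fail if it were evaluated. In this sketch (function names are hypothetical), the unused argument never raises its error, while force() evaluates the argument immediately.

```r
lazy_double <- function(a, b) {
  # `b` is never used, so the expression passed for it is never evaluated
  a * 2
}

lazy_double(21, stop("never evaluated"))   # 42, no error raised

eager_double <- function(a, b) {
  force(b)   # force() evaluates `b` immediately
  a * 2
}

msg <- tryCatch(eager_double(21, stop("evaluated now")),
                error = function(e) conditionMessage(e))
msg   # "evaluated now"
```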

Normally, we operate on variables a and b by calling a function as func(a, b). Although this is standard function syntax, it can be hard to read, especially when calls are nested. Let’s see how we can simplify the syntax with infix operators.

Creating Infix Operators
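In R, any function whose name has the form %...% can be used as an infix operator. The operator %&% below is a hypothetical example that joins two strings with a space.

```r
# Hypothetical infix operator that joins two strings with a space
`%&%` <- function(a, b) paste(a, b)

"Hello" %&% "World"               # "Hello World"

# Custom %...% operators are evaluated left to right, so they chain naturally
"R" %&% "for" %&% "Data Science"  # "R for Data Science"
```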

In R, we sometimes need to assign a value to a function call, as in names(x) <- value. Replacement functions make this possible, so it is important to understand them. Let’s explore how they work and how we can use them.

Using the Replacement Function
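A replacement function is an ordinary function named name<- that takes the object and a value argument and returns the modified object. The second<- function below is a hypothetical example, not part of base R.

```r
# Hypothetical replacement function: `second<-` sets the second element
`second<-` <- function(x, value) {
  x[2] <- value
  x   # a replacement function must return the modified object
}

v <- c(10, 20, 30)
second(v) <- 99   # R rewrites this as v <- `second<-`(v, value = 99)
v                 # 10 99 30

# Built-in replacement functions such as names<- work the same way
names(v) <- c("a", "b", "c")
```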

As in any other programming language, we may encounter various errors while developing in R, so we need to learn how to handle them. Handling errors properly not only helps us correct problems but also makes programs more robust.

Handling Errors in a Function
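One base-R tool for handling errors is tryCatch(). This sketch wraps log() in a hypothetical safe_log() that turns warnings and errors into NA and reports what happened. The course may use other approaches as well.

```r
# Hypothetical wrapper that converts warnings and errors into NA
safe_log <- function(x) {
  tryCatch(
    {
      if (!is.numeric(x)) stop("x must be numeric")
      log(x)
    },
    warning = function(w) {
      # log(-1) signals the warning "NaNs produced"
      message("Warning caught: ", conditionMessage(w))
      NA_real_
    },
    error = function(e) {
      message("Error caught: ", conditionMessage(e))
      NA_real_
    },
    finally = message("safe_log() finished")   # runs in every case
  )
}

safe_log(100)   # log(100), about 4.605
safe_log(-1)    # NA, after reporting the warning
safe_log("a")   # NA, after reporting the error
```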

Since bugs are inevitable in any code, an R programmer has to be well prepared for them with a good debugging toolset. Let’s explore how to debug a function using R's debugging functions.

The Debugging Function
Data Extracting, Transforming, and Loading
6 Lectures 17:03

The first step in any data analysis is to collect high-quality, meaningful data. One important data source is open data, which is published online either as text files or through APIs. Let’s see how we can download an open data file in text format.

Preview 02:14

Now that we’ve learned how to download open data files, it becomes crucial to know how to read and write them for further processing. Let’s see how we can read a file with R.

Reading and Writing CSV Files

The functions we’ve learned so far, read.table and read.csv, work well when the data is small. To read large files with more flexibility, we need another approach. Let’s explore how we can do that using the scan function.

Scanning Text Files
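A minimal sketch of scan(): inline text stands in for a large file here (with a real file, pass its path instead), and the column names a, b, c are made up. Passing a list as what reads each record field by field, and nlines limits how much is read at once.

```r
# Inline text stands in for a large file; with a real file, pass its path instead
txt <- "1 2 3\n4 5 6\n7 8 9"

# `what` gives the type of each field; a list reads records field by field
dat <- scan(text = txt, what = list(a = 0, b = 0, c = 0), quiet = TRUE)
str(dat)

# `nlines` (with `skip`) lets us read a large file in chunks
first_two <- scan(text = txt, what = list(a = 0, b = 0, c = 0),
                  nlines = 2, quiet = TRUE)
```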

Excel is widely used for storing and analyzing data. Although Excel files can be converted to other formats, the conversion is somewhat cumbersome. This video shows how to read and write an Excel file containing world development indicators with the xlsx package.

Working with Excel Files

Because R reads data into memory, it is well suited to processing and analyzing small datasets. However, databases are increasingly used to store and analyze larger data. In this video, we will demonstrate how to use RJDBC to connect to a database and access the data stored in it.

Reading Data from Databases

In most cases, the majority of data will not exist in the database, but will instead be published in different forms on the Internet. To dig up more valuable information from these data sources, we need to know how to access and scrape data from the Web.

Scraping Web Data
Data Pre-Processing and Preparation
10 Lectures 29:13

Data analysis requires preprocessing. Several steps must be performed to get data ready for analysis, and the first is renaming variables so that we can work with them efficiently. Let’s see how we can use the names function to rename variables.

Preview 02:27
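For the renaming step described above, a short sketch with a hypothetical data frame whose columns were imported with generic names:

```r
# Hypothetical data frame whose columns were imported with generic names
emp <- data.frame(V1 = c(10001, 10002), V2 = c("Alice", "Bob"))

names(emp) <- c("emp_no", "first_name")   # rename every column at once
names(emp)[2] <- "given_name"              # or rename one column by position
names(emp)                                 # "emp_no" "given_name"
```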

Often, the data type of each column is not specified when data is imported. This makes data manipulation difficult, because the assigned data type may differ from the actual one. Let’s explore how we can fix this by converting data types.

Converting Data Types

Some attributes in the employees and salaries datasets are dates. For example, to estimate the employees' ages, we have to calculate the number of years between each employee's date of birth and the current year. This can be tedious. Let’s see how we can do it by manipulating date data.

Working with Date Format
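The age calculation described above can be sketched in base R as follows. The birth dates are hypothetical, and the second approach (elapsed days divided by 365.25) is an approximation added here for comparison.

```r
# Hypothetical birth dates
birth_date <- as.Date(c("1953-09-02", "1964-06-02"))

# Age as the difference between the current year and the birth year
current_year <- as.numeric(format(Sys.Date(), "%Y"))
age_by_year  <- current_year - as.numeric(format(birth_date, "%Y"))

# A finer estimate: elapsed days divided by the average year length
age_exact <- as.numeric(difftime(Sys.Date(), birth_date, units = "days")) / 365.25
```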

As in database operations, we can add a new record to a data frame following the schema of the dataset, but R makes these operations much easier. In this video, we’ll see how to use the rbind and cbind functions to add a new record or attribute.

Adding New Records
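A minimal sketch of rbind() and cbind() on a small, made-up employees table: rbind() appends a row that follows the existing schema, and cbind() appends a new column.

```r
emp <- data.frame(emp_no = c(1, 2), name = c("Alice", "Bob"))

# rbind() appends a record; its column names must match the existing schema
emp <- rbind(emp, data.frame(emp_no = 3, name = "Carol"))

# cbind() appends a new attribute (column)
emp <- cbind(emp, dept = c("d001", "d002", "d003"))
dim(emp)   # 3 3
```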

Some analyses require only the part of the data that is of particular interest, which calls for data filtering. In databases, we use a SQL query with a WHERE clause to subset data. Let’s see how to do the same in R.

Filtering Data
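The R counterpart of a WHERE clause can be sketched with logical indexing or subset(). The salaries table below is hypothetical.

```r
salaries <- data.frame(emp_no = 1:5,
                       salary = c(40000, 65000, 52000, 90000, 71000))

# Equivalent of: SELECT * FROM salaries WHERE salary > 60000
high <- salaries[salaries$salary > 60000, ]

# subset() expresses the same condition and can also pick columns
high2 <- subset(salaries, salary > 60000, select = emp_no)
```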

Even after filtering, the dataset may contain unwanted records that can lead to inaccurate results. Now that we’ve learned how to filter a dataset, let’s see how to remove, or drop, bad data.

Dropping Data

As with tables in a database, we sometimes need to combine two datasets to correlate their data. In R, we can do this with merge and plyr. In addition, to analyze data more efficiently, we must learn how to sort data with R's two methods, sort and order.

Merging and Sorting Data
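A base-R sketch of merging and sorting, using two made-up tables (the plyr route is not shown): merge() performs an inner join on a shared key, order() returns the row indices that sort the data, and sort() returns the sorted values themselves.

```r
emp <- data.frame(emp_no = c(3, 1, 2), name = c("Carol", "Alice", "Bob"))
sal <- data.frame(emp_no = c(1, 2, 3), salary = c(50000, 70000, 60000))

# merge() performs an inner join on the shared key column
joined <- merge(emp, sal, by = "emp_no")

# order() returns row indices; here they sort by salary, highest first
by_salary <- joined[order(joined$salary, decreasing = TRUE), ]

# sort() returns the sorted values rather than indices
sort(joined$salary)   # 50000 60000 70000
```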

There are instances where data analysis is possible only when the data is in a specific format. We must know how to reshape data and remove data with missing values for efficient data processing.

Reshaping Data

Missing data may result from flaws in data processing or simply from typos. Even such small mistakes can affect the whole analysis and make the results misleading, so it is important to learn how to detect missing values in R.

Detecting Missing Data
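Base R offers a few quick ways to detect missing values. This sketch applies them to a small, hypothetical data frame.

```r
df <- data.frame(age = c(34, NA, 51, 28), salary = c(52000, 61000, NA, NA))

is.na(df$age)        # FALSE TRUE FALSE FALSE
colSums(is.na(df))   # number of missing values per column: age 1, salary 2
complete.cases(df)   # TRUE only for rows with no missing values
```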

We’ve learned how to detect missing data. But, there might be some instances where analysis may go wrong due to those missing values. This video will introduce some techniques to impute missing values for efficient data processing.

Imputing Missing Data
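Mean imputation is one simple imputation technique (not necessarily the one the video uses). This sketch replaces each NA in a hypothetical column with the mean of the observed values.

```r
df <- data.frame(age = c(34, NA, 51, 28))

# Mean imputation: replace each NA with the mean of the observed values
df$age[is.na(df$age)] <- mean(df$age, na.rm = TRUE)
df$age   # the NA becomes (34 + 51 + 28) / 3
```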
Data Manipulation
13 Lectures 30:33

When you process a dataset that is a gigabyte or larger in size, you may find that data.frame is rather inefficient. To address this issue, you can use the enhanced extension of data.frame—data.table. In this video, we will see how to create a data.table in R.

Preview 04:49

Two major advantages of a data.table as compared to a data.frame are the speed and clearer syntax of the former. Similar to a data.frame, we can perform operations to slice and subset a data.table. This video shows some operations that you can perform on data.table.

Managing Data with data.table

Another advantage of a data.table is that we can easily aggregate data without the help of additional packages. This video illustrates how to perform data aggregation using data.table.

Performing Fast Aggregation with data.table

In addition to manipulating data in a single table, we often need to import more features or correlate data from other data sources. To do so, we can join two or more tables into one. In this video, we look at some methods for merging two data.table objects.

Merging Large Datasets with a data.table

To perform more advanced descriptive analysis, we must know how to use the dplyr package to reshape data and obtain summary statistics. This video shows how to use dplyr to manipulate data and how to subset and slice data with the filter and slice functions.

Subsetting and Slicing Data with dplyr

As a single machine cannot efficiently process big data problems, a practical approach is to take samples that we can effectively use to draw conclusions. Here, we will see how to use dplyr to sample from data.

Sampling Data with dplyr

Besides selecting individual rows from the dataset, we can use the select function in dplyr to select a single or multiple columns from the dataset. In this video, we will look at how to select particular columns using the select function.

Selecting Columns with dplyr

To perform multiple operations on data using dplyr, we can wrap up the function calls into a larger function call. Or, we can use the %>% chaining operator to chain operations instead. This video will introduce chaining of operations when using dplyr.

Chaining Operations in dplyr

Arranging rows in order may help us rank data by value or gain a more structured view of data in the same category. In this video, we will take a look at how to arrange rows with dplyr.

Arranging Rows with dplyr

To avoid counting duplicate rows, we can use the distinct operation in SQL. In dplyr, we can also eliminate duplicated rows from a given dataset. Let’s explore how to do that.

Eliminating Duplicated Rows with dplyr

Besides performing data manipulation on existing columns, there are situations where a user may need to create a new column for more advanced analysis. Let’s see how to add a new column using dplyr.

Adding New Columns with dplyr

Besides manipulating a dataset, one of the most important features of dplyr is that it makes it easy to obtain summary statistics from the data. In SQL, we use the GROUP BY clause for this purpose. This video will show us how to summarize data with dplyr.

Summarizing Data with dplyr

In a SQL operation, we can perform a join operation to combine two different datasets. In dplyr, we have the same join operation that enables us to merge data easily. In this video, we’ll learn how join works in dplyr.

Merging Data with dplyr
Visualizing Data with ggplot2
9 Lectures 26:42

In ggplot2, data is charted by mapping elements from mathematical space to physical space, so we can build a figure from simple elements. This video shows how to construct our very first ggplot2 plot using the superstore sales dataset.

Preview 04:15

Aesthetics mapping describes how data variables are mapped to the visual property of a plot. In this video, we discuss how to modify aesthetics mapping on geometric objects so that we can change the position, size and color of a given geometric object.

Changing Aesthetics Mapping

Geometric objects are the elements that we mark on the plot. One can use geometric objects in ggplot2 to create line, bar, or box charts, and combine them with aesthetic mapping to create a more professional plot. This video introduces how to use geometric objects to create various charts.

Introducing Geometric Objects

Besides mapping particular variables to the x or y axis, one can first perform statistical transformations on variables and then remap the transformed variables to a specific position. With the help of this video, we’ll be able to perform variable transformations with ggplot2.

Performing Transformations

Besides setting aesthetic mapping for each plot or geometric object, one can use scale to control how variables are mapped to the visual property. Let’s explore how to adjust the scale of aesthetics in ggplot2.

Adjusting Scales

When performing data exploration, it is essential to compare data across different groups. Faceting is a technique used to create graphs for subsets of data. This video will help us use the facet function to create a chart for multiple subsets of data.


One can adjust the layout, color, font, and other attributes of non-data objects using the theme system in ggplot2. ggplot2 provides several built-in themes, and one can also customize the current theme. This video will show us how to use the theme_* functions and customize a theme.

Adjusting Themes

To create an overview of a dataset, we may need to combine individual plots into one. This video will guide us on how to combine individual subplots into one plot.

Combining Plots

One can use a map to visualize the geographical relationship of spatial data. This video shows us how to create a map from a shapefile with ggplot2 and use ggmap to download data from a mapping service.

Creating Maps
Making Interactive Reports
9 Lectures 23:41

Creating an R Markdown report with RStudio is a straightforward process. This video will teach us how to use the built-in GUI to create Markdown reports in different formats.

Preview 02:47

The most attractive feature of a markdown report is that it enables the user to create a well-formatted document with plain text and simple markup syntax. Let’s see how we can use Markdown to create, edit, organize, and highlight data.

Learning the Markdown Syntax

In an R Markdown report, we can embed R code chunks with the knitr package. This video will guide us on how to create and control the output with different code chunk configurations.

Embedding R Code Chunks

The ggvis package creates HTML output with CSS and JavaScript. Thus, one can embed ggvis graphics into web applications or HTML reports. Let’s explore how we can do that and make interactive plots.

Creating Interactive Graphics with ggvis

In ggvis, one can use a simple layer to create lines, points, and other geometry objects in the plot. This video guides us through using ggvis syntax and grammar to create different plots.

Understanding Basic Syntax and Grammar

In addition to making different plots in ggvis, we can control how axes and legends are displayed in a ggvis figure with the *_axis and *_legend functions. Let’s see how we can set their appearance properties and rescale the mapping of the data with the scale function.

Controlling Axes and Legends and Using Scales

ggvis can be used to create an interactive web form. It allows the user to subset data and change the visual properties of the plot by interacting with the web form. In this video, we learn how to add interactivity to a ggvis plot.

Adding Interactivity to a ggvis Plot

An R Markdown report outputs code and static figures, so one cannot perform exploratory data analysis through web interaction. To enable the user to explore data via a web form, we have to build an interactive web page. In this video, we see how to create an interactive web report with Shiny.

Creating an R Shiny Document

In addition to hosting a Shiny app on a local machine, we can host it online. RStudio provides a service, http://www.shinyapps.io/, that allows anyone to upload their Shiny apps. Let’s see how to publish an R Shiny report using shinyapps.io.

Publishing an R Shiny Report
Simulation from Probability Distributions
9 Lectures 21:43

Generating samples is the first step for working with probability distributions. So, learning this basic concept is very important.

Preview 02:51

When all outcomes are equally likely, we can model them with a uniform distribution.

Understanding Uniform Distributions
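As a quick illustration of a uniform distribution, this sketch draws values with runif() and checks that they spread evenly across equal-width bins. The seed and sample size are arbitrary choices.

```r
set.seed(42)   # make the draws reproducible
u <- runif(10000, min = 0, max = 1)

# Every value lies in [0, 1], and the ten bins have roughly equal counts
table(cut(u, breaks = seq(0, 1, by = 0.1)))
mean(u)   # close to 0.5
```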

You need to generate samples from a binomial distribution when you evaluate the success or failure of several independent trials. This video will enable you to do that.

Generating Binomial Random Variates
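A minimal sketch of binomial variates: rbinom() simulates the number of successes in 10 independent trials with an assumed success probability of 0.3, and dbinom() gives the exact probability of a particular count.

```r
set.seed(1)
# 1000 experiments, each counting successes in 10 independent trials with p = 0.3
successes <- rbinom(1000, size = 10, prob = 0.3)
mean(successes)   # close to size * prob = 3

dbinom(3, size = 10, prob = 0.3)   # exact P(X = 3)
```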

To calculate the probability of a given number of events occurring in a fixed time interval, the Poisson distribution is a natural choice.

Generating Poisson Random Variates

Much real-world data approximately follows a normal distribution, so it is worth learning how to sample from one. This video will help you with that.

Sampling from a Normal Distribution

This video shows how to use R to generate samples from a chi-squared distribution.

Sampling from a Chi-Squared Distribution

To estimate the mean of a normally distributed population from a small sample when the population standard deviation is unknown, we use Student’s t-distribution.

Understanding Student's t-Distribution

Along with generating samples, we can also sample subsets from datasets. This video will arm you to do that.

Sampling from a Dataset

When a model contains one or more random variables that evolve over time, we need to simulate a stochastic process.

Simulating the Stochastic Process
Statistical Inference in R
9 Lectures 24:54

To estimate a range of plausible values for an unknown population parameter, we use confidence intervals.

Preview 05:54
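A sketch of a 95% confidence interval for a mean, computed by hand from the t-distribution and then with t.test(), which reports the same interval. The sample is simulated, not course data.

```r
set.seed(1)
x <- rnorm(30, mean = 50, sd = 5)   # simulated sample

# 95% CI for the mean: mean +/- t critical value * standard error
n  <- length(x)
se <- sd(x) / sqrt(n)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se

# t.test() reports the same interval
ci_ttest <- t.test(x, conf.level = 0.95)$conf.int
```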

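To compare a sample mean with a hypothesized population mean, or to compare two means, when the population standard deviation is known, we perform Z-tests on data.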

Performing Z-tests
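Base R's stats package has no built-in z-test function, so this sketch computes a one-sample, two-sided z-test by hand. The known population standard deviation (sigma = 5) and the hypothesized mean (mu0 = 50) are assumptions chosen for illustration.

```r
set.seed(1)
x     <- rnorm(40, mean = 51, sd = 5)   # simulated sample
mu0   <- 50   # hypothesized population mean (assumption)
sigma <- 5    # known population standard deviation (assumption)

# z statistic: distance of the sample mean from mu0 in standard-error units
z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value
c(z = z, p = p_value)
```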

When the population standard deviation is unknown, we perform Student’s t-tests instead.

Performing Student's t-Tests

When the data distribution is unknown, non-parametric tests come into the picture. One such test is the exact binomial test, which we conduct in R.

Conducting Exact Binomial Tests

To compare two samples, or to compare a sample with a reference probability distribution, we can use the Kolmogorov-Smirnov test.

Performing Kolmogorov-Smirnov Tests

To discover whether two categorical variables are related, we can conduct Pearson’s chi-squared test.

Working with the Pearson's Chi-Squared Tests
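A sketch of Pearson's chi-squared test of independence on a hypothetical 2x2 table of counts. For 2x2 tables, chisq.test() applies Yates' continuity correction by default.

```r
# Hypothetical 2x2 contingency table of counts
tab <- matrix(c(30, 10, 20, 40), nrow = 2,
              dimnames = list(group = c("A", "B"), purchased = c("yes", "no")))

res <- chisq.test(tab)   # Yates' correction is applied to 2x2 tables by default
res$expected             # counts expected if the two variables were independent
res$p.value              # a small value suggests the variables are related
```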

To test whether two groups come from the same population, we use the Wilcoxon rank sum and signed rank tests (for independent and paired samples, respectively).

Understanding the Wilcoxon Rank Sum and Signed Rank Tests

To investigate how a single categorical variable (factor) relates to a numeric response, we use one-way ANOVA.

Conducting One-way ANOVA
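A minimal one-way ANOVA sketch using PlantGrowth, a dataset that ships with R (plant weights under a control and two treatment groups). It is used here only as an example and may not be the dataset from the video.

```r
# PlantGrowth ships with R: plant weights under a control and two treatments
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# Extract the p-value for the group factor
p_group <- summary(fit)[[1]][["Pr(>F)"]][1]
p_group   # below 0.05: mean weight differs across groups
```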

When two categorical variables (factors) are involved, we use two-way ANOVA.

Performing Two-way ANOVA
Rule and Pattern Mining with R
8 Lectures 26:02

Before rule mining, it is important to transform the data into transactions.

Preview 05:11

You will learn to display transactions and associations in this video.

Displaying Transactions and Associations

To find relationships among items in a transaction dataset, we use the Apriori algorithm.

Mining Associations with the Apriori Rule

Sometimes, rules are repeated and are redundant. We need to know how to remove these rules to get significant information. This video will enable you to do that.

Pruning Redundant Rules

To explore the relation between items, we visualize association rules.

Visualizing Association Rules

Eclat is often faster than Apriori at mining frequent itemsets, so it is worth learning how it works.

Mining Frequent Itemsets with Eclat

You will learn to create transactions with temporal information in this video.

Creating Transactions with Temporal Information

cSPADE is an efficient algorithm for mining frequent sequential patterns, so it is worth learning how it works.

Mining Frequent Sequential Patterns with cSPADE
Time Series Mining with R
9 Lectures 29:51

Time-indexed variables are best represented as time series data, so it is important to know how to create a time series object.

Preview 05:11

Plotting a time series object will make visualization easy and effective.

Plotting a Time Series Object

To get the components of a time series, we need to decompose it.

Decomposing Time Series

To measure the error rate of a regression model, we need to calculate RMSE and RSE.

Smoothing Time Series

We can forecast a time series from the smoothed model. Let’s learn how to do that.

Forecasting Time Series

ARIMA models take autocorrelation into account, which makes them useful for many real-world time series.

Selecting an ARIMA Model

After understanding the ARIMA model, we can create an ARIMA model of our own. Let’s see how to do that.

Creating an ARIMA Model

We can predict values with the ARIMA model.

Forecasting with an ARIMA Model

You will apply your knowledge of the ARIMA model to predicting stock prices.

Predicting Stock Prices with an ARIMA Model
About the Instructor
Packt Publishing
3.9 Average rating
7,264 Reviews
51,801 Students
616 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live. And how to put them to work.

With an extensive library of content (more than 4,000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.