R for Data Science Solutions

98 students enrolled

Over 100 hands-on tasks to help you effectively solve real-world data problems using the most popular R packages and techniques.

Current price: $12
Original price: $100
Discount: 88% off

30-Day Money-Back Guarantee

- 5.5 hours on-demand video
- Full lifetime access
- Access on mobile and TV

- Certificate of Completion

What Will I Learn?

- Get to know the functional characteristics of R language
- Extract, transform, and load data from heterogeneous sources
- Understand how easily R can confront probability and statistics problems
- Get simple R instructions to quickly organize and manipulate large datasets
- Create professional data visualizations and interactive reports
- Predict user purchase behavior by adopting a classification approach
- Implement data mining techniques to discover items that are frequently purchased together
- Group similar text documents by using various clustering methods

Requirements

- This collection of independent videos offers a range of data analysis samples in simple and straightforward R code, providing step-by-step resources and time-saving methods to help you solve data problems efficiently.

Description

R is both a data analysis environment and a programming language. Data scientists, statisticians, and analysts use R for statistical analysis, data visualization, and predictive modeling. R is open source and allows integration with other applications and systems. Compared to other data analysis platforms, R has an extensive set of data products, and its excellent data visualization features make problems in the data easier to spot and resolve.

The first section in this course deals with how to create R functions to avoid the unnecessary duplication of code. You will learn how to prepare, process, and perform sophisticated ETL for heterogeneous data sources with R packages. An example of data manipulation is provided, illustrating how to use the ‘dplyr’ and ‘data.table’ packages to efficiently process larger data structures. We also focus on ‘ggplot2’ and show you how to create advanced figures for data exploration.

In addition, you will learn how to build an interactive report using the “ggvis” package. Later sections offer insight into time series analysis, while there is detailed information on the hot topic of machine learning, including data classification, regression, clustering, association rule mining, and dimension reduction.

By the end of this course, you will understand how to resolve issues and will be able to comfortably offer solutions to problems encountered while performing data analysis.

**About The Author**

**Yu-Wei, Chiu (David Chiu)** is the founder of LargitData,
a startup company that mainly focuses on providing big data and machine
learning products. He has previously worked for Trend Micro as a
software engineer, where he was responsible for building big data
platforms for business intelligence and customer relationship management
systems. In addition to being a start-up entrepreneur and data
scientist, he specializes in using Spark and Hadoop to process big data
and apply data mining techniques for data analysis. Yu-Wei is also a
professional lecturer and has delivered lectures on big data and machine
learning in R and Python, and given tech talks at a variety of
conferences.

In 2015, Yu-Wei wrote Machine Learning with R Cookbook (Packt Publishing), and in 2013 he reviewed Bioinformatics with R Cookbook (Packt Publishing).

Who is the target audience?

- This course is for budding data scientists, analysts, and those who are familiar with the basic operations of R.

Compare to Other R Courses

Curriculum For This Course

115 Lectures

05:31:15

Functions in R
9 Lectures
30:08

R comes with a great many built-in functions, and users can also define their own for specific purposes. Once you start creating functions, it becomes really important to learn about passing arguments. Let's explore how to create an R function and pass arguments to it.

Preview
06:25
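As a minimal sketch of what defining a function and passing arguments looks like (the `describe` helper here is hypothetical, not from the course):

```r
# Hypothetical helper showing positional, named, and default arguments,
# plus ... for forwarding extra arguments to another function.
describe <- function(x, center = mean, ...) {
  # center is itself an argument (a function); ... is forwarded to it
  c(n = length(x), center = center(x, ...))
}

describe(c(1, 2, NA, 4))                                 # NA propagates through mean()
describe(c(1, 2, NA, 4), na.rm = TRUE)                   # ... forwards na.rm to mean()
describe(c(1, 2, NA, 4), center = median, na.rm = TRUE)  # swap in a different function
```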

R stores and manages variables using environments. Every function call creates a new environment in which that function executes. Let's see how the environment of each function works.

Understanding Environments

02:58

Lexical scoping determines how a value binds to a free variable in a function. This key feature originated in the Scheme functional programming language. This video will show us how lexical scoping works in R.

Working with Lexical Scoping

02:49
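A small sketch of lexical scoping, assuming plain base R:

```r
x <- 10
f <- function() {
  # x is a free variable: lexical scoping resolves it in the environment
  # where f was defined (the global environment), not where f is called
  x
}
g <- function() {
  x <- 99  # this local x does not affect f()
  f()
}
g()  # returns 10, demonstrating lexical (not dynamic) scoping
```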

In previous videos, we illustrated how to create a named function. But dealing with closures, functions that capture variables from the environment in which they were created, can be a bit tricky. Let's see how to use them alongside standard functions.

Understanding Closure

02:17
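A closure in the sense described above might be sketched like this, with a hypothetical `make_counter` factory:

```r
make_counter <- function() {
  count <- 0
  # The returned function captures `count` from its enclosing environment
  function() {
    count <<- count + 1  # <<- updates the captured variable, not a local copy
    count
  }
}

counter <- make_counter()
counter()  # 1
counter()  # 2
```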

R evaluates function arguments lazily: an argument is evaluated only when it is actually needed, which can reduce computation time. Let's take a look at how lazy evaluation works.

Performing Lazy Evaluation

01:56
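Lazy evaluation can be demonstrated in a couple of lines; the unused argument below is never evaluated, so the stop() inside it never fires:

```r
f <- function(x, y) {
  # y is never touched in the body, so R never evaluates it
  x * 2
}
f(5, stop("this error is never raised"))  # returns 10
```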

Normally, we operate on variables a and b by calling a function as func(a, b). Although this is standard function syntax, it can be hard to read. We can simplify the syntax using infix operators. Let's see how.

Creating Infix Operators

02:51
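A user-defined infix operator is just a function whose name is wrapped in %...%; the string-joining operator below is a hypothetical example:

```r
`%+%` <- function(a, b) paste(a, b)  # hypothetical string-joining infix operator
"hello" %+% "world"  # "hello world"
```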

In R, there are instances where we have to assign a value to a function call. This is exactly what replacement functions allow, so it's really important to learn about them. Let's explore how they work and how we can use them.

Using the Replacement Function

02:17
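As a sketch, a replacement function is defined with a name ending in <- and is invoked by assigning to a function call (the `second` accessor here is made up for illustration):

```r
`second<-` <- function(x, value) {
  x[2] <- value  # replace the second element
  x
}

v <- c(1, 2, 3)
second(v) <- 99   # R rewrites this as: v <- `second<-`(v, 99)
v                 # 1 99 3
```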

As in any other programming language, we may encounter various errors during development in R. Learning how to handle those errors not only helps with fixing them but also makes the program more robust.

Handling Errors in a Function

04:30
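One common error-handling pattern uses tryCatch; the safe_log wrapper below is a hypothetical example, not code from the course:

```r
safe_log <- function(x) {
  tryCatch(
    log(x),
    warning = function(w) NA_real_,  # e.g. log of a negative number warns
    error   = function(e) NA_real_   # e.g. log of a character errors
  )
}

safe_log(10)    # about 2.302585
safe_log(-1)    # NA (warning caught)
safe_log("a")   # NA (error caught)
```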

As it is inevitable for all code to include bugs, an R programmer has to be well prepared for them with a good debugging toolset. Let’s explore how to debug a function using various functions.

The Debugging Function

04:05


Data Extracting, Transforming, and Loading
6 Lectures
17:03

The primary step for any data analysis is to collect high-quality, meaningful data. One important data source is open data, which is published online either in text format or as APIs. Let's see how we can download the text format of an open data file.

Preview
02:14

Now that we’ve learned how to download open data files, it becomes crucial to know how to read and write them for further processing. Let’s see how we can read a file with R.

Reading and Writing CSV Files

01:13
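A minimal round trip with write.csv and read.csv, using a throwaway data frame and a temporary file:

```r
df <- data.frame(city = c("Taipei", "Tokyo"), pop = c(2.6, 13.9))
path <- tempfile(fileext = ".csv")  # temporary file, removed with the session
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path)
df2$pop  # 2.6 13.9 -- values survive the round trip
```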

The functions we've learned, read.table and read.csv, are useful only when the data size is small. We need to know how to read large files for flexible data processing. Let's explore how we can do that using the scan function.

Scanning Text Files

02:21

Excel is widely used for storing and analyzing data. One can convert Excel files to other formats, but it is a somewhat complex process. This video shows how to read and write an Excel file containing world development indicators with the xlsx package.

Working with Excel Files

01:55

As R reads data into memory, it is perfect for processing and analyzing small datasets. However, databases are increasingly used to store and analyze bigger data. In this video, we will demonstrate how to use RJDBC to connect to data stored in a database.

Reading Data from Databases

04:03

In most cases, the majority of data will not exist in the database, but will instead be published in different forms on the Internet. To dig up more valuable information from these data sources, we need to know how to access and scrape data from the Web.

Scraping Web Data

05:17


Data Pre-Processing and Preparation
10 Lectures
29:13

Data analysis requires preprocessing, and several steps must be performed to get data ready for analysis. The first is renaming data variables so that one can operate on them efficiently. Let's see how we can use the names function to rename variables.

Preview
02:27

There are many instances where one does not specify the data type while importing. This makes data manipulation difficult, because the assigned data type differs from the actual one. Let's explore how we can simplify this by converting data types.

Converting Data Types

02:35

Some attributes in the employees and salaries datasets are in date format, so we have to calculate the number of years between each employee's date of birth and the current year to estimate their age. This can be a tedious task. Let's see how we can do it by manipulating date data.

Working with Date Format

02:55

Similar to database operations, we can add a new record to a data frame following the schema of the dataset. In R, we can perform these operations much more easily. In this video, we'll see how to use the rbind and cbind functions to add a new record or attribute.

Adding New Records

02:08

Some analyses require only the partial data of particular interest, and for that purpose data filtering is required. In database operations, an SQL WHERE clause is used to subset data. Let's see how the same is done in R.

Filtering Data

03:28

There might be some unwanted records in the dataset even after filtering. This can generate inaccurate results. Now that we’ve learned how to filter the dataset, let’s see how we remove or drop bad data.

Dropping Data

01:42

Similar to data tables in a database, we sometimes need to combine two datasets for correlating data. In R, we can do that using merge and plyr. Also, in order to analyze data more efficiently, R provides two methods, sort and order, which we must learn to sort data.

Merging and Sorting Data

03:59

There are instances where data analysis is possible only when the data is in a specific format. We must know how to reshape data and remove data with missing values for efficient data processing.

Reshaping Data

02:42

Missing data may occur from data process flaws or simply typos. But this small mistake can affect the whole analysis as the results may be misleading. Thus, it becomes really important to learn how to detect missing values in R.

Detecting Missing Data

03:14

We’ve learned how to detect missing data. But, there might be some instances where analysis may go wrong due to those missing values. This video will introduce some techniques to impute missing values for efficient data processing.

Imputing Missing Data

04:03


Data Manipulation
13 Lectures
30:33

When you process a dataset that is a gigabyte or larger in size, you may find that data.frame is rather inefficient. To address this issue, you can use data.table, an enhanced extension of data.frame. In this video, we will see how to create a data.table in R.

Preview
04:49

Two major advantages of a data.table as compared to a data.frame are the speed and clearer syntax of the former. Similar to a data.frame, we can perform operations to slice and subset a data.table. This video shows some operations that you can perform on data.table.

Managing Data with data.table

04:14

Another advantage of a data.table is that we can easily aggregate data without the help of additional packages. This video illustrates how to perform data aggregation using data.table.

Performing Fast Aggregation with data.table

02:09
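A sketch of grouped aggregation in the j slot, assuming the data.table package is installed (the toy sales table is made up):

```r
library(data.table)

dt <- data.table(store = c("A", "A", "B", "B"),
                 sales = c(10, 20, 5, 15))

# Aggregate in j, grouped with `by` -- no extra packages needed
res <- dt[, .(total = sum(sales), avg = mean(sales)), by = store]
res$total  # 30 20 (store A, then store B)
```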

In addition to performing data manipulation on a single table, we often need to import more features or correlate data from other data sources. Therefore, we can join two or more tables into one. In this video, we look at some methods to merge two data.table.

Merging Large Datasets with a data.table

02:40

To perform more advanced descriptive analysis, we must know how to use the dplyr package to reshape data and obtain summary statistics. This video will guide us how to use dplyr to manipulate data and to use the filter and slice functions to subset and slice data.

Subsetting and Slicing Data with dplyr

02:08

As a single machine cannot efficiently process big data problems, a practical approach is to take samples that we can effectively use to draw conclusions. Here, we will see how to use dplyr to sample from data.

Sampling Data with dplyr

01:25

Besides selecting individual rows from the dataset, we can use the select function in dplyr to select a single or multiple columns from the dataset. In this video, we will look at how to select particular columns using the select function.

Selecting Columns with dplyr

02:40

To perform multiple operations on data using dplyr, we can wrap up the function calls into a larger function call. Or, we can use the %>% chaining operator to chain operations instead. This video will introduce chaining of operations when using dplyr.

Chaining Operations in dplyr

02:09
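The chaining style reads top to bottom; a small sketch using the built-in mtcars dataset, assuming dplyr is installed (the kpl column is a made-up derived variable):

```r
library(dplyr)

res <- mtcars %>%
  filter(cyl == 4) %>%            # keep only 4-cylinder cars
  mutate(kpl = mpg * 0.4251) %>%  # hypothetical derived column (km per liter)
  summarise(n = n(), mean_kpl = mean(kpl))

res$n  # 11 four-cylinder cars in mtcars
```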

Arranging rows in order may help us rank data by value or gain a more structured view of data in the same category. In this video, we will take a look at how to arrange rows with dplyr.

Arranging Rows with dplyr

01:21

To avoid counting duplicate rows, we can use the distinct operation in SQL. In dplyr, we can also eliminate duplicated rows from a given dataset. Let’s explore how to do that.

Eliminating Duplicated Rows with dplyr

01:39

Besides performing data manipulation on existing columns, there are situations where a user may need to create a new column for more advanced analysis. Let’s see how to add a new column using dplyr.

Adding New Columns with dplyr

01:14

Besides manipulating a dataset, the most important part of dplyr is that one can easily obtain summary statistics from the data. In SQL, we use the GROUP BY clause for this purpose. This video will show us how to summarize data with dplyr.

Summarizing Data with dplyr

01:54

In a SQL operation, we can perform a join operation to combine two different datasets. In dplyr, we have the same join operation that enables us to merge data easily. In this video, we’ll learn how join works in dplyr.

Merging Data with dplyr

02:11


Visualizing Data with ggplot2
9 Lectures
26:42

In ggplot2, the data is charted by mapping the element from mathematical space to physical space. We can use simple elements to build a figure. This video shows how to construct our very first ggplot2 plot using the superstore sales dataset.

Preview
04:15
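The course builds its first plot from a superstore sales dataset; as a stand-in sketch with the built-in mtcars data, assuming ggplot2 is installed:

```r
library(ggplot2)

# Map data-space variables (wt, mpg) to visual space, then add a layer
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Fuel efficiency vs. weight")

p  # printing the object renders the plot
```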

Aesthetics mapping describes how data variables are mapped to the visual property of a plot. In this video, we discuss how to modify aesthetics mapping on geometric objects so that we can change the position, size and color of a given geometric object.

Changing Aesthetics Mapping

03:09

Geometric objects are the elements we mark on the plot. One can use geometric objects in ggplot2 to create a line, bar, or box chart, and integrate them with aesthetic mappings to create a more professional plot. This video introduces how to use geometric objects to create various charts.

Introducing Geometric Objects

03:13

Besides mapping particular variables to the x or y axis, one can first perform statistical transformations on variables, and then remap the transformed variable to a specific position. With the help of this video, we'll be able to perform variable transformations with ggplot2.

Performing Transformations

03:27

Besides setting aesthetic mapping for each plot or geometric object, one can use scale to control how variables are mapped to the visual property. Let’s explore how to adjust the scale of aesthetics in ggplot2.

Adjusting Scales

02:16

When performing data exploration, it is essential to compare data across different groups. Faceting is a technique used to create graphs for subsets of data. This video will help us use the facet function to create a chart for multiple subsets of data.

Faceting

02:06

One can adjust the layout, color, font, and other attributes of a non-data object using the theme system in ggplot2. By default, ggplot2 provides many themes, and one can adjust the current theme. This video will show us how to use the theme_* function and customize a theme.

Adjusting Themes

01:33

To create an overview of a dataset, we may need to combine individual plots into one. This video will guide us on how to combine individual subplots into one plot.

Combining Plots

02:04

One can use a map to visualize the geographical relationship of spatial data. This video shows us how to create a map from a shapefile with ggplot2 and use ggmap to download data from a mapping service.

Creating Maps

04:39


Making Interactive Reports
9 Lectures
23:41

Creating an R Markdown report with RStudio is a straightforward process. This video will teach us how to use the built-in GUI to create markdown reports in different formats.

Preview
02:47

The most attractive feature of a markdown report is that it enables the user to create a well-formatted document with plain text and simple markup syntax. Let’s see how we can use Markdown to create, edit, organize, and highlight data.

Learning the Markdown Syntax

03:14

In an R Markdown report, we can embed R code chunks with the knitr package. This video will guide us on how to create and control the output with different code chunk configurations.

Embedding R Code Chunks

02:18

The ggvis package creates HTML output with CSS and JavaScript. Thus, one can embed ggvis graphics into web applications or HTML reports. Let’s explore how we can do that and make interactive plots.

Creating Interactive Graphics with ggvis

02:38

In ggvis, one can use a simple layer to create lines, points, and other geometry objects in the plot. This video guides us through using ggvis syntax and grammar to create different plots.

Understanding Basic Syntax and Grammar

01:57

In addition to making different plots in ggvis, we can control how axes and legends are displayed in a ggvis figure with the *_axis and *_legend functions. Let’s see how we can set their appearance properties and rescale the mapping of the data with the scale function.

Controlling Axes and Legends and Using Scales

01:57

ggvis can be used to create an interactive web form. It allows the user to subset data and change the visual properties of the plot by interacting with the web form. In this video, we learn how to add interactivity to a ggvis plot.

Adding Interactivity to a ggvis Plot

02:55

An R Markdown report outputs code and static figures; one cannot perform exploratory data analysis through web interaction. To enable the user to explore data via a web form, we have to build an interactive web page. In this video, we see how to create an interactive web report with Shiny.

Creating an R Shiny Document

03:40

In addition to hosting a Shiny app on a local machine, we can host our Shiny app online. RStudio provides a service, http://www.shinyapps.io/, that allows anyone to upload their Shiny app. Let's see how to publish an R Shiny report using shinyapps.io.

Publishing an R Shiny Report

02:15


Simulation from Probability Distributions
9 Lectures
21:43

Generating samples is the first step for working with probability distributions. So, learning this basic concept is very important.

Preview
02:51

When many outcomes are equally likely, we model them with a uniform distribution.

Understanding Uniform Distributions

01:38

You need to generate samples from a binomial distribution when you evaluate the success or failure of several independent trials. This video will enable you to do that.

Generating Binomial Random Variates

02:30

To model the number of events occurring within a fixed time interval, the Poisson distribution is the best option.

Generating Poisson Random Variates

02:06

Much real-world data approximately follows a normal distribution, so sampling from a normal distribution is an essential skill. This video will help you with that.

Sampling from a Normal Distribution

04:07
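Sampling from a normal distribution is a one-liner with rnorm; the mean and sd below are arbitrary illustration values:

```r
set.seed(42)  # make the draw reproducible
x <- rnorm(10000, mean = 5, sd = 2)

mean(x)  # close to 5
sd(x)    # close to 2
```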

Let's use R to generate samples from a chi-squared distribution.

Sampling from a Chi-Squared Distribution

01:59

To estimate the mean of a population when the sample is small and the variance is unknown, Student's t-distribution is used.

Understanding Student's t-Distribution

02:11

Along with generating samples, we can also sample subsets from datasets. This video will arm you to do that.

Sampling from a Dataset

01:52

When a model involves one or more random variables evolving over time, we need stochastic processes.

Simulating the Stochastic Process

02:29


Statistical Inference in R
9 Lectures
24:54

To estimate the interval range of unknown parameters in data, we use confidence intervals.

Preview
05:54

To compare two mean values, we perform Z-tests on data.

Performing Z-tests

03:12

In cases where the standard deviation is unknown, we need to perform Student's t-tests.

Performing Student's t-Tests

02:15
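A quick sketch of a two-sample t-test on simulated data (the group means are made up):

```r
set.seed(1)
a <- rnorm(30, mean = 5.0)  # control group
b <- rnorm(30, mean = 5.5)  # treatment group

res <- t.test(a, b)  # Welch two-sample t-test by default
res$p.value          # a small p-value suggests the means differ
res$conf.int         # confidence interval for the difference in means
```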

When the data distribution is unknown, non-parametric testing comes into the picture. We do that by conducting exact binomial tests in R.

Conducting Exact Binomial Tests

02:09

When comparing a sample with a reference probability distribution, or two samples with each other, we require Kolmogorov-Smirnov tests.

Performing Kolmogorov-Smirnov Tests

02:16

To discover the relationship between two categorical variables, we need to conduct a Pearson’s chi-squared test.

Working with the Pearson's Chi-Squared Tests

01:40

To test whether two groups belong to the same population, we use the Wilcoxon rank-sum and signed-rank tests.

Understanding the Wilcoxon Rank Sum and Signed Rank Tests

01:48

To test the effect of a single categorical variable on a continuous outcome, one-way ANOVA is used.

Conducting One-way ANOVA

02:39

When two categorical variables are involved, two-way ANOVA is used.

Performing Two-way ANOVA

03:01


Rule and Pattern Mining with R
8 Lectures
26:02

Before rule mining, it is important to transform the data into transactions.

Preview
05:11

You will learn to display transactions and associations in this video.

Displaying Transactions and Associations

03:02

To find relations within a transaction dataset, we use the Apriori algorithm.

Mining Associations with the Apriori Rule

04:18

Sometimes, rules are repeated and are redundant. We need to know how to remove these rules to get significant information. This video will enable you to do that.

Pruning Redundant Rules

02:14

To explore the relation between items, we visualize association rules.

Visualizing Association Rules

02:35

Eclat is faster than Apriori in mining itemsets. Hence it is essential to learn how it works.

Mining Frequent Itemsets with Eclat

03:08

You will learn to create transactions with temporal information in this video.

Creating Transactions with Temporal Information

02:52

A better algorithm for mining frequent sequential patterns is cSPADE. It is important to learn about it and understand it.

Mining Frequent Sequential Patterns with cSPADE

02:42


Time Series Mining with R
9 Lectures
29:51

Time-indexed variables are represented as time series data, so it is important to know how to create a time series object.

Preview
05:11

Plotting a time series object will make visualization easy and effective.

Plotting a Time Series Object

02:26

To get the components of a time series, we need to decompose it.

Decomposing Time Series

02:11

To measure the error rate of a regression model, we need to calculate RMSE and RSE.

Smoothing Time Series

05:21

We can forecast a time series from the smoothed model. Let’s learn how to do that.

Forecasting Time Series

02:30

ARIMA takes autocorrelation into consideration, which helps with real-life examples.

Selecting an ARIMA Model

03:18

After understanding the ARIMA model, we can create an ARIMA model of our own. Let’s see how to do that.

Creating an ARIMA Model

02:19
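Fitting an ARIMA model in base R might be sketched like this; the (1, 0, 0) order is illustrative, not necessarily the one chosen in the video:

```r
# lh is a built-in series of hormone-level measurements
fit <- arima(lh, order = c(1, 0, 0))
pred <- predict(fit, n.ahead = 5)
pred$pred  # point forecasts for the next 5 periods
```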

We can predict values with the ARIMA model.

Forecasting with an ARIMA Model

02:11

You will apply your knowledge of the ARIMA model in prediction of stock prices.

Predicting Stock Prices with an ARIMA Model

04:24

About the Instructor