R Data Pre-Processing & Data Management - Shape your Data!
4.3 (481 ratings)
3,391 students enrolled

Learn how to prepare your data for great analytics in R.
Last updated 11/2018
English
English [Auto]
This course includes
  • 6 hours on-demand video
  • 14 articles
  • 5 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • import data into R in several ways and identify a suitable import tool for each case
  • select and implement a proper object class (data.frame, data.table, data_frame)
  • convert your data into (and understand) a tidy data format
  • filter and query your data based on a wide range of parameters
  • join two data tables with dplyr's two-table verb syntax
  • use SQL code within R
  • translate basic R into SQL
  • work with dates and time
  • work with strings using regular expressions
  • detect outliers in datasets
Course content
64 lectures 06:25:39
+ Introduction
5 lectures 46:32
Data Pre-Processing as Integral Part of Data Science
11:50
Let's See an R Example of Data Pre-Processing
22:34
Lures Example Script
00:22
+ Data Import and Data Structuring
5 lectures 35:10
Script: Data import
00:35
Importing Data and Snippets
14:17
Using fread to handle big data fast
07:53
Choosing the right class for your data
09:46
Further R Exercises
02:39
+ Cleaning Your Data
5 lectures 21:51
Script: Data cleaning
01:19
tidyr - What tidy data looks like
03:56
Wide to long data format
09:20
Splitting columns
03:04
Long to wide data format
04:12
+ Querying and Filtering Data with data.table
9 lectures 51:37
Script: Querying with data.table
03:00
What is data.table?
06:04
Basic queries
05:52
The by parameter for queries
08:51
Update on recycle queries
00:16
Keys
05:18
Data.table exercises
05:43
Data.table solutions
09:28
+ Queries and Filtering Exercises
9 lectures 01:22:37
Query exercises INTRO
01:42
10 Exercises on 'data.frame'
11:50
Data.frame Exercise Script
02:59
Data.frame Solutions 1-4
09:05
Data.frame Solutions 5-10
11:28
10 Exercises on 'data.table'
15:06
Data.table Exercise Script
02:32
Data.table Solutions 1-4
15:42
Data.table Solutions 5 - 10
12:12
+ Using dplyr on one and multiple Datasets
5 lectures 31:18
Script: dplyr
01:08
Single Table Verbs in 'dplyr'
10:19
Two Table Verbs - Mutating Joins
09:20
Two Table Verbs - Filtering Joins and handling of ID mismatches
04:22
+ Integrate SQL into R
5 lectures 18:47
Script: Integrate SQL
00:34
Get package dbplyr
00:10
R to SQL Translator
07:59
Set Up a SQLite Database in R
06:11
+ Detecting Outliers
4 lectures 26:17
Outlier Script
00:21
Introduction to Outlier Detection
11:02
Detecting Outliers in Univariate Datasets
09:08
Detecting Outliers in Multivariate Datasets
05:46
+ Working with Strings - Regular Expressions
7 lectures 29:33
Script: Working with Strings
02:02
Regular Expressions and Gsub
00:02
What You Should Know about Strings in R
05:34
The Gsub Family of Functions and Regular Expressions
07:42
Regular Expressions Syntax
05:17
A Great Add On Package
05:10
Working with Strings in R: Exercise with Solution
03:46
+ Working with Dates and Time
10 lectures 41:52
Data management and time series INTRO
02:23
Importing a Time Series From Excel
05:55
Section Script
02:05
Classes POSIXt, Date and Chron
08:50
Lubridate: Input and Time Zones
05:15
Lubridate: Weekdays and Intervals
02:57
Lubridate: Exercise Data Frame
03:18
Lubridate: Calculations and Leap Years
04:47
Lubridate: Data Handling Exercise
03:43
Further R Exercises
02:39
Requirements
  • Computer with R and RStudio ready to use
  • You should have basic R / RStudio knowledge
  • Required add-on packages will be listed in the course orientation video
Description

Let’s get your data in shape!

Data Pre-Processing is the very first step in data analytics. You cannot escape it; it is too important. Unfortunately, this topic is widely overlooked and information is hard to find.

With this course I will change this!

Data Pre-Processing as taught in this course has the following steps:

1. Data Import: this might sound trivial, but if you consider all the different data formats out there, you can imagine that it can get confusing. In the course we will take a look at a standard way of importing csv files, we will learn about the very fast fread method, and I will show you what you can do if you have more exotic file formats to handle.

2. Selecting the object class: a standard data.frame might be fine for easy standard tasks, but there are more advanced classes out there like the data.table. Especially with those huge datasets nowadays, a data.frame might not do it anymore. Alternatives will be demonstrated in this course.

3. Getting your data in a tidy form: a tidy dataset has 1 row for each observation and 1 column for each variable. This might sound trivial, but in your daily work you will find instances where this simple rule is not followed. Often you will not even notice that the dataset is not tidy in its layout. We will learn how tidyr can help you get your data into a clean and tidy format.

4. Querying and filtering: when you have a huge dataset you need to filter for the desired parameters. We will learn how to combine parameters and implement advanced filtering methods. data.table in particular has proven effective for this sort of querying on huge datasets, so we will focus on this package in the querying section.

5. Data joins: when your data is spread over 2 different tables and you want to combine them based on given criteria, you need joins. There are several ways to join data in R, but here we will take a look at dplyr and its 2 table verbs, which are a great tool for working with 2 tables at the same time.

6. Integrating and interacting with SQL: R is great at interacting with SQL. And SQL is of course the leading database language, which you will have to learn sooner or later as a data scientist. I will show you how to use SQL code within R, and there is even an R to SQL translator for standard R code. We will also set up a SQLite database from within R.

7. Outlier detection: Datasets often contain values outside a plausible range. Faulty data generation or entry happens regularly. Statistical methods of outlier detection help to identify these values. We will take a look at the implementation of these.

8. Character strings as well as dates and time have their own rules when it comes to pre-processing. In this course we will also take a look at these types of data and how to handle them effectively in R.
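
To give a feel for step 7, here is a minimal base R sketch of one common univariate approach, Tukey's 1.5 x IQR rule. The course description does not say which methods the lectures use, so the choice of rule, the sample vector `values` and all variable names below are assumptions made purely for illustration.

```r
# Hypothetical measurements (made-up data); 95 is an implausible entry
values <- c(12, 14, 15, 13, 16, 14, 95, 15, 13)

# First and third quartile, plus the interquartile range
q1  <- quantile(values, 0.25, names = FALSE)
q3  <- quantile(values, 0.75, names = FALSE)
iqr <- q3 - q1

# Tukey's fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

is_outlier <- values < lower | values > upper
outliers   <- values[is_outlier]   # values flagged for inspection
cleaned    <- values[!is_outlier]  # remaining values

print(outliers)
```

A flagged value is only a candidate for review. Whether it is a data entry error or a genuine extreme observation still has to be decided from context.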

How do you best prepare yourself for this course?

You only need a basic knowledge of R to fully benefit from this course. Once you know the basics of RStudio and R you are ready to follow along with the course material. Of course you will also get the R scripts, which make it even easier.

The screencasts are made in RStudio, so you should install this program on top of R. The required add-on packages are listed in the course.

Again, if you want to make sure that you have proper data with a tidy format, take a look at this course. It will make your analytics with R much easier!

Who this course is for:
  • Data pre-processing is a crucial step of data-related work, so this course is intended for all R users