Cleaning Data In R with Tidyverse and Data.table
What you'll learn
- Convert raw and dirty data into clean data
- Understand how clean data looks and how to achieve it
- Use the R Tidyverse packages to clean data
- Handle missing values in R
- Detect outliers
- Filter and query tables
- Select a proper class for your data
- Clean various classes of data (numeric, string, categorical, integer, ...)
Requirements
- Just basic R skills are required for this course
- R and RStudio
Description
Welcome to this course on Data Cleaning in R with tidyverse, dplyr, data.table, tidyr and many more packages!
You may already know this problem: your data is not properly cleaned before the analysis, so the results are corrupted or you cannot perform the analysis at all.
To be brief: you cannot escape the initial cleaning part of data science. No matter which data you use or which analysis you want to perform, data cleaning will be part of the process. It is therefore a wise decision to invest time in learning how to do it properly.
Now as you can imagine, there are many things that can go wrong in raw data. Therefore a wide array of tools and functions is required to tackle all these issues. As always in data science, R has a solution ready for any scenario that might arise. Outlier detection, missing data imputation, column splits and unions, character manipulations, class conversions and much more - all of this is available in R.
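To give a feel for three of the tasks named above (character manipulation, a column split and a class conversion), here is a minimal sketch. The data frame `raw` and its column names are invented for this illustration and are not taken from the course material; `str_trim()` comes from stringr, `separate()` from tidyr and `mutate()` from dplyr.

```r
library(tidyr)
library(dplyr)
library(stringr)

# Hypothetical raw data: one column holds two values, with stray whitespace
raw <- data.frame(
  id_score = c("a_10", " b_20 ", "c_30"),
  stringsAsFactors = FALSE
)

clean <- raw %>%
  mutate(id_score = str_trim(id_score)) %>%                    # character manipulation
  separate(id_score, into = c("id", "score"), sep = "_") %>%   # column split
  mutate(score = as.integer(score))                            # class conversion

str(clean)
```

After these steps `id` is a character column and `score` is an integer column, so numeric summaries such as `mean(clean$score)` work as expected.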
On top of that, there are several ways to do each of these things, so you can always pick the alternative you prefer. Whether you like simple tools or complex machine learning algorithms for cleaning your data, R has them.
Now we do understand that it is overwhelming to identify the right R tools and to use them effectively when you just start out. But that is where we will help you. In this course you will see which R tools are the most efficient ones and how you can use them.
You will learn about the tidyverse package system, a collection of packages that work together to produce clean data. This system supports the whole data cleaning process, from data import right through to querying the data. It is a very popular toolbox and well worth learning.
To filter and query datasets you will use tools like data.table, tibble and dplyr.
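As a small sketch of what such a query can look like, the same filter is written below once with dplyr and once with data.table. The `sales` table and its columns are hypothetical and only serve the example.

```r
library(tibble)
library(dplyr)
library(data.table)

# Hypothetical example data stored as a tibble
sales <- tibble(
  region = c("north", "south", "north", "west"),
  amount = c(120, 80, 200, 50)
)

# dplyr: keep rows above a threshold, pick columns, sort descending
big_dplyr <- sales %>%
  filter(amount > 100) %>%
  select(region, amount) %>%
  arrange(desc(amount))

# data.table: the same query using the DT[i, j] syntax
sales_dt <- as.data.table(sales)
big_dt <- sales_dt[amount > 100, .(region, amount)][order(-amount)]
```

Both results contain the same two rows; which syntax to use is largely a matter of preference and of how large the table is.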
You will learn how to identify outliers and how to replace missing data. We even use machine learning algorithms to do these things.
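The simplest versions of both tasks need only base R. The sketch below uses the 1.5 * IQR rule to flag outliers and the median to fill missing values. These are deliberately basic stand-ins chosen for illustration, not the machine learning methods mentioned above, and the vector `x` is made up.

```r
# Hypothetical numeric vector with one missing value and one extreme value
x <- c(10, 12, 11, NA, 13, 95, 12)

# Flag outliers with the 1.5 * IQR rule (Tukey's fences)
q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- !is.na(x) & (x < q[1] - fence | x > q[2] + fence)

# Replace missing values with the median of the observed values
x_imputed <- x
x_imputed[is.na(x_imputed)] <- median(x, na.rm = TRUE)

which(is_outlier)  # positions of the flagged values
```

Note that the median here is computed with the extreme value still included. In practice you would decide how to treat outliers before imputing.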
And to make sure that you can use and implement these tools in your daily work there is a data cleaning project at the end of the course. In this project you get an assignment which you can solve on your own, based on the material you learned in the course. So you have plenty of opportunity to test, train and refine your data cleaning skills.
As always you get the R scripts as text to copy into your RStudio instance. And on course completion you will get a course certificate from Udemy.
R-Tutorials Team
Who this course is for:
- Anybody working with R will benefit from this course since data cleaning is an integral part of any form of analysis
Instructor
R-Tutorials is your provider of choice when it comes to analytics training courses! Try it out – our 100,000+ students love it.
We focus on Data Science tutorials. Offering several R courses for every skill level, we are among Udemy's top R training providers. On top of that, courses on Tableau and Excel and a Data Science career guide are available.
All of our courses contain exercises to give you the opportunity to try out the material on your own. You will also get downloadable script PDFs to recap the lessons.
The courses are taught by our main instructor Martin, a trained biostatistician and enthusiastic data scientist / R user.
Should you have any questions, you can check out our website, open a discussion in the course or simply drop us a PM.
We are here to help you boost your career with analytics training – Just learn and enjoy.