Data science with R: tidyverse
Rating: 4.4 out of 5 (746 ratings)
5,218 students

R Programming Language, Data Analysis, Data Cleaning, Data Science, Data Wrangling, tidyverse, dplyr, ggplot2, RStudio
Last updated 10/2024
English

What you'll learn

  • How to use R's tidyverse libraries in your data science projects
  • How to write efficient R code for data science related tasks
  • What is clean data
  • How to clean your data with R
  • What is grammar of data wrangling
  • How to wrangle data with dplyr and tidyr
  • How to import data into R
  • How to properly parse imported data
  • How to chain R's functions into a pipeline
  • How to manipulate strings
  • What are Regular Expressions
  • How to use stringr library with Regular Expressions
  • How to use forcats library to manipulate categorical variables
  • What is Grammar of Graphics
  • How to visualize data with ggplot2 library
  • What is functional programming
  • How to use purrr library for mapping functions, nesting data, manipulating lists, etc.
  • What is relational data
  • How to use dplyr library for relational data
  • What is tidy evaluation
  • How to use tidyverse tools to finish a practical project

Course content

11 sections, 204 lectures, 30h 49m total length
  • Section intro (5:57)

    Learn to clean and transform data in R with tidyverse core libraries dplyr and tidyr, mastering wrangling, filtering, mutating, grouping, summarizing, and pivoting.

  • Data science & tidyverse (6:58)

    This video introduces data science in R and the tidyverse, covering cleaning, wrangling, exploring, and modeling with a cohesive suite of packages for tidy data.

  • Data transformation (5:58)

    Discover data transformation with tidyverse, focusing on dplyr and tidyr to reshape, wrangle, and clean tabular data, and set up efficient pipelines using the mpg dataset from ggplot2.

  • Manipulate variables (columns) - select(), rename() - part 1 (3:35)

    Learn to manipulate variables in R with dplyr by using select to extract columns as a table or vector and rename to change names, including helpers like contains, starts_with, ends_with.

  • Manipulate variables (columns) - select(), rename() - part 2 (22:15)

    Learn to manipulate data frame columns with dplyr in R using select and rename, including helper functions like starts_with, ends_with, contains, and in-line renaming with everything.

  • mutate(), transmute() - part 1 (3:03)

    Demonstrate how mutate in dplyr creates new variables from existing columns and how to create multiple columns, while transmute drops other columns and keeps only the new ones.

  • mutate(), transmute() - part 2 (9:14)

    Mutate creates new variables, such as the average miles per gallon from highway and city miles, while transmute drops variables, leaving a car label built from manufacturer and model.

  • Manipulate cases (rows) - filter(), slice() - part 1 (3:06)

    Learn how to manipulate rows in R with dplyr using filter and slice to subset data by conditions or by index; create new tables with each operation.

  • Manipulate cases (rows) - filter(), slice() - part 2 (9:49)

    Filter and slice rows with tidyverse, using and, or, and not equal. Extract Audi and 1999 models with highway mpg above 30, plus first five, 20–30, and last ten rows.

  • arrange() - part 1 (1:53)

    Use arrange() in dplyr to sort rows by one or more columns, in ascending or descending order, producing a new table with the rows ordered as specified.

  • arrange() - part 2 (3:44)

    Learn how to sort rows with dplyr's arrange() using a range, including ascending and descending order, and sorting by multiple columns such as year, cylinders, and displacement.

  • distinct() - part 1 (1:22)

    Explore how the distinct function in tidyverse removes duplicate rows, keeping one per unique combination or per a single column.

  • distinct() - part 2 (6:14)

    Learn how to use distinct in dplyr to remove duplicate rows, including selecting specific columns or the full table, and see how many originals remain.

  • Sample rows - part 1 (2:39)

    Learn to sample rows in R using dplyr, with the sample_n and sample_frac verbs, to generate random training, validation, and testing splits.

  • Sample rows - part 2 (4:24)

    Learn how to sample rows in dplyr, using sample_n with and without replacement, set seeds for reproducible results, and sample by number or by fraction.

  • summarise() - part 1 (1:44)

    Learn to create a full-table summary using summarise, generating a new table of statistics such as min, max, average, standard deviation, variance, and count, before exploring group_by for breakdowns.

  • summarise() - part 2 (4:58)

    Apply summarize in dplyr to create aggregates such as row counts, distinct models, and mean, min, and max values for highway and city mileage.

  • group_by(), count() - part 1 (2:48)

    Learn how to use group_by with summarize and count in tidyverse to create per-group summaries and counts, grouping by manufacturer, model, or color.

  • group_by(), count() - part 2 (6:13)

    Learn to use group_by and count in dplyr to generate per-group summaries, such as cars per manufacturer and min/max by model, unlocking efficient summary statistics with grouped data.

  • Pipe operator: %>% - part 1 (5:20)

    Explore the forward pipe operator %>% in the tidyverse, showing how to chain dplyr functions into a clean, no-assignment pipeline that passes data between steps.

  • Pipe operator: %>% - part 2 (8:58)

    Learn to chain dplyr functions with the pipe operator %>% to filter, count, select, group, and summarize mean highway miles per gallon, then arrange results.

  • Rotate columns - pivoting - part 1 (3:49)

    Learn how to pivot data between wide and long formats using tidyr's pivot_longer and pivot_wider, with names_to and values_to, in the tidyverse.

  • Rotate columns - pivoting - part 2 (16:08)

    Discover pivoting in tidyverse: transform data between long and wide formats using pivot_wider and pivot_longer, with practical steps, filtering, and handling missing values.

  • Separate & unite columns - part 1 (2:34)

    Explore how to separate and unite columns with the tidyr package, splitting one column into year and month using a separator, then merging to form a date column in R.

  • Separate & unite columns - part 2 (15:26)

    Separate a date column into year, month, and day of month; remove and reintroduce leading zeros with string padding; unite the parts into a dash-separated date using tidyverse tools.

  • dplyr & tidyr in action - part 1 (7:49)

    Explore core dplyr and tidyr functions, including pull, group_by with mutate, case_when, row_number, and mutate variants, using the hflights table with 200k rows and 20+ columns.

  • dplyr & tidyr in action - part 2 (11:05)

    Master dplyr and tidyr by using pull and select, then group_by and mutate to compute means, classify transmission types from the first letter, and apply row_number for rankings.

  • dplyr & tidyr in action - part 3 (17:50)

    Use dplyr and tidyr, part of tidyverse, to wrangle a large flights table, count rows, group by carrier, calculate cancelled flight percentages, and pivot for carrier-level summaries.

  • Section summary and assignment (9:27)

    Review data science basics and tidyverse concepts, including dplyr functions like select, mutate, filter, group_by, and pivoting using the hflights dataset. Outline the assignment steps and expected outputs.

  • Assignment walkthrough - part 1 (12:26)

    Explore data wrangling and summarization in R using tidyverse through an assignment walkthrough (part 1). Build skills with dplyr and ggplot on flight data, calculating airports, cancellations, and carrier performance.

  • Assignment walkthrough - part 2 (18:33)

    Load tidyverse libraries and use lubridate to create year, month, day, quarter, and week features; compute deltas and visualize with ggplot, and build a heat map of carrier by month.

  • Additional content (0:25)

Requirements

  • R and RStudio already installed on your computer.
  • Basic knowledge of statistics is a plus.
  • Basic to intermediate R knowledge is a plus.
  • Complete R beginners will find the course more challenging.
  • For complete R beginners, I recommend first taking one of the R beginner courses.
  • Interest in data science and data science related tasks.
  • Interest in how to write efficient R code.
  • Please update R or its libraries if necessary. A list of versions (R and all R libraries used in the exercises) is provided at the beginning and at the end of the course material.

Description

Data science skills remain among the most in-demand skills on the job market today. Many people see only the fun part of data science, tasks like "search for data insight", "reveal the hidden truth behind the data", "build predictive models", "apply machine learning algorithms", and so on. The reality, known to most data scientists, is that when you deal with real data, the most time-consuming operations of any data science project are "data importing", "data cleaning", "data wrangling", "data exploring", and so on. So it is necessary to have an adequate tool for these data-related tasks. What if I told you there is a freely available tool that fits this description?


R is one of the most in-demand programming languages when it comes to applied statistics, data science, data exploration, etc. If you combine R with its collection of libraries called tidyverse, you get one of the most powerful tools available, designed specifically for data science-related tasks. All tidyverse libraries share a common philosophy, grammar, and data types. The libraries can therefore be used side by side, and they enable you to write more efficient and optimized R code, which will help you finish projects faster.


This course includes several chapters. Each chapter introduces a different aspect of data-related tasks, together with the proper tidyverse tool to help you deal with the given task. The course also covers the theory related to each topic, along with practical examples in R. If you dive into the course, you will engage with many different data science challenges. Here are just a few of them:

  • Tidy data: how to clean your data with tidyverse.

  • Grammar of data wrangling.

  • How to wrangle data with dplyr and tidyr.

  • Create table-like objects called tibbles.

  • Import and parse data with readr and other libraries.

  • Deal with strings in R using stringr.

  • Apply Regular Expressions concepts when dealing with strings.

  • Deal with categorical variables using forcats.

  • Grammar of Data Visualization.

  • Explore data and draw statistical plots using ggplot2.

  • Use concepts of functional programming, and map functions using purrr.

  • Efficiently deal with lists with the help of purrr.

  • Practical applications of relational data.

  • Use dplyr for relational data.

  • Tidy evaluation inside tidyverse.

  • Apply tidyverse tools for the final practical data science project.


Course includes:

  • over 25 hours of lecture videos,

  • R scripts and additional data (provided in the course material),

  • engagement with assignments at the end of each chapter,

  • assignment walkthrough videos (where you can check your results).

All told, this makes it one of Udemy's most comprehensive courses for data science-related tasks using R and tidyverse.


Enroll today and become a master of R's tidyverse!


Who this course is for:

  • Anyone who is interested in data science
  • Anyone who is interested in data analysis
  • Anyone who is interested in writing efficient R code
  • Anyone whose job, research or hobby is related to data cleaning or data visualizing
  • Aspiring data scientists, statisticians or data (business) analysts
  • Anyone who works on data modeling and often struggles with the data preparation and cleaning step
  • Students working with data