
Learn to clean and transform data in R with tidyverse core libraries dplyr and tidyr, mastering wrangling, filtering, mutating, grouping, summarizing, and pivoting.
This video introduces data science in R and the tidyverse, covering cleaning, wrangling, exploring, and modeling with a cohesive suite of packages for tidy data.
Discover data transformation with tidyverse, focusing on dplyr and tidyr to reshape, wrangle, and clean tabular data, and set up efficient pipelines using ggplot2 and the mpg dataset.
Learn to manipulate variables in R with dplyr by using select to extract columns as a table or vector and rename to change names, including helpers like contains, starts_with, and ends_with.
Learn to manipulate data frame columns with dplyr in R using select and rename, including helper functions like starts_with, ends_with, contains, and in-line renaming with everything.
Demonstrate how mutate in dplyr creates new variables from existing columns and how to create multiple columns, while transmute drops other columns and keeps only the new ones.
Mutate creates new variables, such as the average miles per gallon from highway and city miles, while transmute drops variables, leaving a car label built from manufacturer and model.
Learn how to manipulate rows in R with dplyr using filter and slice to subset data by conditions or by index; create new tables with each operation.
Filter and slice rows with tidyverse, using and, or, and not equal. Extract Audi and 1999 models with highway mpg above 30, plus first five, 20–30, and last ten rows.
Use arrange() in dplyr to sort rows by one or more columns, in ascending or descending order, producing a new table with the rows ordered as specified.
Learn how to sort rows with dplyr's arrange(), including ascending and descending order, and sorting by multiple columns such as year, cylinders, and displacement.
Explore how the distinct function in tidyverse removes duplicate rows, keeping one per unique combination or per a single column.
Learn how to use distinct in dplyr to remove duplicate rows, including selecting specific columns or the full table, and see how many originals remain.
Learn to sample rows in R using dplyr, with the sample_n and sample_frac verbs, to generate random training, validation, and testing splits.
Learn how to sample rows in dplyr, sampling with and without replacement, setting seeds for reproducible results, and sampling by number or by fraction.
Learn to create a full-table summary using summarize, generating a new table of statistics such as min, max, average, standard deviation, variance, and count, before exploring group by for breakdowns.
Apply summarize in dplyr to create aggregates such as row counts, distinct models, and mean, min, and max values for highway and city mileage.
Learn how to use group_by with summarize and count in tidyverse to create per-group summaries and counts, grouping by manufacturer, model, or color.
Learn to use group_by and count in dplyr to generate per-group summaries, such as cars per manufacturer and min/max by model, unlocking efficient summary statistics with grouped data.
Explore the forward pipe operator %>% in the tidyverse, showing how to chain dplyr functions into a clean, no-assignment pipeline that passes data between steps.
Learn to chain dplyr functions with the pipe operator %>% to filter, count, select, group, and summarize mean highway miles per gallon, then arrange results.
Learn how to pivot data between wide and long formats using tidyr's pivot_longer and pivot_wider, with names_to and values_to, in the tidyverse.
Discover pivoting in tidyverse: transform data between long and wide formats using pivot_wider and pivot_longer, with practical steps, filtering, and handling missing values.
Explore how to separate and unite columns with the tidyr package, splitting one column into year and month using a separator, then merging to form a date column in R.
Separate a date column into year, month, and day of month; remove and reintroduce leading zeros with string padding; unite the parts into a dash-separated date using tidyverse tools.
Explore core dplyr and tidyr functions, including pull, group_by with mutate, case_when, row_number, and mutate variants, using the hflights table with 200k rows and 20+ columns.
Master dplyr and tidyr by using pull and select, then group_by and mutate to compute means, classify transmission types from the first letter, and apply row_number for rankings.
Use dplyr and tidyr, part of tidyverse, to wrangle a large flights table, count rows, group by carrier, calculate cancelled flight percentages, and pivot for carrier-level summaries.
Review data science basics and tidyverse concepts, including dplyr functions like select, mutate, filter, group_by, and pivoting using the hflights dataset. Outline the assignment steps and expected outputs.
Explore data wrangling and summarization in R using tidyverse through an assignment walkthrough (part 1). Build skills with dplyr and ggplot on flight data, calculating airports, cancellations, and carrier performance.
Load tidyverse libraries and use lubridate to create year, month, day, quarter, and week features; compute deltas and visualize with ggplot, and build a heat map of carrier by month.
Explore tibbles in tidyverse, distinguishing them from data frames, and learn to create, import, and parse flat files using readr and related libraries.
Explore how the tibble object in tidyverse extends data frames, enabling precise subsetting and full column names with no partial matching, via the tibble package alongside dplyr and tidyr.
Explore how tibble objects differ from data frames in R, using the tibble package to view datasets like diamonds and economics, and convert between tibble and data frame forms.
Create tibbles and convert data frames to tibbles with the tibble package, build tibbles from vectors, and mutate to create new columns while handling non-syntactic names and transpose capabilities.
Compare data frame and tibble by their print output and subsetting with names, positions, and the pipe dot, highlighting conversion to data frames for legacy code.
Explore differences between data frames and tibbles, and learn subsetting with dollar signs, double brackets, positional indexing, and the pipe with dot.
Import rectangular, tabular data with readr from flat files like CSV and DSV, enjoy faster imports, tibble outputs, and per-column parsing for clean, reproducible analysis.
Learn to read and import data with readr tools in R, including read_csv, read_table, and read_log, handling headers, delimiters, missing values, and various file formats.
Import data in R with the readr package, using read_csv, read_csv2, and read_tsv for inline and disk data. Demonstrate skip, delimiter, and comment handling and compare read_csv with base read.csv.
Master vector parsing in R with tidyverse readr, using parse_character, parse_logical, parse_integer, parse_double, parse_number, parse_factor, parse_date, parse_time to import data.
Master vector parsing in R with tidyverse, including character and logical vectors, numeric types, decimal and grouping marks, and date-time parsing, plus troubleshooting with helper diagnostics.
Learn to parse a file in R by specifying per-column parsers with read_csv using the col_types argument for integer, character, and double.
Learn practical file parsing in R with tidyverse, using guess_parser to infer column types and specifying col_types or mutating to factor, integer, and double for clean imports.
Explore useful import libraries for data science with R, including readxl (read_excel), rio, and data.table, and compare fast import methods like fread, read_csv, and read.csv for large files with hands-on examples.
Learn how to write files in tidyverse, using csv, tsv, and rds formats with readr, and explore export via rio and feather for cross-language data sharing.
Learn to export and import data in R: write CSV files with a header, save with comma or semicolon delimiters, export to Excel with rio, and read and write feather files.
Explore tidyverse tibbles versus data frames, learn to import and parse data with readr and per-column parsers, and complete assignments on continents and flights by converting column types.
Explore data import for assignment two in RStudio: load libraries, unzip data, read CSV files, and manually parse columns for a flights_02 dataset to ensure correct types.
Import and parse a complex CSV using read_delim, skip lines, set headers, and mutate types with dplyr and lubridate for dates. Compare readr import speed to data.table fread.
Explore strings with tidyverse: manipulate, match, subset, and mutate strings; learn regular expressions, and manage factors and categorical variables within tidyverse workflows.
Learn to create and manipulate strings in the tidyverse, including string creation with single/double quotes, escaping, vectors and character columns, and handling special characters, newlines, tabs, and Unicode.
Explore stringr's pattern matching tools, including str_detect, str_locate, str_count, and str_locate_all, to detect, locate, and count patterns like apple in a vector of strings.
Learn to load a strings data set and apply tidyverse string functions: detect, count, locate, and locate all to filter and annotate fruit data.
Explore string subsetting tools in the stringr library, including str_sub, str_subset, str_extract, str_extract_all, str_match, and str_match_all. Learn to extract substrings, filter by pattern, and retrieve first or all matches.
Learn practical string subsetting in R using tidyverse tools to extract substrings and count letter frequencies, and perform pattern matching with str_subset, str_extract, and str_match.
Master string lengths and manipulation with the stringr package: measure characters, pad to a fixed width on left or right, truncate with ellipses, and trim whitespace for cleaner data.
Explore string manipulation in base R and tidyverse: compute string lengths, filter fruits by length, pad and truncate strings with custom width and ellipsis, and trim whitespace on various sides.
Explore string mutating in tidyverse, learning to subset and replace parts of strings, choose first or all replacements, and convert cases with str_to_lower, str_to_upper, and str_to_title.
Explore string mutating in tidyverse, including str_sub and str_replace for first letters of fruits, and apply str_to_lower, str_to_upper, and str_to_title.
Discover how to join and split strings in R with stringr: str_c, str_dup, str_split, str_split_fixed, str_glue, and str_glue_data; includes collapse, pattern, and data frame usage.
This lecture demonstrates joining and splitting strings with tidyverse tools, showing how to concatenate four sub-vectors, split by whitespace, use fixed patterns, and glue data frame values into strings.
Explore str_order and str_sort in stringr to obtain sorted indexes or values for a character vector, with numeric and decreasing options. Use str_view and str_view_all to highlight matches.
Explore string helpers in R, including shuffle, sample, str_order, and str_sort, with set.seed for reproducibility, and use str_view for pattern matching.
Explore regular expressions and their use for searching and replacing strings, learn metacharacters, classes, alternates, anchors, groups, lookarounds, and escaping with stringr in R.
Explore regular expressions in R, grasp escaping rules, and learn to match dots and backslashes by using double backslashes and str_view tricks.
Master regular expressions by exploring special characters and classes, quantifiers, and anchors. Use groups, brackets, and the pipe to build flexible patterns for digits, letters, whitespace, and punctuation.
Explore regex in R tidyverse to identify digits and non-digits, use character classes, and locate whitespace, newlines, and tabs with str_subset and str_view.
Learn alternates, anchors, and groups in regular expressions, using the pipe for or, brackets for character classes, ranges from A to F, anchors to mark boundaries, and back references.
Explore regular expressions, anchors, alternates, and groups in data science with R, using str_subset and str_view_all to match starts, ends, and exact words.
Explore lookarounds in regular expressions: followed by, not followed by, preceded by, and not preceded by, with examples, and quantifiers like zero or more, one or more, and exact two.
Master lookarounds and quantifiers in regular expressions through hands-on practice, including matching W followed by A, preceded by W, and counting words in sentences after punctuation removal and lowercasing.
Master regular expressions through hands-on practice with real problems, use stringr from tidyverse to wrangle text, and apply regex to a corpus for text mining.
Learn how tidyverse handles categorical variables with factors using the forcats package, counting values by level and extracting unique values with fct_count and fct_unique.
Learn to work with factors in tidyverse by converting columns to factor using mutate, inspecting levels with fct_count and levels, and visualizing frequencies with a ggplot bar plot.
Explore combining and ordering categorical variables in R with forcats, including factor combining, fct_relevel, fct_infreq, fct_inorder, and fct_rev, with practical examples.
Explore combining and ordering factors in R with tidyverse tools, including merging factor levels, manually reordering, ordering by frequency, and reversing order for clear bar plots.
Explore forcats tools to adjust factor levels in R, including fct_recode, fct_collapse, fct_other, fct_drop, fct_expand, and fct_explicit_na.
Explore changing factor levels in tidyverse using recode, collapse, drop, and expand, and map manufacturers to countries of origin while creating non-US and other groups.
Master tidyverse string operations and regular expressions to process a text corpus: read lines, clean data, sample text, and analyze corpus words by frequency.
Download the corpus and import the text data in R to explore a 50,000-line corpus using read lines, basic inspection, and pattern matching for punctuation, digits, and phone-number patterns.
Follow part 2 of an assignment walkthrough on corpus processing with the tidyverse: extract word patterns before commas, count frequencies, and identify the most common words.
Clean and analyze a corpus by converting to lowercase, removing punctuation and digits, normalizing whitespace, and computing word counts with coverage insights and a ggplot visualization.
Walks through a four-part assignment to select the top 100 corpus words, compute their sampling probabilities, and draw 10,000 weighted samples with dplyr, then visualize counts.
Explore handling dates and times in R with tidyverse: create dates, generate sequences, extract components, perform arithmetic, and use durations, periods, and intervals with time zones.
Learn to wrangle date and time data in tidyverse with lubridate and hms, including parsing, converting strings to date objects, and handling time zones.
Learn to create date, time, and datetime objects with lubridate, from strings or components. Parse diverse formats using helper functions and specify the correct component order for accurate parsing.
Learn to create and parse date and time objects in R using lubridate and hms, with NYC flights data, covering parsing formats and building date-time objects from components.
Learn to extract components from existing date and datetime objects using lubridate, with intuitive accessors for year, month, day, weekday, week, ISO week, quarter, semester, am/pm, dst, and leap year.
Learn to extract date-time components in tidyverse by deriving year, month, day, hour, minute, second, weekday, week, ISO week, and quarter from a date-time object using mutate.
Learn to round date and date-time values with lubridate using floor_date, round_date, ceiling_date, and rollback, then update components via assignment and the update function.
Round dates in R with floor_date, ceiling_date, and round_date at the month level, update components (year, month, day, hour, minute, second) by assignment or with update, noting rollbacks for invalid values.
Explore datetime arithmetic with lubridate, including durations, periods, and intervals. Learn how durations track time in seconds and handle daylight saving time, leap years, and leap seconds.
Learn date-time arithmetic and durations in R with lubridate: adding days and hours, converting age to duration, constructing seconds, minutes, months, and handling daylight saving time and leap years.
Explore periods in lubridate, time spans like days and months that ignore timeline irregularities, unlike durations. Learn constructors and conversions between periods and seconds, and how daylight saving affects them.
Practice working with periods in lubridate: convert time to period objects, perform arithmetic with periods, and construct periods from seconds, minutes, hours, days, months, and years, with notes on daylight saving time and leap years.
Learn to create and flip intervals in R with lubridate, extract start and end boundaries, test within and overlaps, and generate date-vector intervals with lengths.
Explore time zones in R using lubridate, covering UTC, daylight saving time, historic calendars, and helper functions to print and parse date times across multiple zones.
Explore how to view and filter time zone names in R, extract US and Europe zones with str_subset, and force times in a chosen zone.
Explore wrangling date and time data with lubridate and dplyr, covering parsing, components, durations, periods, and intervals, and apply these skills to energy consumption data and time zone handling.
Demonstrates an assignment walkthrough in R: set up your script, parse dates with lubridate, analyze leap years, and compute holiday gaps with tidyverse tools.
Walk through a tidyverse assignment in R that imports data with read_csv, mutates a date_time column, and groups by year and month to summarize energy consumption.
Learn to visualize data with ggplot2 in the tidyverse, from distributions to relationships. Explore histograms, density plots, bar plots, scatter plots, and maps, plus time series and subplots.
Learn why data visualization matters for quick, insightful answers in exploratory data analysis. See how ggplot2 uses the grammar of graphics through layered plots.
Explore visualizing a continuous variable's distribution using histograms, density plots, and area plots in R with ggplot2, highlighting counts, density, and the normalization to one.
Learn to visualize distributions of a continuous variable in R using tidyverse's ggplot2, building histograms and density plots, and customizing bins, borders, fills, and axis labels.
Explore visualizing continuous variable distributions with ggplot2: compare highway and city mpg histograms and densities, export figures with ggsave, and explore area and cumulative distribution plotting.
Visualize the distribution of a categorical variable with a bar plot in ggplot2, showing counts per category and customizing colors with a fill scale.
Visualize distribution of a categorical variable with bar plots in ggplot. Learn to count frequencies, compute percentages, and customize colors with viridis and brewer palettes.
Explore visualizing the relation between two continuous variables using a scatter plot in ggplot2, map one variable to x and the other to y, and add a smoothing line.
Explore scatter plots in ggplot for continuous variables, mapping city versus highway consumption and carat versus price in the diamonds dataset, with color, transparency, and a regression line.
Visualize the distribution of two categorical variables using bar plots and scatter plots in ggplot, by forming subgroups from their combinations and applying stack, dodge, and fill positions with viridis colors.
Visualize distributions of two categorical variables with bar plots and scatter plots in ggplot2, using cars and diamonds datasets, featuring stacked, dodged, and fill options with viridis and jitter.
Learn to compare a continuous variable across categorical groups with box plots and violin plots in R using ggplot2, including kernel density estimation and theme customization.
Explore box plots and violin plots to compare highway mpg across car classes and diamond prices across colors, apply a logarithmic transformation on the y axis, and tweak themes.
Learn to map multiple variables in a single ggplot2 scatter plot using tidyverse facets via facet_grid or facet_wrap. Explore color, size, and shape while minimizing visual noise.
Explore how to visualize multiple variables in one ggplot using facet_wrap and facet_grid, with mappings for x, y, size, color, and shape across mpg and diamonds.
Learn to visualize time series with line charts in ggplot2, mapping time on the x axis to values on the y axis. Apply geom_line in RStudio to plot historic data.
Learn to visualize time series in R using ggplot2 on the economics data, plotting unemployment over date with line charts, and compare wide vs long formats with grouping and color.
Create heatmaps, word clouds, and geographic maps with ggplot2 by mastering the grammar of graphics and layer-based plotting, using extensions for word clouds and maps.
Create heatmaps with ggplot from the cars data, showing counts by class and manufacturer. Compare average highway consumption and build word clouds and a crime map with viridis.
Demonstrates creating subplots inside a main ggplot2 figure without facets, using the plot package for simple, exportable layouts, and hints at cowplot in the next video.
Master subplots in data science with R by using cowplot's plot_grid to arrange six ggplot plots from mpg, displacement, cylinders, drive type, and transmission.
Explore an assignment walkthrough in R tidyverse, building plots for top four carriers and flight distance, and then sample diamonds to create a facet scatter plot by color.
Import and clean a corpus, compute word counts, and create a top 200 word cloud with color groups. Build hourly, daily, and weekly energy plots and export.
Explore functional programming in tidyverse with the purrr package, learning map and list transformations, nesting data in tibbles, and loop-free workflows for importing, transforming, and plotting.
Explore functional programming in R with the tidyverse's purrr package, using map functions. Iterate over vectors and lists, and cover filtering, aggregation, summarizing, transforming, and reshaping.
Explore the purrr map family, which applies a function to each element of a list or vector with controllable output. Review columnwise operations and class checks as practical examples.
Explore the map function in R's tidyverse to apply statistics across numeric flight columns, computing the mean, min, max, and standard deviation while handling missing values.
Explore how the map family in tidyverse controls output types when applying a function to each element, producing lists, character, logical, integer, or numeric (double) vectors, or data frames.
Master map output control in R by using map_dbl to produce numeric vectors, build summary tables of means and sds, and combine data frames with map_dfc.
Explore shortcuts for the map function, including extracting elements by name or position and writing anonymous functions with the tilde operator, illustrated with lm models.
Demonstrate map shortcuts in tidyverse to fit multiple linear models on the cars data, predicting highway mpg from displacement across cylinder groups, and extract r-squared with concise syntax.
Explore how to extend map family functions to iterate over multiple arguments, using map2, pmap, and invoke_map with lists and vectors to sample from several distributions.
Explore mapping over multiple arguments in R with map, pmap, and invoke_map, generate and nest data from normal distributions, visualize with ggplot density plots, and export plots using walk.
Learn to filter and reshape lists in R using the purrr package, mastering pluck, keep, discard, compact, head, tail, flatten, and transpose to simplify list manipulation.
Learn how to build and manipulate complex lists in R using the tidyverse, including creating random vectors, combining tables, and using pluck, keep, discard, flatten, and transpose operations.
Explore summarizing and joining lists in tidyverse, using every, some, has_element, detect, detect_index, and vec_depth, then learn append, prepend, and splice to combine lists.
Explore summarizing and joining lists in tidyverse R by applying every, some, has_element, detect, detect_index, and vec_depth, plus splice, append, and prepend operations.
Transform lists in R using tidyverse tools like modify, modify_at, and modify_if to apply functions by name, index, or test; generate combinations with cross2, and reduce or accumulate results.
Learn to transform lists in tidyverse with modify, modify_if, last, cross, reduce, and accumulate; apply functions to lists, vectors, and data frames like the diamonds dataset.
Learn how to nest data frames within a tidyverse tibble by grouping data with dplyr and using nest to store sub-tables, then unnest to return to a flat frame.
Learn to create and manipulate nested data frames with dplyr, grouping by manufacturer to nest and unnest, count rows, and compute the average highway miles per gallon.
Explore the nested data workflow in tidyverse by creating list columns, nesting by groups like species in the iris data set, and fitting a linear model.
Learn to implement a nested data workflow by fitting a separate linear model for each manufacturer to predict highway consumption from displacement and cylinders, then extract coefficients and r-squared values.
Learn to import multiple CSV files into R with purrr map, handling single and two-level directories, setting column types, and binding rows into one data frame.
Learn purrr workflows to export multiple csv files per car using map and pmap, with group_by, mutate, and nest, then create and save per-car plots with ggplot and ggsave.
Learn functional programming in R with map and invoke_map to transform lists and nest data, implement simulations from a blueprint, and visualize Gapminder data with density plots and line charts.
Follow an assignment walkthrough in R with tidyverse to import a simulation blueprint, generate data from various distributions, and visualize results with ggplot2 and cowplot.
Walks through importing Gapminder data from multiple country CSVs, combining into one table, converting types, and building continent-specific line plots with ggplot2 in tidyverse.
Explore how dplyr handles relational data in R, mimicking SQL to join, filter, and mutate tables from data warehouses for data science.
Explore relational data across multiple related tables and learn to join them to form a unified dataset. Use SQL concepts in tidyverse with dplyr to fetch and merge data.
Explore an imitation database built from nycflights13, featuring five tables (airlines, airports, planes, weather, and flights), and learn how to join them via keys such as carrier, tail number, and origin/destination.
Explore relational data with dbplyr in R and tidyverse using the nycflights13 example database; inspect airlines, airports, planes, weather, and flights, and learn joins and set operations to combine data.
Master mutating joins in data science with R by learning how keys define table connections, how inner, left, right, and full joins combine variables, and how dplyr implements these joins.
Explore mutating joins in R tidyverse, performing inner, left, right, and full joins on flights data to add carrier names, destinations, and counts, with practical suffix and renaming techniques.
Explore filtering joins in R tidyverse, focusing on semi_join and anti_join, how they affect observations, handle duplicates and key mismatches, and compare to mutating joins for clean data.
Demonstrates filtering joins in tidyverse using semi join and anti join on airline data. Compare left and right joins and inner joins, and emulate with where clauses in SQL.
Explore set operations in tidyverse with dplyr, learning column-wise and row-wise binding using bind_cols and bind_rows, and performing set operations like intersect, union, and setdiff on tables.
Explore set operations in tidyverse by binding columns and rows on two airline tables, then compute intersect, setdiff, and union, with distinct results and practical ID examples.
Explore dplyr's additional functions for tidyverse data wrangling, including add_row and add_column, moving row names to columns, lag and lead, and cumulative sum, min, max, and rank.
Develop skills in dplyr's additional functions by filtering flights for American Airlines, using lag and lead to compare successive origins and distances, computing running totals, and ranking by distance.
Master relational data with dplyr by using mutating and filtering joins, set operations, and bindings, then analyze airline data—carriers, planes, flights, and date-based distance trends.
Explore a tidyverse-based assignment walkthrough: join flights with airlines and planes, compute carrier–manufacturer counts and flight totals, then visualize results with bar plots and weather-influenced scatter plots.
Walk through building distance per date and date span tables, use left joins and carrier name, generate a date grid, compute cumulative miles flown, and visualize with a ggplot line chart.
Explore tidy evaluation in the tidyverse, learn about the rlang package and its helpers, and practice writing functions that leverage dplyr and ggplot for practical plots.
Explore tidy evaluation, a non-standard, delayed evaluation framework in R implemented by rlang that captures user code and runs it on new data frames, enabling easier programming with tidyverse.
Explore basic elements in rlang, including quosures and expressions, and learn how environments, quoting, and unquoting operations shape delayed evaluation.
Programming recipes demonstrate building a median calculator in R with dplyr and tidy evaluation, capturing data and variable via quoting, supporting group by and dynamic column naming.
The lecture demonstrates building tidy evaluation functions in R with dplyr, capturing variables via quo, unquoting with bang-bang (!!), and grouping with group_by to compute medians.
Learn to write dplyr functions for descriptive stats, including a summary function that computes mean, max, median, sd, and range, plus a count frequencies function and moving averages with ggplot.
Write generic histogram and scatter functions with ggplot2, mapping user-selected variables. Return ggplot2 objects you can extend with layers and a custom theme.
Master the tidy evaluation framework and learn to write generic functions with dplyr and ggplot, then complete assignment eight, including a count_freq function and a diamonds bar plot and scatter plot.
Follow an assignment walkthrough to build reusable tidyverse functions for frequency counts, scatter plots, and diamond data visualization with ggplot themes, preparing for future COVID-19 data exploration.
Explore a final project using tidyverse to clean and analyze US COVID-19 data, including cases, deaths, vaccinations, population, GDP, and government response with data wrangling, visualization, and question-driven exploration.
Identify data sources for the project, including Johns Hopkins COVID-19 cases, Our World in Data vaccinations, GDP and population data, and the Oxford COVID-19 Government Response Tracker, plus multi-file imports.
Follow a step-by-step roadmap to import data from course sources into R, clean and merge into a master table, then perform exploratory data analysis with ggplot2 in tidyverse.
Explore data import in R using tidyverse, creating a new project, listing and loading daily covid-19 CSV files with map and read_csv, and cleaning names with janitor.
Import GDP data from Excel using rio, clean to keep nominal GDP and GDP per capita for 2020, then import population, covid response tracker, and vaccinations data for initial exploration.
Explore initial data inspection in R with tidyverse, checking missing values and time spans across COVID-19 data sources. Learn to build functions for NA counting and visualize results.
Conduct an initial data inspection of COVID-19 data, check missing values and time spans, and focus on confirmed cases and deaths at national and state levels.
Create a main table with one row per state and date, using a universal US states list and string similarity to align names before merging COVID-19, GDP, population, and vaccination data.
Create a universal state list and a date-state master table by joining COVID-19, vaccination, GDP, population, and response sources, fix mismatches, and add region for a unified main dataset.
Advance data wrangling on the main table by converting population to millions, computing daily confirmed cases and deaths, and handling missing data, negatives, and vaccination start dates per state.
Perform exploratory data analysis to quantify infections and deaths by state and region. Create date-filtered maps colored by region showing absolute and relative counts over time.
Perform exploratory data analysis to visualize total infections and deaths over time. Create a plot function that outputs absolute and relative counts by region with subplots and export figures.
Perform exploratory data analysis in tidyverse to identify which state paid the highest price, using relative counts of confirmed cases and deaths on the latest date.
Explore how to smooth daily pandemic dynamics with a seven-day moving average using the zoo package for time series, and visualize confirmed daily cases and deaths by region.
Explore how state wealth and population relate to total percentage of confirmed cases and deaths by plotting GDP per capita and population against infection and death percentages, faceted by region.
Use exploratory data analysis to assess whether vaccination helps reduce COVID-19 daily cases and deaths by comparing seven-day averages alongside total vaccine doses across regions.
Learn to analyze state-level COVID-19 data by computing seven-day averages of daily cases, deaths, and vaccine doses, then visualize total counts and government response across states using tidyverse.
Review version checks for R, RStudio, and packages in this final video, and find the GitHub repository with all scripts for the course.
Data science skills are still among the most in-demand skills on the job market today. Many people see only the fun part of data science, tasks like "search for data insight", "reveal the hidden truth behind the data", "build predictive models", "apply machine learning algorithms", and so on. The reality, known to most data scientists, is that when you deal with real data, the most time-consuming operations of any data science project are "data importing", "data cleaning", "data wrangling", "data exploring", and so on. So it is necessary to have an adequate tool for these data-related tasks. What if I told you there is a freely accessible tool that fits this description?
R is one of the most in-demand programming languages when it comes to applied statistics, data science, data exploration, and similar fields. If you combine R with its collection of libraries called tidyverse, you get one of the deadliest tools, designed specifically for data science tasks. All tidyverse libraries share a common philosophy, grammar, and data types. The libraries can therefore be used side by side, and they enable you to write efficient, well-optimized R code that will help you finish projects faster.
This course includes several chapters. Each chapter introduces a different aspect of data-related tasks, along with the proper tidyverse tool to help you deal with it. The course also covers the theory behind each topic and practical examples worked through in R. If you dive into the course, you will engage with many different data science challenges. Here are just a few of them:
Tidy data: how to clean your data with tidyverse.
Grammar of data wrangling.
How to wrangle data with dplyr and tidyr.
Create table-like objects called tibbles.
Import and parse data with readr and other libraries.
Deal with strings in R using stringr.
Apply Regular Expressions concepts when dealing with strings.
Deal with categorical variables using forcats.
Grammar of Data Visualization.
Explore data and draw statistical plots using ggplot2.
Use concepts of functional programming, and map functions using purrr.
Efficiently deal with lists with the help of purrr.
Practical applications of relational data.
Use dplyr for relational data.
Tidy evaluation inside tidyverse.
Apply tidyverse tools for the final practical data science project.
Course includes:
over 25 hours of lecture videos,
R scripts and additional data (provided in the course material),
engagement with assignments at the end of each chapter,
assignment walkthrough videos (where you can check your results).
All told, this makes it one of Udemy's most comprehensive courses for data science-related tasks using R and tidyverse.
Enroll today and become the master of R's tidyverse!!!