Beginning Data Visualization with R and ggplot2
3.2 (2 ratings)
Course Ratings are calculated from individual students’ ratings and a variety of other signals, like age of rating and reliability, to ensure that they reflect course quality fairly and accurately.
14 students enrolled

Learn to design and implement data visualizations from scratch
Created by Packt Publishing
Last updated 3/2019
English [Auto]
This course includes
  • 4.5 hours on-demand video
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Set up the R environment, RStudio, and understand the structure of ggplot2
  • Use basic programming concepts of R such as loading packages, arithmetic functions, data structures, and flow control
  • Import data to R from various formats, such as CSV, Excel, and SQL
  • Clean data by handling missing values and standardizing fields
  • Perform univariate and bivariate analysis using ggplot2
  • Create statistical summary and advanced plots, such as histograms, scatter plots, box plots, and interaction plots
  • Apply data management techniques, such as factors, pivots, aggregation, merging, and dealing with missing values, on the example data sets
  • Distinguish variables and use best practices to visualize them
  • Build complex and aesthetic visualizations with ggplot2 analysis methods
  • Prior knowledge of R programming would be beneficial. It is assumed that you know basic high school math and statistics.

Data analysis is crucial to accurately predict the performance of an application. When data is presented to you in a graphical or pictorial format, you can analyze it more effectively. This Learning Path introduces you to the tools for working with data. To start with, you'll learn how to set up R and RStudio, and then explore R packages, functions, data structures, control flow, and loops.

Once you have grasped the basics, you'll move on to studying data visualization and graphics. You'll learn how to build statistical and advanced plots using the powerful ggplot2 library. In addition, you'll discover data management concepts such as factoring, pivoting, aggregating, merging, and dealing with missing values. You'll discover what layers, scales, coordinates, and themes are, and study how you can use them to transform your data into aesthetically pleasing graphs. Next, you'll study simple plots such as histograms and advanced plots such as superimposed and density plots. You'll also get to grips with plotting trends, correlations, and statistical summaries.

By the end of this Learning Path, you'll have mastered data visualization techniques using the powerful R libraries.

About the Author

Samik Sen is currently working with R on machine learning. He holds a PhD in Theoretical Physics. He has tutored classes for high performance computing postgraduates and lectured at international conferences. He has experience using Perl to process data, gnuplot to produce plots for visualization, and LaTeX to produce reports. He then moved into finance/football and online education with videos.

Chris DallaVilla is the founder and CEO of VALID., an independent marketing consulting practice specializing in providing data-driven solutions that help chief marketing officers and their teams strengthen their planning and execution, and drive results. Chris has expertise in digital and social media marketing, as well as certifications in Agile, Google AdWords, and Google Analytics. He studied computer science at Harvard University, design technology at Massachusetts College of Art and Design, and advertising and marketing communications at the Questrom School of Business at Boston University.

Who this course is for:
  • If you are a developer looking to learn data visualization techniques, then this Learning Path is for you.
Course content
54 lectures 04:38:15
+ R Programming Fundamentals
28 lectures 02:21:11

Let us begin the course and see the lessons and concepts that will be covered.

The GitHub link for this course is:

Preview 01:46

In this section, you will learn how to install and set up the environment. Let us install:

  • R

  • RStudio

Preview 02:24

Let us begin by looking at the basics of using R as a programming language and as a statistical analysis tool. Let us understand the lesson objectives and the lesson map.

Lesson Overview

Let us now get introduced to R and then RStudio. We will learn the RStudio interface and execute basic arithmetic in the R Console. We will then set up a new RStudio project to use throughout the course and go through the procedure for installing packages in RStudio.

Using R, RStudio, and Installing Useful Packages

Now, let us begin first with an exploration of different variable types and then look at different data structures in R. We will also learn that all variables created in R will have a class and a type. Let us understand how to use different numeric objects, character objects, and date objects.

Variable Types and Data Structures: Variable Types
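
As a minimal sketch of the idea that every R object has both a class and a type, the snippet below inspects a numeric, an integer, a character, and a date object. The variable names are illustrative and are not taken from the course material.

```r
# Every R object has a class (how it behaves) and a type (how it is stored).
num  <- 3.14                    # numeric object
int  <- 42L                     # integer object (note the L suffix)
chr  <- "ggplot2"               # character object
when <- as.Date("2019-03-01")   # date object

class(num);  typeof(num)    # "numeric"   "double"
class(int);  typeof(int)    # "integer"   "integer"
class(chr);  typeof(chr)    # "character" "character"
class(when); typeof(when)   # "Date"      "double" (dates are stored as day counts)
```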

Let us now look into the four major types of data structures, their uses, and the ways of building and using them.

Variable Types and Data Structures: Data Structures

Let us look at basic flow control in R. Flow control includes different kinds of loops. We will learn to use each of them and to predict what each type of loop code would print.

Basic Flow Control
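
The following sketch shows three common loop forms in base R. It is an illustration under our own assumptions (the values and variable names are hypothetical), intended as practice for predicting what each loop produces.

```r
# for loop: iterate over a known sequence
squares <- numeric(0)
for (i in 1:3) {
  squares <- c(squares, i^2)
}
print(squares)   # 1 4 9

# while loop: repeat as long as a condition holds
n <- 10
steps <- 0
while (n > 1) {
  n <- n / 2
  steps <- steps + 1
}
print(steps)     # 4  (10 -> 5 -> 2.5 -> 1.25 -> 0.625)

# repeat loop: runs until an explicit break
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break
}
print(count)     # 3
```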

Let us now learn about data import and export by looking at the different delimiters and the functions used to import and export data for different file types. We will get to know the built-in functions for data import and export. Then, we will learn about synthetic data, downloading data from GitHub, importing .csv files, and importing and exporting .xlsx files.

Data Import and Export
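
A minimal round-trip sketch using only built-in functions, assuming the built-in mtcars dataset and temporary files (the file paths are generated, not course files). Reading .xlsx files needs an add-on package and is not shown here.

```r
# Export the first rows of mtcars to a comma-separated file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(mtcars, 5), csv_path, row.names = TRUE)
cars_csv <- read.csv(csv_path, row.names = 1)   # first column holds the row names

# Same data with a different delimiter (tab).
tsv_path <- tempfile(fileext = ".tsv")
write.table(head(mtcars, 5), tsv_path, sep = "\t", quote = FALSE)
cars_tsv <- read.delim(tsv_path)                # header has one fewer field, so
                                                # column 1 becomes the row names
dim(cars_csv)   # 5 11
dim(cars_tsv)   # 5 11
```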

Let us learn about the basic page for getting help with R in the web browser and look at the package documentation that can answer questions about the various functions in R. In addition to the thorough documentation built into R, let us learn about vignettes and the functions used to browse them.

Getting Help with R

Let us understand the different places over the internet to find help with R and when a learner should use each of these communities/sites/blogs.

RStudio Community, Stack Overflow, and the Rest of the Web

This video summarizes your learning of this lesson.

Lesson Summary
Test Your Knowledge
14 questions

This video will show you the lesson objectives and the lesson map and cover the various uses of data visualization.

Preview 00:58

Let us get introduced to Base Plots, the plot() function, R Help documentation, dataset library, and learn to plot mtcars dataset.

Preview 05:05

Let us now learn about creating and plotting factor variables and model objects, including linear model objects.

Creating Base Plots Part II

Let us now look into Titles and Axis labels, adding them to Base plots, and changing the color of Base plots.

Creating Base Plots Part III

Let us get introduced to ggplot2 and its installation. We then learn about the difference between base plots and ggplot2, and how to think of ggplot2 as a layered structure. Let us then learn about the syntax and about local and global mapping in ggplot2 calls.

ggplot2: Introduction
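
The difference between global and local mapping can be sketched as follows, assuming ggplot2 is installed and using the built-in mtcars dataset. The plot names are hypothetical. Call print() on either object to render it.

```r
library(ggplot2)

# Global mapping: aes() inside ggplot() is inherited by every layer.
p_global <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Local mapping: aes() inside a geom applies to that layer only.
# The points are coloured by cylinder count, but the trend line is not
# split by colour, so a single line is fitted to all the data.
p_local <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(colour = factor(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```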

Let us learn when to use and how to create histograms and then bar charts using ggplot2.

ggplot2: Histogram and Bar Chart

Let us now understand when to use and how to create scatterplots and then boxplots using ggplot2.

ggplot2: Scatterplots and boxplots

Let us now dig into some global and plot-specific aesthetics and using different bar chart aesthetic options. Then, we learn to utilize Facet Wrapping and Gridding to visualize data effectively.

ggplot2: Digging in aes(), and Facet Wrapping and Gridding

Let us now learn the concept of Boxplot + coord_flip(), utilizing different aesthetics for scatterplots, and adding titles and axis labels to ggplot2 plots in different ways.

ggplot2: Boxplot + coord_flip() and Adding Titles and Axis labels to ggplot2
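
A short sketch of a flipped boxplot with titles and axis labels added in two equivalent ways, assuming ggplot2 and its bundled mpg dataset. The label text is illustrative.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot() +
  coord_flip()            # swap the axes so the boxes run horizontally

# Option 1: set everything at once with labs()
p_labs <- base_plot +
  labs(title = "Highway mileage by vehicle class",
       x = "Vehicle class",
       y = "Highway miles per gallon")

# Option 2: use the individual helper functions
p_helpers <- base_plot +
  ggtitle("Highway mileage by vehicle class") +
  xlab("Vehicle class") +
  ylab("Highway miles per gallon")
```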

Let us now look at some tools for interactive plots, such as Plotly and Shiny, and how you can explore them in more detail online.

Interactive Plots

This video summarizes your learning of this lesson.

Lesson Summary
Test Your Knowledge
12 questions

In this lesson, we will address what a factor variable is and how to use one, how to summarize your data numerically, how to combine, merge, and split datasets, and how to split and combine strings. Let us see this in the form of lesson objectives and lesson map.

Preview 00:55

Let us now get introduced to factor variables and their characteristics, when and why one should use them, and how to create factor variables in R. Let us also learn to identify whether something is already a factor, to inspect and change the levels of a factor variable, and finally to create ordered factor variables.

Factor Variables
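
The steps above can be sketched in base R as follows. The example vector and level names are hypothetical.

```r
sizes <- c("medium", "small", "large", "small")

f <- factor(sizes)
is.factor(f)    # TRUE
levels(f)       # "large" "medium" "small" (alphabetical by default)

# Set the level order explicitly, then rename one level.
f2 <- factor(sizes, levels = c("small", "medium", "large"))
levels(f2)[levels(f2) == "medium"] <- "mid"
levels(f2)      # "small" "mid" "large"

# An ordered factor supports comparisons between levels.
ord <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
ord < "large"   # TRUE TRUE FALSE TRUE
```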

Let us now understand data summarization and its types, the advantages and disadvantages of tables in R, creating tables in R, and summarizing data with the apply family.

Summarizing Data
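
A minimal sketch of tables and the apply family, assuming the built-in mtcars dataset. The chosen columns are illustrative.

```r
# Frequency table of a single variable, and a two-way table.
cyl_counts <- table(mtcars$cyl)              # 4: 11, 6: 7, 8: 14
cyl_by_gear <- table(mtcars$cyl, mtcars$gear)

# sapply: apply a function to each column of a data frame.
col_means <- sapply(mtcars[, c("mpg", "hp", "wt")], mean)

# tapply: summarize one variable within the groups of another.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# apply: work over the rows (1) or columns (2) of a matrix.
col_max <- apply(as.matrix(mtcars[, c("mpg", "hp")]), 2, max)   # mpg 33.9, hp 335
```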

Let us now look into splitting and combining data and datasets using different row and column functions, and splitting and combining strings.

Splitting and Combining Datasets
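
As an illustration, the base R functions below split and recombine the built-in mtcars dataset and a sample string. The specific functions covered in the lecture may differ; these are common choices.

```r
# Combine by rows and by columns.
stacked <- rbind(mtcars[1:16, ], mtcars[17:32, ])
side_by_side <- cbind(mtcars[, 1:5], mtcars[, 6:11])

# Split a data frame into a list of data frames, one per group.
by_cyl <- split(mtcars, mtcars$cyl)
names(by_cyl)   # "4" "6" "8"

# Split and combine strings.
parts <- strsplit("Mazda RX4 Wag", " ")[[1]]   # "Mazda" "RX4" "Wag"
joined <- paste(parts, collapse = "_")         # "Mazda_RX4_Wag"
```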

Let us understand the different ways of merging and joining data, the different types of merges and the arguments they use, and finally see some merges and joins in R.

Merging and Joining Datasets
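
The merge types can be sketched with base R's merge() function. The two small data frames below are made up for illustration.

```r
players <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores  <- data.frame(id = c(2, 3, 4), score = c(88, 92, 75))

inner <- merge(players, scores, by = "id")                 # ids 2, 3
left  <- merge(players, scores, by = "id", all.x = TRUE)   # ids 1, 2, 3
right <- merge(players, scores, by = "id", all.y = TRUE)   # ids 2, 3, 4
full  <- merge(players, scores, by = "id", all = TRUE)     # ids 1, 2, 3, 4
```

Rows without a match on the other side are kept in the left, right, and full merges, with NA filling the missing columns.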

This video summarizes your learning of this lesson.

Lesson Summary
Test Your Knowledge
12 questions
+ Applied Data Visualization with R and ggplot2
26 lectures 02:17:04

Let's begin the course with the content coverage.

Preview 01:07

Before we dive deep into the concepts and practice exercises, let us first install R and RStudio to get started.

Installation and Setup

Let us begin with the first lesson and understand what we are going to cover in our learning journey.

Lesson Overview

ggplot2 is a visualization package in R. It was developed in 2005 and it uses the concept of the 'Grammar of Graphics' to build a plot in layers and scales. This is the syntax used for the different components (aesthetics) of a geometric object. It also involves the grammatical rules for creating a visualization. Let us learn more about it with the following topics:

  • Introduction to ggplot2

  • Loading and Exploring a Dataset Using R Functions

  • The Main Concepts of ggplot2

  • Types of Variables

  • Exploring Datasets

Introduction to ggplot2

The ggplot2 function qplot() is similar to the plot() function from base R. It can be used to build and combine a range of useful graphs; however, it does not have the same flexibility as the ggplot() function. Let us now create our first plot with qplot and R.

Making Your First Plot

The geometric objects in ggplot2 are visual structures that are used to visualize data. They can be lines, bars, points, and so on. Geometric objects are constructed from datasets. Before we construct some geometric objects, let's examine some datasets to understand the different kinds of variables. Here are the topics that we will cover now:

  • Geometric Objects

  • Analyzing Different Datasets

  • Histograms

  • Examples of Unimodal and Bimodal Distribution

  • Creating a Histogram Using qplot and ggplot

Geometric Objects

Bar charts are more general than histograms, and they can represent both discrete and continuous data. They can even be used to represent categorical variables. A bar chart uses a horizontal or vertical rectangular bar that levels off at an appropriate level. A bar chart can be used to represent various quantities, such as frequency counts and percentages. Let us learn this with the following exercises:

  • Create Bar Charts

  • Create a One-Dimensional Bar Chart

  • Create a Two-dimensional Bar Chart

Bar Charts
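
A hedged sketch of the two exercises, assuming ggplot2's bundled mpg dataset. Here "one-dimensional" is read as counting a single categorical variable and "two-dimensional" as splitting those counts by a second variable; the course may define the terms differently.

```r
library(ggplot2)

# One variable: geom_bar() counts the rows in each category.
p_one <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# Two variables: fill by a second categorical variable and
# place the bars side by side.
p_two <- ggplot(mpg, aes(x = class, fill = drv)) +
  geom_bar(position = "dodge")
```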

A boxplot is a standard way of displaying the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. Boxplots can represent how a continuous variable is distributed for different categories; one of the axes will be a categorical variable, while the other will be a continuous variable. Let us begin with the following exercises:

  • Analyze and Create Boxplots

  • Create a Boxplot for a Given Dataset

  • Create Scatter Plots
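
To make the five-number summary concrete, the sketch below computes it with base R and then draws a boxplot per category, assuming the built-in mtcars dataset. Note that fivenum() reports hinges, which can differ slightly from quantile-based quartiles, and that geom_boxplot() draws whiskers to at most 1.5 times the interquartile range, so extreme points may appear as separate outliers.

```r
library(ggplot2)

# Minimum, lower hinge, median, upper hinge, maximum.
mpg_summary <- fivenum(mtcars$mpg)
mpg_summary

# Continuous variable (mpg) on one axis, categorical variable (cyl) on the other.
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Cylinders", y = "Miles per gallon")
```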


A line chart shows the relationship between two variables; it is similar to a scatter plot, but the points are connected by line segments. In this section, we will learn more about Line Charts in detail.

Line Charts

The Grammar of Graphics is the language used to describe various components of a graphic that represents data in visualization. Let us understand the following concepts:

  • The Grammar of Graphics

  • Rebinning

  • Analysis of Histograms

  • Using the Grammar of Graphics to change Boxplot Defaults

The Grammar of Graphics
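
Rebinning a histogram can be sketched as follows, assuming ggplot2's bundled mpg dataset. The bin settings are arbitrary choices for illustration.

```r
library(ggplot2)

# Coarse bins: each bar covers 5 units of highway mileage.
p_wide <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5)

# Fine bins: each bar covers 1 unit, revealing more detail (and more noise).
p_narrow <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 1)

# Alternatively, fix the number of bins instead of their width.
p_bins <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(bins = 10)
```

Whatever the binning, the bar heights always add up to the number of observations; only how they are grouped changes.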

Summarize your learning from this lesson.

Lesson Summary
Test Your Knowledge
10 questions

Let us begin with the second lesson and understand what we are going to cover in our learning journey.

Preview 01:30

The Grammar of Graphics is the language used to describe the various components of a graphic that represent data in a visualization. In this video, you will learn more about the Grammar of Graphics and will use it to make plots. Let us dive deep and learn about:

  • Layers

  • Use More Layers to Customize a Histogram

More on the Grammar of Graphics

Scales map values in a data space to values in an aesthetic space, whether the value is a color, shape, or size. Scales are used to change legends or axes, providing inverse mapping and enabling us to understand the data from the graphic itself. In the previous video, when we plotted the histogram, ggplot applied a default scale, in order to describe the x- and y-axes. Let us understand the following concepts:

  • Scales

  • Use Scales to Analyze a Dataset

  • Types of Coordinates

  • Understand Polar Coordinates

Scales and Coordinates
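
As a rough illustration of scales and coordinate systems, the sketch below overrides the default scales on a scatter plot and then switches a bar chart to polar coordinates. It assumes ggplot2's bundled mpg dataset; the breaks, colours, and labels are arbitrary.

```r
library(ggplot2)

# Replace the default x, y, and colour scales with explicit ones.
p_scales <- ggplot(mpg, aes(x = displ, y = hwy, colour = cty)) +
  geom_point() +
  scale_x_continuous(name = "Engine displacement (litres)", breaks = 2:7) +
  scale_y_log10(name = "Highway miles per gallon (log scale)") +
  scale_colour_gradient(low = "lightblue", high = "darkblue")

# Polar coordinates: a single stacked bar becomes a pie chart.
p_polar <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```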

In data visualization, we sometimes have the need to compare different groups, looking at data alongside each other. One method for doing this is creating a subplot for each group. These kinds of plots are known as Trellis displays. In ggplot2, they're called facets. Facets divide the data by some discrete or categorical variable and display the same type of graph for each data subset. In this section, we will learn more about Facets in detail.
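
A minimal faceting sketch, assuming ggplot2's bundled mpg dataset. facet_wrap() lays out one panel per level of a single variable, while facet_grid() arranges panels in rows and columns defined by two variables.

```r
library(ggplot2)

base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One panel per vehicle class, wrapped into a grid.
p_wrap <- base_plot + facet_wrap(~ class)

# Rows by drive type, columns by number of cylinders.
p_grid <- base_plot + facet_grid(drv ~ cyl)
```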


Aside from faceting, we can also produce a color differentiated plot. It can be advantageous to use a color differentiated plot when the shapes are very similar and there is some overlap. To see small differences, it is useful to use colors. Let us learn this with the following exercises:

  • Use Different Colors to Group Points by a Variable

  • Explore Themes and Changing the Appearance of Graphs

  • Use a Theme to Customize a Plot

Changing Styles and Colors

Let us now set individual themes globally.

Changing Styles and Colors Part 2
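
Setting a theme globally can be sketched with theme_set() and theme_update(), assuming ggplot2 is installed. The theme choices below are illustrative; the previous theme is saved and restored at the end so the change does not leak into later plots.

```r
library(ggplot2)

# theme_set() replaces the session-wide default and returns the old theme.
old_theme <- theme_set(theme_minimal(base_size = 12))

# theme_update() tweaks individual elements of the current default.
theme_update(plot.title = element_text(face = "bold"))

# Any plot created now picks up the global theme without extra code.
p_themed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ggtitle("Weight versus mileage")

# Restore the previous default when finished.
theme_set(old_theme)
```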

We can use geoms and statistical summaries to create a summarized plot. Let us understand the following concepts:

  • Geoms and Statistical Summaries

  • Use Grouping to Create a Summarized Plot

Geoms and Statistical Summaries
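
One way to build a summarized plot is stat_summary(), sketched below with ggplot2's bundled mpg dataset. The `fun` argument assumes ggplot2 3.3.0 or later; the colours and sizes are arbitrary.

```r
library(ggplot2)

p_summary <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_point(alpha = 0.3) +                    # raw observations
  stat_summary(fun = mean, geom = "point",     # one mean per class
               colour = "red", size = 3) +
  stat_summary(aes(group = 1), fun = mean,     # connect the means with a line
               geom = "line", colour = "red")
```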

Summarize your learning from this lesson.

Lesson Summary
Test Your Knowledge
12 questions

Let us begin with the third lesson and understand what we are going to cover in our learning journey.

Preview 01:26

Two of the most common advanced plotting techniques are scatter plots and bubble charts. Scatter plots show the relationship between two variables. A bubble chart can include a third variable. Let us learn this with the following exercises:

  • Create a Bubble Chart

  • Use Density Plots

  • Superimpose Plots

Advanced Plotting Techniques
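
The three exercises can be sketched as follows, assuming the built-in mtcars dataset and ggplot2's bundled mpg dataset. after_stat() assumes ggplot2 3.3.0 or later; the variable choices are illustrative.

```r
library(ggplot2)

# Bubble chart: a scatter plot where point size encodes a third variable.
p_bubble <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5) +
  scale_size_area(max_size = 10)

# Density plot: a smoothed view of a distribution, one curve per group.
p_density <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_density(alpha = 0.4)

# Superimposed plot: histogram and density curve on the same density scale.
p_super <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2) +
  geom_density(colour = "red")
```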

A time series is a sequence of data points that are recorded at specific times. Time series are often used in the finance, trading, and housing industries. They are also used by scientists for predicting earthquakes, weather, and so on. In this section, we will learn more about it in detail.

Time Series
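
A minimal time series sketch. It uses the economics dataset bundled with ggplot2 as a stand-in, since the course's own data is not reproduced here.

```r
library(ggplot2)

# Monthly US unemployment (in thousands) plotted against the date.
p_ts <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_x_date(date_labels = "%Y") +
  labs(x = "Year", y = "Unemployed (thousands)")
```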

Let us now draw and display information with maps.


Statistical summaries are useful for summarizing a group of points. You may want to see different quantities (such as the minimum, maximum, mean, median, or quantiles) for a time series plot or a line chart that includes multiple y values for a given x value. We will use the financial data from Facebook and the statistical summary tool to better understand the trends. Let us understand the following concepts:

  • Time Series Plot with Mean, Median, and Quantiles

  • Trends, Correlations, and Scatter Plots

  • Scatter Plot and Fitting a Linear Regression Model

Trends, Correlations, and Statistical Summaries
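
The scatter plot with a fitted linear regression can be sketched as below. The built-in mtcars dataset stands in for the course's financial data, which is not reproduced here.

```r
library(ggplot2)

# Correlation between weight and mileage (about -0.87).
r <- cor(mtcars$wt, mtcars$mpg)

# Fit the linear model directly to inspect its coefficients
# (intercept about 37.29, slope about -5.34).
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)

# Draw the same fit on top of the scatter plot.
p_trend <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
```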

Correlation matrixes show the correlation coefficients between a relatively large number of continuous variables. However, while R offers a simple way to create such matrixes through the cor function, it does not offer a plotting method for the matrixes created by that function. In this section, we will learn more about correlation plots in detail.

Correlation Plot
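
One possible way to plot a correlation matrix with ggplot2 alone is to reshape the output of cor() into long form and draw it as tiles. This is a sketch under our own assumptions, not necessarily the method used in the course, which may rely on a dedicated package. It uses the built-in mtcars dataset.

```r
library(ggplot2)

# Correlation matrix of a few continuous variables.
cm <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Reshape to long form: one row per pair of variables.
cm_long <- as.data.frame(as.table(cm))
names(cm_long) <- c("var1", "var2", "r")

# Draw each coefficient as a coloured tile with its value printed on top.
p_corr <- ggplot(cm_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1))
```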

Summarize your learning from this lesson.

Lesson Summary
Test Your Knowledge
5 questions