Beginning Data Visualization with R and ggplot2
- 4.5 hours on-demand video
- 1 downloadable resource
- Set up the R environment, RStudio, and understand the structure of ggplot2
- Use basic programming concepts of R such as loading packages, arithmetic functions, data structures, and flow control
- Import data to R from various formats, such as CSV, Excel, and SQL
- Clean data by handling missing values and standardizing fields
- Perform univariate and bivariate analysis using ggplot2
- Create statistical summary and advanced plots, such as histograms, scatter plots, box plots, and interaction plots
- Apply data management techniques, such as factors, pivots, aggregation, merging, and dealing with missing values, on the example data sets
- Distinguish variables and use best practices to visualize them
- Build complex and aesthetic visualizations with ggplot2 analysis methods
- Prior knowledge of R programming would be beneficial. It is assumed that you know basic high school math and statistics.
Data analysis is crucial to accurately predict the performance of an application. When data is presented to you in a graphical or pictorial format, you can analyze it more effectively. This Learning Path introduces you to the tools for working with data. To start with, you'll learn how to set up R and RStudio, and then explore R packages, functions, data structures, control flow, and loops.
Once you have grasped the basics, you'll move on to studying data visualization and graphics. You'll learn how to build statistical and advanced plots using the powerful ggplot2 library. In addition to this, you'll discover data management concepts such as factoring, pivoting, aggregating, merging, and dealing with missing values. You'll discover what layers, scales, coordinates, and themes are, and study how you can use them to transform your data into aesthetically pleasing graphs. Next, you'll study simple plots such as histograms and advanced plots such as superimposed and density plots. You'll also get to grips with plotting trends, correlations, and statistical summaries.
By the end of this Learning Path, you'll have mastered data visualization techniques using the powerful R libraries.
About the Author
Samik Sen is currently working with R on machine learning. He holds a PhD in Theoretical Physics. He has tutored classes for high-performance computing postgraduates and lectured at international conferences. He has experience using Perl on data, producing plots with gnuplot for visualization, and using LaTeX to produce reports. He then moved to finance/football and online education with videos.
Chris DallaVilla is the founder and CEO of VALID., an independent marketing consulting practice specializing in providing data-driven solutions that help chief marketing officers and their teams strengthen their planning and execution, and drive results. Chris has expertise in digital and social media marketing, as well as certifications in Agile, Google AdWords, and Google Analytics. He studied computer science at Harvard University, design technology at Massachusetts College of Art and Design, and advertising and marketing communications at the Questrom School of Business at Boston University.
- If you are a developer looking to learn data visualization techniques, then this Learning Path is for you.
Let us begin the course and see the lessons and concepts that will be covered.
The GitHub link for this course is: https://github.com/TrainingByPackt/R-Programming-Fundamentals-eLearning
In this section, you will learn how to install and set up the environment. Let us install:
Let us now get introduced to R and then RStudio. Let's learn the RStudio interface and learn to execute basic arithmetic in the R Console. We can then learn to set up a new project in RStudio to use throughout the course, and the procedure for installing packages in RStudio.
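As a rough illustration of the console arithmetic and package installation mentioned above, the following is a minimal sketch. The variable name `total` is made up for this example, and the install line is commented out so that the script can be re-run without downloading anything.

```r
# Basic arithmetic typed at the R console
3 + 4        # 7
10 / 4       # 2.5
2 ^ 5        # 32
17 %% 5      # 2  (remainder)
17 %/% 5     # 3  (integer division)

# Store a result in a variable for later use (hypothetical name)
total <- (3 + 4) * 2
total        # 14

# Install a package once, then load it in each new session.
# install.packages("ggplot2")
# library(ggplot2)
```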
Now, let us begin first with an exploration of different variable types and then look at different data structures in R. We will also learn that all variables created in R will have a class and a type. Let us understand how to use different numeric objects, character objects, and date objects.
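The distinction between an object's class and its type can be checked directly with `class()` and `typeof()`. The sketch below uses made-up values and variable names purely for illustration.

```r
n <- 42.5                    # numeric object
i <- 42L                     # integer object (note the L suffix)
s <- "ggplot2"               # character object
d <- as.Date("2019-01-31")   # date object

class(n); typeof(n)   # "numeric",   "double"
class(i); typeof(i)   # "integer",   "integer"
class(s); typeof(s)   # "character", "character"
class(d); typeof(d)   # "Date",      "double"

# Dates are stored as numbers, so arithmetic works on them
later <- d + 30       # 30 days after 31 January 2019

# Common data structures also carry a class and a type
v  <- c(1, 2, 3)                           # atomic vector
l  <- list(id = 1L, label = "a")           # list
df <- data.frame(x = 1:2, y = c("a", "b")) # data frame
class(df); typeof(df) # "data.frame", "list"
```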
Let us now learn about data import and export by looking at the different delimiters and the functions used to import and export data for different file types. We can then get to know the built-in functions for data import and export. Then, let us learn about synthetic data, downloading data from GitHub, importing .csv files, and importing and exporting .xlsx files.
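A minimal sketch of a round trip with R's built-in import and export functions is shown below. The data frame is synthetic and the file paths are temporary files created for this example. Importing .xlsx files needs an add-on package (readxl is one common choice) and is not shown here.

```r
# Synthetic data: a small, made-up data frame
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = c(90, 85, 77)
)

# Comma-delimited: export, then import again
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)
scores_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Tab-delimited: same data, different delimiter
tsv_path <- tempfile(fileext = ".tsv")
write.table(scores, tsv_path, sep = "\t", row.names = FALSE)
scores_tab <- read.delim(tsv_path, stringsAsFactors = FALSE)

str(scores_in)
```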
Let us learn about the basic page for getting help with R using the web browser and look at the package documentation that can be used to answer questions about various functions in R. In addition to the thorough documentation built into R, let us learn about vignettes and their related functions.
This video will show you the lesson objectives and the lesson map and cover the various uses of data visualization.
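The help and vignette functions can be sketched as follows. The choice of `mean` and of the grid package is arbitrary, and the list of vignettes returned depends on what is installed locally.

```r
# Typed at the console, these open documentation pages:
#   ?mean                     help page for one function
#   help(package = "stats")   index of a package's documentation
#   help.start()              the main help page in the web browser
#   vignette()                list the vignettes of installed packages
#   browseVignettes("grid")   vignettes of one package, in the browser

# The same lookups can be stored as objects and inspected in a script
mean_help      <- help("mean")
grid_vignettes <- vignette(package = "grid")
grid_vignettes$results[, "Item"]   # names of the vignettes found, if any
```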
Let us get introduced to base plots, the plot() function, the R Help documentation, and the datasets library, and learn to plot the mtcars dataset.
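A base plot of the mtcars dataset might look like the sketch below. The choice of the `wt` and `mpg` columns and the axis labels are assumptions made for this example.

```r
# mtcars ships with R in the datasets package
data(mtcars)
head(mtcars)

# Scatter plot of fuel economy against weight using base graphics
plot(
  mtcars$wt, mtcars$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "mtcars: mpg vs. weight"
)
```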
In this lesson, we will address what a factor variable is and how to use one, how to summarize your data numerically, how to combine, merge, and split datasets, and how to split and combine strings. Let us see this in the form of lesson objectives and lesson map.
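The operations listed above can be sketched with base R functions. The two small data frames and all names below are made up for illustration.

```r
people <- data.frame(id = 1:3, name = c("Ana", "Ben", "Cy"))
visits <- data.frame(id = c(1, 1, 2, 3), minutes = c(30, 45, 20, 60))

# Summarize numerically
summary(visits$minutes)

# Aggregate: total minutes per id
per_person <- aggregate(minutes ~ id, data = visits, FUN = sum)

# Merge the two datasets on their shared key
merged <- merge(people, per_person, by = "id")

# Split a dataset into a list of pieces, one per id
by_id <- split(visits, visits$id)

# Split and combine strings
parts    <- strsplit("Ana Silva", " ")[[1]]
rejoined <- paste(parts, collapse = " ")
```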
Let us now get introduced to factor variables and their characteristics, when and why one should use them, and how to create factor variables in R. Let us also learn to identify whether something is already a factor, how to inspect and change the levels of a factor variable, and finally how to create an ordered factor variable.
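A short sketch of these factor operations follows. The `sizes` vector and the level names are invented for the example.

```r
sizes <- c("small", "large", "medium", "small", "large")

# Create a factor and check that it is one
size_factor <- factor(sizes)
is.factor(size_factor)   # TRUE
levels(size_factor)      # "large" "medium" "small" (alphabetical by default)

# Change (rename) one level
levels(size_factor)[levels(size_factor) == "medium"] <- "mid"

# Create an ordered factor with an explicit level order
size_ordered <- factor(
  sizes,
  levels  = c("small", "medium", "large"),
  ordered = TRUE
)
size_ordered[1] < size_ordered[2]   # TRUE: "small" comes before "large"
table(size_ordered)
```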
ggplot2 is a visualization package in R. It was developed in 2005 and it uses the concept of the 'Grammar of Graphics' to build a plot in layers and scales. This is the syntax used for the different components (aesthetics) of a geometric object. It also involves the grammatical rules for creating a visualization. Let us learn more about it with the following topics:
Introduction to ggplot2
Loading and Exploring a Dataset Using R Functions
The Main Concepts of ggplot2
Types of Variables
The geometric objects in ggplot2 are visual structures that are used to visualize data. They can be lines, bars, points, and so on. Geometric objects are constructed from datasets. Before we construct some geometric objects, let's examine some datasets to understand the different kinds of variables. Here are the topics that we will cover now:
Analyzing Different Datasets
Examples of Unimodal and Bimodal Distribution
Creating a Histogram Using qplot and ggplot
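As a minimal sketch of a histogram built with ggplot(), the code below uses the built-in mtcars data, which is an assumption and may differ from the course's datasets. Note that qplot() is deprecated in recent ggplot2 releases, so only the ggplot() form is shown.

```r
library(ggplot2)

# Histogram of a continuous variable, with the number of bins set explicitly
hist_plot <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

print(hist_plot)
```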
Bar charts are more general than histograms: they can represent both discrete and continuous data, as well as categorical variables. A bar chart uses horizontal or vertical rectangular bars whose length corresponds to the value being shown. A bar chart can be used to represent various quantities, such as frequency counts and percentages. Let us learn this with the following exercises:
Create Bar Charts
Create a One-Dimensional Bar Chart
Create a Two-Dimensional Bar Chart
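The sketch below shows one way to draw bar charts with ggplot2 on the built-in mtcars data. Reading "two-dimensional" as a bar chart split by a second categorical variable is an assumption made for this example.

```r
library(ggplot2)

# One variable: number of cars for each cylinder count
bar_1d <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# Two variables: the same counts, split by number of gears
bar_2d <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "dodge")

print(bar_1d)
print(bar_2d)
```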
A boxplot is a standard way of displaying the distribution of data based on a five-number summary: minimum, first quartile, median, third quartile, and maximum. Boxplots can represent how a continuous variable is distributed for different categories; one of the axes will be a categorical variable, while the other will be a continuous variable. Let us begin with the following exercises:
Analyze and Create BoxPlots
Create a BoxPlot for a Given Dataset
Create Scatter Plots
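A boxplot and a scatter plot can be sketched as follows, again using mtcars as a stand-in dataset. `fivenum()` is a base R function that returns a five-number summary.

```r
library(ggplot2)

# Five-number summary of a continuous variable
fivenum(mtcars$mpg)

# Boxplot: categorical variable on x, continuous variable on y
box_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Scatter plot: two continuous variables
scatter_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(box_plot)
print(scatter_plot)
```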
The Grammar of Graphics is the language used to describe various components of a graphic that represents data in visualization. Let us understand the following concepts:
The Grammar of Graphics
Analysis of Histograms
Using the Grammar of Graphics to Change Boxplot Defaults
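Changing boxplot defaults amounts to overriding the parameters of the geom layer. The colours, width, and labels in this sketch are arbitrary choices, not values taken from the course.

```r
library(ggplot2)

custom_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(
    fill           = "lightblue",  # box fill instead of the default
    colour         = "navy",       # outline colour
    outlier.colour = "red",        # highlight outliers
    width          = 0.5           # narrower boxes
  ) +
  labs(x = "Cylinders", y = "Miles per gallon")

print(custom_box)
```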
Let us begin with the second lesson and understand what we are going to cover in our learning journey.
The Grammar of Graphics is the language used to describe the various components of a graphic that represent data in a visualization. In this video, you will learn more about the Grammar of Graphics and will use it to make plots. Let us dive deep and learn about:
Use More Layers to Customize a Histogram
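Layers are added to a plot with `+`. As a hedged illustration, the sketch below stacks a reference line and labels on top of a histogram of the built-in mtcars data.

```r
library(ggplot2)

layered_hist <- ggplot(mtcars, aes(x = mpg)) +
  # Layer 1: the histogram itself
  geom_histogram(bins = 10, fill = "steelblue", colour = "white") +
  # Layer 2: a dashed vertical line at the mean
  geom_vline(xintercept = mean(mtcars$mpg), linetype = "dashed") +
  # Labels
  labs(title = "Fuel economy", x = "Miles per gallon", y = "Count")

print(layered_hist)
```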
Scales map values in a data space to values in an aesthetic space, whether the value is a color, shape, or size. Scales are used to change legends or axes, providing inverse mapping and enabling us to understand the data from the graphic itself. In the previous video, when we plotted the histogram, ggplot applied a default scale, in order to describe the x- and y-axes. Let us understand the following concepts:
Use Scales to Analyze a Dataset
Types of Coordinates
Understand Polar Coordinates
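A minimal sketch of overriding the default scales, and of switching from Cartesian to polar coordinates, is given below. The breaks, colours, and the choice of mtcars columns are assumptions for the example.

```r
library(ggplot2)

# Scales: control the axis breaks and the colour mapping explicitly
scaled_plot <- ggplot(mtcars, aes(x = wt, y = mpg, colour = hp)) +
  geom_point() +
  scale_x_continuous(name = "Weight (1000 lbs)", breaks = 2:5) +
  scale_colour_gradient(low = "yellow", high = "red")

# Coordinates: the same bar chart geometry drawn in polar coordinates
polar_plot <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(width = 1) +
  coord_polar()

print(scaled_plot)
print(polar_plot)
```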
In data visualization, we sometimes need to compare different groups by looking at their data side by side. One method for doing this is creating a subplot for each group. These kinds of plots are known as Trellis displays. In ggplot2, they're called facets. Facets divide the data by some discrete or categorical variable and display the same type of graph for each data subset. In this section, we will learn about facets in more detail.
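As a small illustration of faceting, the sketch below draws one scatter plot per cylinder count from the built-in mtcars data. The dataset and the faceting variable are assumptions for this example.

```r
library(ggplot2)

# One panel per value of the categorical variable cyl
faceted_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  facet_wrap(~ cyl)

print(faceted_plot)
```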
Aside from faceting, we can also produce a color-differentiated plot. This is advantageous when the shapes are very similar and there is some overlap, because color makes small differences easier to see. Let us learn this with the following exercises:
Use Different Colors to Group Points by a Variable
Explore Themes and Changing the Appearance of Graphs
Use a Theme to Customize a Plot
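Grouping points by color and applying a theme can be combined in one plot. In the sketch below, the grouping variable, the theme, and the legend position are arbitrary choices.

```r
library(ggplot2)

themed_plot <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +                 # one colour per cylinder group
  labs(colour = "Cylinders") +
  theme_minimal() +                      # a complete built-in theme
  theme(legend.position = "bottom")      # tweak a single theme element

print(themed_plot)
```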
Let us begin with the third lesson and understand what we are going to cover in our learning journey.
Two of the most common advanced plotting techniques are scatter plots and bubble charts. Scatter plots show the relationship between two variables. A bubble chart can include a third variable. Let us learn this with the following exercises:
Create a Bubble Chart
Use Density Plots
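A bubble chart is a scatter plot whose point size encodes a third variable, and density plots can be superimposed by group. The sketch below assumes the built-in mtcars data, not the course's datasets.

```r
library(ggplot2)

# Bubble chart: x and y as in a scatter plot, hp mapped to point size
bubble_plot <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5)

# Superimposed density plots, one per cylinder group
density_plot <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_density(alpha = 0.4)

print(bubble_plot)
print(density_plot)
```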
A time series is a sequence of data points that are recorded at specific times. Time series are often used in the finance, trading, and housing industries. They are also used by scientists for predicting earthquakes, weather, and so on. In this section, we will learn about time series in more detail.
Statistical summaries are useful for summarizing a group of points. You may want to see different quantities (such as the minimum, maximum, mean, median, or quantiles) for a time series plot or a line chart that includes multiple y values for a given x value. We will use the financial data from Facebook and the statistical summary tool to better understand the trends. Let us understand the following concepts:
Time Series Plot with Mean, Median, and Quantiles
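The idea of summarizing multiple y values per x value can be sketched with `stat_summary()`. The data below is randomly generated and is not the Facebook data used in the course. The `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Made-up data: 20 simulated closing prices for each of 12 months
set.seed(1)
prices <- data.frame(
  month = rep(seq(as.Date("2018-01-01"), by = "month", length.out = 12), each = 20),
  close = 150 + rnorm(240, sd = 10)
)

ts_plot <- ggplot(prices, aes(x = month, y = close)) +
  stat_summary(fun = mean,   geom = "line", colour = "blue") +
  stat_summary(fun = median, geom = "line", colour = "red", linetype = "dashed") +
  # Lower and upper quartiles as grey lines
  stat_summary(fun = function(v) quantile(v, 0.25, names = FALSE),
               geom = "line", colour = "grey50") +
  stat_summary(fun = function(v) quantile(v, 0.75, names = FALSE),
               geom = "line", colour = "grey50")

print(ts_plot)
```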
Trends, Correlations, and Scatter Plots
Scatter Plot and Fitting a Linear Regression Model
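Fitting a linear regression and overlaying it on a scatter plot might look like the sketch below. It uses mtcars as a stand-in dataset, and the model `mpg ~ wt` is chosen only for illustration.

```r
library(ggplot2)

# Fit the model with base R and inspect it
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                     # intercept and slope
cor(mtcars$wt, mtcars$mpg)    # strength of the linear relationship

# Scatter plot with the fitted line and its confidence band
lm_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

print(lm_plot)
```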
Correlation matrices show the correlation coefficients between a relatively large number of continuous variables. However, while R offers a simple way to create such matrices through the cor function, it does not offer a plotting method for the matrices created by that function. In this section, we will learn about correlation plots in more detail.
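One possible way to plot the output of `cor()` is to reshape the matrix into long form and draw it as tiles with ggplot2. This is a sketch under that assumption and not necessarily the approach taken in the course. The selected mtcars columns are arbitrary.

```r
library(ggplot2)

# Correlation matrix of a few continuous variables
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]
corr <- cor(vars)

# Reshape the matrix to long form: one row per pair of variables
corr_long <- as.data.frame(as.table(corr))
names(corr_long) <- c("var1", "var2", "r")

# Draw each coefficient as a coloured, labelled tile
corr_plot <- ggplot(corr_long, aes(x = var1, y = var2, fill = r)) +
  geom_tile() +
  geom_text(aes(label = round(r, 2))) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1))

print(corr_plot)
```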