# Multivariate Data Visualization with R

This course describes and demonstrates a creative approach for constructing and drawing grid-based multivariate graphs in R.
4.2 (17 ratings)
2,115 students enrolled
Last updated 8/2015
English
Current price: \$10 Original price: \$30 Discount: 67% off
30-Day Money-Back Guarantee
Includes:
• 7 hours on-demand video
• Access on mobile and TV
• Certificate of Completion
What Will I Learn?
• Graphically depict 2D, 3D, 4D (and so on) relationships that exist in multivariate data sets.
• Understand how "trellis" graphic objects are different from other graphic objects in R.
• Understand how to apply the techniques of conditioning and paneling to present multivariate data relationships.
• Understand the nature of lattice panel functions and know how to create and modify them for brilliant multivariate graphics displays.
• Have a powerful toolset to visually present the results of multi-variable statistical model fitting.
Requirements
• Students will need to install R and RStudio (instructions are provided in the course materials).
Description

It is often both useful and revealing to create visualizations, plots and graphs of the multivariate data that is the subject of one's research project. Both pre-analysis and post-analysis visualizations can help one understand “what is going on in the data” in a way that looking at numerical summaries of fitted model estimates cannot. The lattice package in R is designed specifically to depict relationships in multivariate data sets graphically.

This course describes and demonstrates this creative approach for constructing and drawing grid-based multivariate plots and figures using R. Lattice graphics are characterized as multi-variable (3, 4, 5 or more variables) plots that use conditioning and paneling. Consequently, lattice is a popular approach for, and a good fit to, visually presenting the results of multi-variable statistical model fitting. The appearance of most of the plots, graphs and figures is determined by panel functions, rather than by the high-level graphics function calls themselves. Further, the user of lattice graphics has extensive control over many more of the details and features of the plots, far greater control than is afforded by the base graphics approach in R. The method is based on trellis graphics, which were popularized in the S language developed at Bell Labs.

Who is the target audience?
• Anyone who uses R, or who wants to use R, for any sort of multivariate data analysis would benefit from taking this course.
• The course is appropriate for students, scientists, or other quantitative-analysis professionals who want to display numerical information in plots and graphs.
• To take advantage of the course, students will need a basic (introductory) level of ability with R. However, all of the graphics R scripts are provided with the course materials.
Curriculum For This Course
32 Lectures
06:45:03
Introduction to Lattice and to "Trellis" Graphics
9 Lectures 01:47:58
Preview 01:16

The lattice package, written by Deepayan Sarkar, attempts to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships. In particular, the package supports the creation of trellis graphs - graphs that display a variable or the relationship between variables, conditioned on one or more other variables.

The typical format is

`graph_type(formula, data=)`

where graph_type is the name of a high-level lattice function (such as xyplot), and formula specifies the variable(s) to display and any conditioning variables. For example, ~x | A means display numeric variable x for each level of factor A. y ~ x | A*B means display the relationship between numeric variables y and x separately for every combination of the levels of factors A and B. ~x means display numeric variable x alone.
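The three formula shapes described above can be sketched with a data set that ships with R. This is only an illustration, not taken from the course scripts: the choice of `mtcars`, of `densityplot()` and `xyplot()`, and of `cyl` and `gear` as the conditioning factors are all assumptions.

```r
library(lattice)

# mtcars ships with R; cyl and gear are converted to factors so they can
# play the role of the conditioning factors A and B (an assumption made
# for illustration).
cars <- transform(mtcars,
                  cyl  = factor(cyl),
                  gear = factor(gear))

# ~x : one numeric variable on its own
p1 <- densityplot(~ mpg, data = cars)

# ~x | A : the same variable, one panel per level of cyl
p2 <- densityplot(~ mpg | cyl, data = cars)

# y ~ x | A * B : mpg against wt for every cyl-by-gear combination
p3 <- xyplot(mpg ~ wt | cyl * gear, data = cars)

# Lattice functions return objects; printing one draws it.
print(p3)
```

Combinations of levels that contain no observations simply appear as empty panels.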

Introduction to Lattice
16:24

A trellis object, as returned by high level lattice functions like `xyplot`, is a list with the `"class"` attribute set to `"trellis"`. Many of the components of this list are simply the arguments to the high level function that produced the object. Among them are: `as.table`, `layout`, `page`, `panel`, `prepanel`, `main`, `sub`, `par.strip.text`, `strip`, `skip`, `xlab`, `ylab`, `par.settings`, `lattice.options` and `plot.args`.
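A short sketch of inspecting such an object, assuming the built-in `mtcars` data and arbitrary titles chosen for illustration. It relies only on `class()`, `names()` and lattice's `update()` method for trellis objects.

```r
library(lattice)

# Build a plot object without drawing it.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            main = "Mileage by weight", as.table = TRUE)

class(p)                 # "trellis"
is.list(p)               # the object is an ordinary list underneath
head(names(p))           # some of its components

# Several components mirror the arguments of the call that produced p.
p$as.table
c("main", "layout", "panel") %in% names(p)

# update() returns a modified copy; the original p is left unchanged.
p2 <- update(p, layout = c(3, 1))
print(p2)
```

Because the plot is an object, it can be stored, modified and printed later, which is one practical difference from base graphics.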

The Trellis Object
13:18

Dimension and Physical Layout
14:00

Scales and Axes
08:43

In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables).
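As a minimal illustration of displaying a single variable's distribution in lattice, the sketch below uses `histogram()` and `qqmath()`. The data sets (`faithful`, `mtcars`) and the number of bins are assumptions picked for the example, not material from the lectures.

```r
library(lattice)

# Histogram of one numeric variable (eruption durations, built-in data).
h <- histogram(~ eruptions, data = faithful, nint = 20)

# Normal Q-Q plots of one variable, conditioned on a factor:
# one panel per number of cylinders.
q <- qqmath(~ mpg | factor(cyl), data = mtcars, distribution = qnorm)

print(h)
print(q)
```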
Preview 17:04

Visualizing Univariate Distributions (part 2)
14:56

A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
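The definition above can be checked numerically and then drawn with lattice's two-sample `qq()` function. The data here are simulated and purely hypothetical: two groups of 100 normal draws, the second shifted upward by 1.

```r
library(lattice)

set.seed(1)
# Hypothetical two-sample data; group "b" is shifted upward by 1.
d <- data.frame(
  group = factor(rep(c("a", "b"), each = 100)),
  value = c(rnorm(100), rnorm(100, mean = 1))
)

# The 0.3 quantile of group "a": about 30% of its values lie below it.
a <- d$value[d$group == "a"]
q30 <- quantile(a, probs = 0.3)
mean(a < q30)

# Two-sample Q-Q plot: the left-hand side of the formula is a
# two-level factor, the right-hand side the numeric variable.
p <- qq(group ~ value, data = d)
print(p)
```

If the two samples came from the same distribution, the points would lie close to the diagonal; a constant shift moves them parallel to it.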
Two-Sample QQ Plots
14:14

Box-and-whisker plots summarize the data using a few quantiles, and possibly some outliers. This summarizing can be important when the number of observations is large. When the number of observations per sample is small, it is often sufficient to simply plot the sample values side by side in a common scale. Such plots are known as strip plots, also referred to as univariate scatter plots. They are in fact very similar to the bivariate scatter plots.
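A minimal sketch contrasting the two displays on the same small data set. The use of `mtcars` and of jittering to separate overlapping points are assumptions made for illustration.

```r
library(lattice)

# Box-and-whisker plot: a few quantiles per group.
b <- bwplot(factor(cyl) ~ mpg, data = mtcars,
            xlab = "Miles per gallon", ylab = "Cylinders")

# Strip plot: every observation drawn on a common scale.
# jitter.data = TRUE adds a little noise so ties do not overplot.
s <- stripplot(factor(cyl) ~ mpg, data = mtcars, jitter.data = TRUE,
               xlab = "Miles per gallon", ylab = "Cylinders")

print(s)
```

With only 32 cars, the strip plot shows everything the box plot summarizes and nothing is lost.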

Strip Plots
08:03
Multiway Tables and Scatter Plots
8 Lectures 01:31:14

An important subset of statistical data comes in the form of tables. Tables usually record the frequency or proportion of observations that fall into a particular category or combination of categories. They could also encode some other summary measure such as a rate (of binary events) or mean (of a continuous variable). In R, tables are usually represented by arrays of one (vectors), two (matrices), or more dimensions. To distinguish them from other vectors and arrays, they often have class “table”. The R functions table() and xtabs() can be used to create tables from raw data.
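As a hedged sketch of these ideas, the following builds the same two-way table with `xtabs()` and `table()` and passes it to lattice's `barchart()`, which has a method for tables. The choice of `mtcars` and of the `cyl` and `gear` variables is an assumption for illustration.

```r
library(lattice)

# Cross-tabulate two categorical variables from raw data.
tab  <- xtabs(~ cyl + gear, data = mtcars)
tab2 <- table(cyl = mtcars$cyl, gear = mtcars$gear)

inherits(tab, "table")        # xtabs objects also have class "table"
dim(tab)                      # a two-dimensional array of counts

# Tables can hold proportions instead of frequencies.
prop.table(tab, margin = 1)   # row proportions

# lattice can plot a table directly.
p <- barchart(tab, auto.key = TRUE)
print(p)
```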

Multiway Tables
12:16

Multipanel Dot Plots
11:09

A scatter plot graphs two variables directly against each other in a Cartesian coordinate system. It is a simple graphic in the sense that the data are directly encoded without being summarized in any way; often the aspects that the user needs to worry about most are graphical ones such as whether to join the points by a line, what colors to use, and so on. Depending on the purpose, scatter plots can also be enhanced in several ways. In this chapter, we go over some of the variants supported by panel.xyplot(), which is the default panel function for both xyplot() and splom() (under the alias panel.splom()).
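Two common ways of enhancing a scatter plot are sketched below: the `type` argument understood by `panel.xyplot()`, and a custom panel function that calls `panel.xyplot()` and then adds to it. The data set and the particular enhancements (grid, regression line) are assumptions chosen for the example.

```r
library(lattice)

# type is passed on to panel.xyplot():
# "p" = points, "g" = reference grid, "r" = regression line.
p1 <- xyplot(mpg ~ wt | factor(am), data = mtcars,
             type = c("p", "g", "r"))

# A custom panel function: draw the default display, then add a
# dashed least-squares line on top.
p2 <- xyplot(mpg ~ wt, data = mtcars,
             panel = function(x, y, ...) {
               panel.xyplot(x, y, ...)
               panel.lmline(x, y, lty = 2)
             })

print(p1)
```

The second form is more verbose, but it is the general mechanism: anything drawn inside the panel function is repeated in every panel.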

Scatter Plots and Extensions
13:37

06:34

More Scatter Plots (part 1)
16:08

More Scatter Plots (part 2)
10:22

Scatter-plot matrices, produced by splom(), are exactly what the name suggests; they are a matrix of pairwise scatter plots given two or more variables. Conditioning is possible, but it is more common to call splom() with a data frame as its first argument.
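Both calling styles can be sketched with the built-in `iris` data, an assumption made here for illustration: a data frame as the first argument, and a formula with a conditioning variable.

```r
library(lattice)

# Most common form: a data frame of the variables to pair up.
p1 <- splom(iris[1:4])

# Formula form, with points grouped by species.
p2 <- splom(~ iris[1:4], data = iris, groups = Species,
            auto.key = list(columns = 3))

# Conditioning: one scatter-plot matrix per species.
p3 <- splom(~ iris[1:4] | Species, data = iris)

print(p2)
```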

Preview 13:39

Like scatter-plot matrices, parallel coordinates plots are hypervariate in nature, that is, they show relationships between an arbitrary number of variables. Their design is related to univariate scatter plots; in fact, they are basically univariate scatter plots of all variables of interest stacked parallel to each other (vertically in the implementation in lattice), with values that correspond to the same observation linked by line segments.
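A minimal parallel coordinates sketch using `parallelplot()`, again assuming the built-in `iris` data. Each line segment chain corresponds to one flower, and the variables are stacked vertically.

```r
library(lattice)

# One panel per species; four measurements per observation.
p <- parallelplot(~ iris[1:4] | Species, data = iris,
                  layout = c(3, 1))
print(p)
```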

Parallel Coordinates Plot
07:29
Trivariate, 3D, and Other Complex Displays
7 Lectures 01:29:04

Trivariate displays encode three primary variables in a panel. There are four high-level functions in lattice that produce trivariate displays: cloud() creates three-dimensional scatter plots of unstructured trivariate data, whereas levelplot(), contourplot(), and wireframe() render surfaces or two-dimensional tables evaluated on a systematic rectangular grid. Of these, cloud() and wireframe() are similar in that they both create two-dimensional projections of three-dimensional constructs, and they share several common arguments that control the details of the projection.
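The grid-based functions can be sketched on `volcano`, a matrix of elevations that ships with R. Using a matrix as the first argument is an assumption of this example; the formula form `z ~ x * y` works on gridded data frames as well.

```r
library(lattice)

# volcano is a numeric matrix: heights on a regular rectangular grid.
is.matrix(volcano)

lp <- levelplot(volcano)                 # heights encoded as color
cp <- contourplot(volcano, cuts = 10)    # heights encoded as contours
wf <- wireframe(volcano, shade = TRUE)   # 3D surface projection

print(lp)
```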

Trivariate Displays
09:01

We begin with cloud(), which produces three-dimensional scatter plots. Most of the discussion in this section about projection and how to control it in cloud() applies to wireframe() as well.
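A small sketch of a three-dimensional scatter plot and of the `screen` argument that controls the viewing rotation. The `iris` data and the specific rotation angles are assumptions chosen for illustration.

```r
library(lattice)

# z ~ x * y: three numeric variables, points grouped by species.
p1 <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
            groups = Species, auto.key = list(columns = 3))

# Same data viewed from a different direction: screen is a list of
# rotations (in degrees) applied in the order given.
p2 <- update(p1, screen = list(z = 20, x = -70))

print(p2)
```

The same `screen` argument is accepted by wireframe(), so a viewpoint that works for one display can be reused for the other.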

3D Scatter Plots (part 1)
09:56

3D Scatter Plots (part 2)
08:38

3D Panel Functions
17:08

Preview 13:52

More 3D Visualizations
16:50

The methods we used to plot regression surfaces using wireframe() can be easily adapted to mathematical surfaces.
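As an illustration of plotting a mathematical surface, the sketch below evaluates a function on a rectangular grid built with `expand.grid()` and hands the result to `wireframe()`. The particular function, grid resolution and viewing angles are hypothetical choices, not the examples used in the lectures.

```r
library(lattice)

# Rectangular grid of (x, y) values.
g <- expand.grid(x = seq(-pi, pi, length.out = 50),
                 y = seq(-pi, pi, length.out = 50))

# Surface height: a radially symmetric ripple (chosen for illustration).
g$z <- with(g, sin(sqrt(x^2 + y^2)))

w <- wireframe(z ~ x * y, data = g, shade = TRUE,
               screen = list(z = 30, x = -60))
print(w)
```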

Visualizing Theoretical 3D Surfaces
13:39
Finer Control: Graphical Parameters and Other Settings
8 Lectures 01:56:47

Graphical parameters are often critical in determining the effectiveness of a plot. Such parameters include obvious ones such as colors, symbols, line types, and fonts for the various elements of a graph, as well as more subtle ones such as the length of tick marks or the amount of space separating different components of the graph. The parameters used in lattice displays are highly customizable. Many of them can be controlled directly by specifying suitable arguments in a high-level function call. Most derive their default values from a system of common global settings that can also be modified by the user. The latter approach has two primary benefits: it allows good global defaults to be specified, and it provides a consistent “look and feel” to lattice graphics while letting the user retain ultimate control.
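The three levels of control described above can be sketched as follows: arguments in the call itself, per-plot settings through `par.settings`, and global settings through `trellis.par.get()` and `trellis.par.set()`. The data set, colors and symbols are arbitrary assumptions for the example.

```r
library(lattice)

# 1. Direct arguments in the high-level call.
p1 <- xyplot(mpg ~ wt, data = mtcars, pch = 16, col = "darkred")

# 2. Settings attached to a single plot object.
p2 <- xyplot(mpg ~ wt, data = mtcars,
             par.settings = list(plot.symbol = list(pch = 16,
                                                    col = "steelblue")))

# 3. Global settings shared by all later plots on the current device.
#    (trellis.par.get() opens a graphics device if none is active.)
old <- trellis.par.get("plot.symbol")
trellis.par.set(theme = list(plot.symbol = list(pch = 17)))
new_pch <- trellis.par.get("plot.symbol")$pch

# Restore the previous global value so later plots are unaffected.
trellis.par.set(theme = list(plot.symbol = old))
restored_pch <- trellis.par.get("plot.symbol")$pch
```

Per-plot `par.settings` travel with the object, so they are usually the safer choice when only one figure should look different.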

Graphical Parameters and Other Settings
15:05

Graphical Parameters Continued
14:15

Plot Coordinates and Axis Annotation
13:19

Preview 14:49

Data Manipulation (part 1)
13:56

Data Manipulation (part 2)
15:27

Shingles and Related Utilities
14:57

Ordering Categorical Variables
14:59