Multivariate Data Visualization with R
4.2 (17 ratings)
2,115 students enrolled

Course describes and demonstrates a creative approach for constructing and drawing grid-based multivariate graphs in R
Last updated 8/2015
  • 7 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Graphically depict visual 2D, 3D, 4D (and so on) relationships that exist in multivariate data sets.
  • Understand how "trellis" graphic objects are different from other graphic objects in R.
  • Understand how to apply the techniques of conditioning and paneling to present multivariate data relationships.
  • Understand the nature of lattice panel functions and know how to create and modify them for brilliant multivariate graphics displays.
  • Have a powerful visual toolset to visually present the results of multi-variable statistical model fitting.
Requirements
  • Students will need to install R and RStudio (instructions are provided in the course materials).

It is often both useful and revealing to create visualizations, plots and graphs of the multivariate data that is the subject of one's research project. Often, both pre-analysis and post-analysis visualizations can help one understand "what is going on in the data" in a way that looking at numerical summaries of fitted model estimates cannot. The lattice package in R is designed specifically to graphically depict relationships in multivariate data sets.

This course describes and demonstrates this creative approach for constructing and drawing grid-based multivariate graphic plots and figures using R. Lattice graphics are characterized as multi-variable (3, 4, 5 or more variables) plots that use conditioning and paneling. Consequently, lattice is a popular and well-suited approach for visually presenting the results of multi-variable statistical model fitting. The appearance of most of the plots, graphs and figures is determined by panel functions, rather than by the high-level graphics function calls themselves. Further, the user of lattice graphics has extensive control over many more of the details and features of the visual plots, far greater control than is afforded by the base graphics approach in R. The method is based on trellis graphics, which were popularized in the S language developed at Bell Labs.

Who is the target audience?
  • Anyone who uses R, or who wants to use R, for any sort of multivariate data analysis would benefit from taking this course.
  • The course is appropriate for students, scientists, or other quantitative-analysis professionals who want to display numerical information in plots and graphs.
  • To take advantage of the course, students will need a basic (introductory) level of ability with R software. However, all of the graphic R scripts are provided with the course materials.
Curriculum For This Course
32 Lectures
Introduction to Lattice and to "Trellis" Graphics
9 Lectures 01:47:58

The lattice package, written by Deepayan Sarkar, attempts to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships. In particular, the package supports the creation of trellis graphs - graphs that display a variable or the relationship between variables, conditioned on one or more other variables.

The typical format is

graph_type(formula, data=)

where graph_type is one of lattice's high-level plotting functions, formula specifies the variable(s) to display and any conditioning variables, and data names the data frame that holds them. For example, ~x|A means display numeric variable x for each level of factor A; y~x | A*B means display the relationship between numeric variables y and x separately for every combination of factor A and B levels; ~x means display numeric variable x alone.
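The three formula shapes described above can be sketched as follows. This is only an illustration: the mtcars data set (shipped with R) and the choice of histogram(), xyplot() and densityplot() are assumptions made for the example, not taken from the course scripts.

```r
library(lattice)

# mtcars ships with R; cyl and am are converted to factors so that
# they can act as conditioning variables.
cars <- transform(mtcars,
                  cyl = factor(cyl),
                  am  = factor(am, labels = c("automatic", "manual")))

# ~ x | A : distribution of mpg within each level of cyl
p1 <- histogram(~ mpg | cyl, data = cars)

# y ~ x | A * B : mpg against wt for every cyl-by-am combination
p2 <- xyplot(mpg ~ wt | cyl * am, data = cars)

# ~ x : mpg alone, with no conditioning
p3 <- densityplot(~ mpg, data = cars)
```

At the console, typing p1, p2 or p3 draws the plot; inside a script, call print(p1) and so on.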

Introduction to Lattice

A trellis object, as returned by high-level lattice functions like xyplot, is a list with the "class" attribute set to "trellis". Many of the components of this list are simply the arguments to the high-level function that produced the object. Among them are: as.table, layout, page, panel, prepanel, main, sub, par.strip.text, strip, skip, xlab, ylab, par.settings, lattice.options and plot.args.
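A short sketch of what this means in practice, assuming the mtcars data set as a stand-in example: the object can be stored, inspected like any other list, and modified with update() before it is ever drawn.

```r
library(lattice)

# Nothing is drawn here; the call only builds and returns the object.
p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars, main = "Fuel economy")

class(p)        # the class attribute of the returned object
is.list(p)      # underneath, it is an ordinary list
names(p)        # component names, many of them mirroring the arguments
p$main          # the title that was passed as main

# update() returns a modified copy of the object.
p2 <- update(p, layout = c(3, 1), xlab = "Weight (1000 lbs)")
```

Because plotting is deferred until the object is printed, the same object can be adjusted several times and drawn once at the end with print(p2).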

The Trellis Object

Dimension and Physical Layout

Scales and Axes

In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables).
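As an illustration of displaying a single variable, the sketch below draws a histogram and a kernel density estimate. The faithful data set (shipped with R) and the argument values are assumptions chosen for the example.

```r
library(lattice)

# faithful ships with R: eruption durations of the Old Faithful geyser.
h <- histogram(~ eruptions, data = faithful, nint = 20)

# Kernel density estimate of the same variable, with the raw values
# marked as a rug along the axis.
d <- densityplot(~ eruptions, data = faithful, plot.points = "rug")
```

Calling print(h) or print(d) draws the corresponding display.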
Visualizing Univariate Distributions (part 1)

Visualizing Univariate Distributions (part 2)

A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the value below which a given fraction (or percent) of the points fall. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
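A minimal sketch of both ideas, assuming mtcars as example data: quantile() computes a single quantile, and lattice's qq() compares the quantiles of one numeric variable across the two levels of a factor.

```r
library(lattice)

# am is recoded as a two-level factor, which qq() needs on the left-hand side.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

# The 0.3 quantile of mpg: about 30% of the cars fall at or below this value.
q30 <- quantile(cars$mpg, probs = 0.3)

# Two-sample q-q plot: quantiles of mpg in one transmission group
# plotted against the quantiles of mpg in the other.
p <- qq(am ~ mpg, data = cars, aspect = 1)
```

If the two groups had the same distribution, the points would lie close to the diagonal; a systematic shift away from it suggests a difference in location or spread.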
Two-Sample QQ Plots

Box-and-whisker plots summarize the data using a few quantiles, and possibly some outliers. This summarizing can be important when the number of observations is large. When the number of observations per sample is small, it is often sufficient to simply plot the sample values side by side on a common scale. Such plots are known as strip plots, also referred to as univariate scatter plots. They are in fact very similar to bivariate scatter plots.
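The contrast between the two displays can be sketched on the same data. The iris data set and the jittering option are assumptions made for this example.

```r
library(lattice)

# Box-and-whisker plot: each species is reduced to a few quantiles.
b <- bwplot(Species ~ Sepal.Length, data = iris)

# Strip plot: every observation is drawn; jittering separates tied values.
s <- stripplot(Species ~ Sepal.Length, data = iris, jitter.data = TRUE)
```

With 50 observations per species the strip plot is still readable; with thousands of points the box-and-whisker summary would usually be the better choice.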

Strip Plots
Multiway Tables and Scatter Plots
8 Lectures 01:31:14

An important subset of statistical data comes in the form of tables. Tables usually record the frequency or proportion of observations that fall into a particular category or combination of categories. They could also encode some other summary measure such as a rate (of binary events) or mean (of a continuous variable). In R, tables are usually represented by arrays of one (vectors), two (matrices), or more dimensions. To distinguish them from other vectors and arrays, they often have class “table”. The R functions table() and xtabs() can be used to create tables from raw data.
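As a small sketch of this workflow, assuming mtcars as the raw data: xtabs() builds a two-way frequency table, and the table is then converted to a data frame so that it can be plotted with a lattice formula.

```r
library(lattice)

# Cross-tabulate number of cylinders against number of gears.
tab <- xtabs(~ cyl + gear, data = mtcars)
class(tab)                  # includes "table"

# as.data.frame() gives one row per cell, with the count in column Freq.
df <- as.data.frame(tab)

# One panel per number of gears, bars showing the cell frequencies.
p <- barchart(Freq ~ cyl | gear, data = df, layout = c(3, 1), origin = 0)
```

The same data frame could be passed to dotplot() instead of barchart() to obtain a multipanel dot plot.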

Multiway Tables

Multipanel Dot Plots

A scatter plot graphs two variables directly against each other in a Cartesian coordinate system. It is a simple graphic in the sense that the data are directly encoded without being summarized in any way; often the aspects that the user needs to worry about most are graphical ones such as whether to join the points by a line, what colors to use, and so on. Depending on the purpose, scatter plots can also be enhanced in several ways. In this chapter, we go over some of the variants supported by panel.xyplot(), which is the default panel function for both xyplot() and splom() (under the alias panel.splom()).
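One such variant is sketched below, again assuming mtcars as example data. The type argument is passed on to panel.xyplot() and selects which elements are drawn in each panel.

```r
library(lattice)

# type is handled by panel.xyplot():
#   "g" draws a reference grid, "p" the points, "r" a fitted regression line.
# groups draws one set of points and one line per number of cylinders.
p <- xyplot(mpg ~ wt, data = mtcars,
            groups = factor(cyl),
            type = c("g", "p", "r"),
            auto.key = list(columns = 3),
            xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

Replacing "r" with "smooth" would draw a loess curve in place of the straight regression line.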

Scatter Plots and Extensions

Shingles and Advanced Indexing

More Scatter Plots (part 1)

More Scatter Plots (part 2)

Scatter-plot matrices, produced by splom(), are exactly what the name suggests; they are a matrix of pairwise scatter plots given two or more variables. Conditioning is possible, but it is more common to call splom() with a data frame as its first argument.
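Both calling styles can be sketched with the iris data set, which is an assumption chosen for the example.

```r
library(lattice)

# Data frame as the first argument: all pairwise scatter plots of the
# four measurements, with species distinguished by color.
s1 <- splom(iris[1:4], groups = iris$Species, auto.key = list(columns = 3))

# Formula interface with conditioning: one scatter-plot matrix per species.
s2 <- splom(~ iris[1:4] | Species, data = iris)
```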

Scatter-Plot Matrices

Like scatter-plot matrices, parallel coordinates plots are hypervariate in nature, that is, they show relationships between an arbitrary number of variables. Their design is related to univariate scatter plots; in fact, they are basically univariate scatter plots of all variables of interest stacked parallel to each other (vertically in the implementation in lattice), with values that correspond to the same observation linked by line segments.
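A minimal sketch using parallelplot(), with iris as assumed example data. Each line in a panel is one flower, traced across the four measurement axes.

```r
library(lattice)

# One panel per species; each observation becomes a line linking its
# values on the four variables.
pc <- parallelplot(~ iris[1:4] | Species, data = iris, layout = c(3, 1))
```

Groups of observations with similar profiles show up as bundles of lines that follow the same path across the axes.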

Parallel Coordinates Plot
Trivariate, 3D, and Other Complex Displays
7 Lectures 01:29:04

Trivariate displays encode three primary variables in a panel. There are four high-level functions in lattice that produce trivariate displays: cloud() creates three-dimensional scatter plots of unstructured trivariate data, whereas levelplot(), contourplot(), and wireframe() render surfaces or two-dimensional tables evaluated on a systematic rectangular grid. Of these, cloud() and wireframe() are similar in that they both create two-dimensional projections of three-dimensional constructs, and they share several common arguments that control the details of the projection.
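The three grid-based functions can be compared on the same surface. The sketch assumes the volcano matrix that ships with R (elevations on an 87 by 61 grid); the argument values are illustrative only.

```r
library(lattice)

# volcano is a numeric matrix, so rows and columns supply the grid.
lp <- levelplot(volcano)                  # color-coded heights
cp <- contourplot(volcano, cuts = 10)     # contour lines of equal height
wf <- wireframe(volcano, shade = TRUE,    # projected 3D surface
                aspect = c(61 / 87, 0.4))
```

levelplot() and contourplot() stay in two dimensions, while wireframe() draws a projection whose viewpoint can be changed in the same way as for cloud().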

Trivariate Displays

We begin with cloud(), which produces three-dimensional scatter plots. Most of the discussion in this section about projection and how to control it in cloud() applies to wireframe() as well.
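A sketch of how the projection can be controlled through the screen argument, assuming the quakes data set that ships with R. The helper quake_cloud() is a hypothetical name introduced only for this example.

```r
library(lattice)

# Hypothetical helper: the same 3D scatter plot of earthquake locations,
# viewed from the rotation given in screen.
quake_cloud <- function(screen) {
  cloud(depth ~ long * lat, data = quakes,
        zlim = rev(range(quakes$depth)),   # deeper events drawn lower
        screen = screen,
        xlab = "Longitude", ylab = "Latitude", zlab = "Depth")
}

# screen is a list of rotations (in degrees) applied in the order given.
view1 <- quake_cloud(list(z = 105, x = -70))
view2 <- quake_cloud(list(z = 30, x = -60))
```

Printing view1 and view2 shows the same points from two different viewpoints; the same screen argument can be passed to wireframe().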

3D Scatter Plots (part 1)

3D Scatter Plots (part 2)

3D Panel Functions

More 3D Visualizations

The methods we used to plot regression surfaces using wireframe() can be easily adapted to mathematical surfaces.
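For instance, a mathematical surface can be evaluated on a regular grid and passed to wireframe() in the same way as fitted values. The function used here, z = sin(sqrt(x^2 + y^2)), is an assumption chosen for illustration.

```r
library(lattice)

# Regular 40 x 40 grid over [-pi, pi] in both directions.
g <- expand.grid(x = seq(-pi, pi, length.out = 40),
                 y = seq(-pi, pi, length.out = 40))

# Evaluate the surface at every grid point.
g$z <- sin(sqrt(g$x^2 + g$y^2))

# Shaded wireframe of z over the (x, y) grid.
w <- wireframe(z ~ x * y, data = g, shade = TRUE,
               screen = list(z = 30, x = -60))
```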

Visualizing Theoretical 3D Surfaces
Finer Control Graphical Parameters and Other Settings
8 Lectures 01:56:47

Graphical parameters are often critical in determining the effectiveness of a plot. Such parameters include obvious ones such as colors, symbols, line types, and fonts for the various elements of a graph, as well as more subtle ones such as the length of tick marks or the amount of space separating different components of the graph. The parameters used in lattice displays are highly customizable. Many of them can be controlled directly by specifying suitable arguments in a high-level function call. Most derive their default values from a system of common global settings that can also be modified by the user. The latter approach has two primary benefits: it allows good global defaults to be specified, and it provides a consistent “look and feel” to lattice graphics while letting the user retain ultimate control.
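The two routes can be sketched side by side, assuming mtcars as example data. The first call sets parameters directly as arguments; the second supplies the same values through par.settings, which overrides the global defaults for that one plot only.

```r
library(lattice)

# Route 1: parameters given directly in the high-level call.
p_direct <- xyplot(mpg ~ wt, data = mtcars, pch = 16, col = "darkred")

# Route 2: the same look expressed as settings. par.settings takes a
# list with the same structure as the global settings.
my_theme <- list(plot.symbol = list(pch = 16, col = "darkred"))
p_theme  <- xyplot(mpg ~ wt, data = mtcars, par.settings = my_theme)

# The global counterpart is trellis.par.set(my_theme), which changes the
# defaults for all later plots on the current device.
```

Keeping the settings in a list such as my_theme makes it easy to reuse one look across many plots without repeating the individual arguments.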

Graphical Parameters and Other Settings

Graphical Parameters Continued

Plot Coordinates and Axis Annotation

Data Manipulation (part 1)

Data Manipulation (part 2)

Shingles and Related Utilities

Ordering Categorical Variables
About the Instructor
Geoffrey Hubona, Ph.D.
4.0 Average rating
1,493 Reviews
12,685 Students
28 Courses
Associate Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at 3 major state universities in the Eastern United States from 1993-2010. Currently, he is a visiting associate professor of MIS at Texas A&M International University. In these positions, he taught dozens of various statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL; an MA in Economics, also from USF; an MBA in Finance from George Mason University in Fairfax, VA; and a BA in Psychology from the University of Virginia in Charlottesville, VA. He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These research methods techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling.