Multivariate Data Visualization with R

This course describes and demonstrates a creative approach to constructing and drawing grid-based multivariate graphs in R.
  • Lectures 32
  • Length 7 hours
  • Skill Level All Levels
  • Languages English

About This Course

Published 8/2015 English

Course Description

It is often both useful and revealing to create visualizations, plots and graphs of the multivariate data that are the subject of one's research project. Both pre-analysis and post-analysis visualizations can help one understand "what is going on in the data" in a way that looking at numerical summaries of fitted model estimates cannot. The lattice package in R is designed specifically to depict relationships in multivariate data sets graphically.

This course describes and demonstrates this creative approach to constructing and drawing grid-based multivariate plots and figures using R. Lattice graphics are characterized as multi-variable (3, 4, 5 or more variables) plots that use conditioning and paneling. Consequently, lattice is a popular approach for, and a good fit to, visually presenting the results of multi-variable statistical model fitting. The appearance of most of the plots, graphs and figures is determined by panel functions, rather than by the high-level graphics function calls themselves. Further, the user of lattice graphics has extensive control over the details and features of the plots, far greater control than is afforded by the base graphics approach in R. The method is based on trellis graphics, which were popularized in the S language developed at Bell Labs.

What are the requirements?

  • Students will need to install R and RStudio (instructions are provided in the course materials).

What am I going to get from this course?

  • Graphically depict 2D, 3D, 4D (and so on) relationships that exist in multivariate data sets.
  • Understand how "trellis" graphic objects are different from other graphic objects in R.
  • Understand how to apply the techniques of conditioning and paneling to present multivariate data relationships.
  • Understand the nature of lattice panel functions and know how to create and modify them for brilliant multivariate graphics displays.
  • Have a powerful toolset to visually present the results of multi-variable statistical model fitting.

What is the target audience?

  • Anyone who uses R, or who wants to use R, for any sort of multivariate data analysis would benefit from taking this course.
  • The course is appropriate for students, scientists, or other quantitative-analysis professionals who want to display numerical information in plots and graphs.
  • To take advantage of the course, students will need a basic (introductory) ability to use R. However, all of the graphics R scripts are provided with the course materials.

Curriculum

Section 1: Introduction to Lattice and to "Trellis" Graphics
Introduction to Course
01:16
16:24

The lattice package, written by Deepayan Sarkar, attempts to improve on base R graphics by providing better defaults and the ability to easily display multivariate relationships. In particular, the package supports the creation of trellis graphs - graphs that display a variable or the relationship between variables, conditioned on one or more other variables.

The typical format is

graph_type(formula, data = )

where graph_type is one of the high-level lattice plotting functions, such as xyplot, and formula specifies the variable(s) to display and any conditioning variables. For example, ~x | A means display numeric variable x for each level of factor A; y~x | A*B means display the relationship between numeric variables y and x separately for every combination of factor A and B levels; and ~x means display numeric variable x alone.
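
As a rough sketch of the three formula shapes described above, the calls below use the mtcars data set that ships with R. The data set, the variables, and the object names p1 to p3 are assumptions chosen for illustration and are not taken from the course materials.

```r
library(lattice)

# ~ x : one numeric variable on its own
p1 <- densityplot(~ mpg, data = mtcars)

# ~ x | A : the same variable, one panel per level of factor A
p2 <- histogram(~ mpg | factor(cyl), data = mtcars)

# y ~ x | A * B : y against x, one panel per combination of A and B
p3 <- xyplot(mpg ~ wt | factor(cyl) * factor(am), data = mtcars)

# High-level lattice functions return objects rather than drawing directly;
# call print(p3), or type p3 at the console, to display a plot.
```

Storing the result and printing it later is a deliberate choice here: inside scripts, loops, and functions nothing is drawn until the object is printed.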

13:18

A trellis object, as returned by high-level lattice functions like xyplot, is a list with the "class" attribute set to "trellis". Many of the components of this list are simply the arguments to the high-level function that produced the object. Among them are: as.table, layout, page, panel, prepanel, main, sub, par.strip.text, strip, skip, xlab, ylab, par.settings, lattice.options and plot.args.
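
A minimal sketch of inspecting such an object, assuming the mtcars data set and illustrative object names (p, p2) that do not come from the course:

```r
library(lattice)

p <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
            layout = c(3, 1), xlab = "Weight", ylab = "Miles per gallon")

class(p)      # "trellis"
is.list(p)    # TRUE: underneath, the object is an ordinary list
names(p)      # component names, many of them mirroring the call's arguments

# update() returns a modified copy; the original object is left untouched.
p2 <- update(p, main = "Fuel economy by cylinder count")
```

Because the plot is an ordinary R object, it can be stored, modified with update(), and printed later.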

Dimension and Physical Layout
14:00
Scales and Axes
08:43
17:04
In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables).
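
One plausible way to look at a univariate distribution with lattice is a histogram or a kernel density plot. The sketch below assumes the faithful data set bundled with R; the variable and object names are illustrative only.

```r
library(lattice)

# Histogram of a single numeric variable, with roughly 20 bins
p_hist <- histogram(~ eruptions, data = faithful, nint = 20)

# Kernel density estimate, with the raw observations shown as a rug
p_dens <- densityplot(~ eruptions, data = faithful, plot.points = "rug")

# print(p_hist); print(p_dens)   # draw the plots
```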
Visualizing Univariate Distributions (part 2)
14:56
14:14
A q-q plot is a plot of the quantiles of one data set against the quantiles of a second data set. By a quantile, we mean the value below which a given fraction (or percent) of the points fall. That is, the 0.3 (or 30%) quantile is the point at which 30% of the data fall below and 70% fall above that value.
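
To make the definition concrete, the sketch below computes a 30% quantile with base R and then builds a two-sample q-q plot with lattice's qq(). Using mtcars and splitting on the am column are assumptions made for the example.

```r
library(lattice)

# The 0.3 quantile: roughly 30% of the mpg values lie below this number
q30 <- quantile(mtcars$mpg, probs = 0.3)

# Two-sample q-q plot: quantiles of mpg for one level of am plotted against
# quantiles of mpg for the other. The left-hand side of the formula must be
# a factor (or shingle) with exactly two levels.
p_qq <- qq(factor(am) ~ mpg, data = mtcars)

# print(p_qq)   # draw the plot
```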
08:03

Box-and-whisker plots summarize the data using a few quantiles, and possibly some outliers. This summarizing can be important when the number of observations is large. When the number of observations per sample is small, it is often sufficient to simply plot the sample values side by side on a common scale. Such plots are known as strip plots, also referred to as univariate scatter plots. They are in fact very similar to bivariate scatter plots.
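
A short sketch contrasting the two displays on the same data, assuming mtcars with cylinder count as the grouping factor (an illustrative choice):

```r
library(lattice)

# Box-and-whisker plot: a quantile summary of mpg for each group
p_box <- bwplot(factor(cyl) ~ mpg, data = mtcars)

# Strip plot: every observation is drawn; jittering separates tied values
p_strip <- stripplot(factor(cyl) ~ mpg, data = mtcars, jitter.data = TRUE)

# print(p_box); print(p_strip)   # draw the plots
```

With only a handful of cars per group, the strip plot arguably shows as much as the box plot while hiding nothing.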

Section 2: Multiway Tables and Scatter Plots
12:16

An important subset of statistical data comes in the form of tables. Tables usually record the frequency or proportion of observations that fall into a particular category or combination of categories. They could also encode some other summary measure such as a rate (of binary events) or mean (of a continuous variable). In R, tables are usually represented by arrays of one (vectors), two (matrices), or more dimensions. To distinguish them from other vectors and arrays, they often have class “table”. The R functions table() and xtabs() can be used to create tables from raw data.
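
As an illustration, the sketch below builds a two-way frequency table with xtabs() and passes it to barchart(), which has a method for tables. The choice of mtcars and of the cyl and gear columns is an assumption for the example.

```r
library(lattice)

# Two-way table: number of cars for each combination of cyl and gear
tab <- xtabs(~ cyl + gear, data = mtcars)

class(tab)    # includes "table"
dim(tab)      # one dimension per classifying variable

# Bar chart of the table, with side-by-side rather than stacked bars
p_bar <- barchart(tab, stack = FALSE, auto.key = TRUE)

# print(p_bar)   # draw the plot
```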

Multipanel Dot Plots
11:09
13:37

A scatter plot graphs two variables directly against each other in a Cartesian coordinate system. It is a simple graphic in the sense that the data are directly encoded without being summarized in any way; often the aspects that the user needs to worry about most are graphical ones such as whether to join the points by a line, what colors to use, and so on. Depending on the purpose, scatter plots can also be enhanced in several ways. In this chapter, we go over some of the variants supported by panel.xyplot(), which is the default panel function for both xyplot() and splom() (under the alias panel.splom()).
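
Several of the variants handled by panel.xyplot() are selected through the type argument. The sketch below shows a few of them on mtcars; the data set and object names are assumptions for illustration.

```r
library(lattice)

# "g" adds a reference grid, "p" draws points, "r" adds a regression line
p_lm <- xyplot(mpg ~ wt, data = mtcars, type = c("g", "p", "r"))

# "smooth" adds a loess smooth instead of a straight line
p_smooth <- xyplot(mpg ~ wt, data = mtcars, type = c("p", "smooth"))

# groups superposes subsets in one panel, here with one line per group
p_grp <- xyplot(mpg ~ wt, data = mtcars, groups = cyl,
                type = c("p", "r"), auto.key = list(columns = 3))

# print(p_grp)   # draw the plot
```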

Shingles and Advanced Indexing
06:34
More Scatter Plots (part 1)
16:08
More Scatter Plots (part 2)
10:22
13:39

Scatter-plot matrices, produced by splom(), are exactly what the name suggests; they are a matrix of pairwise scatter plots given two or more variables. Conditioning is possible, but it is more common to call splom() with a data frame as its first argument.
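
Both calling styles can be sketched as follows, assuming the mtcars and iris data sets that ship with R (the column selections are illustrative):

```r
library(lattice)

# Data frame as the first argument: all pairwise scatter plots of its columns
p_splom <- splom(mtcars[c("mpg", "wt", "hp")])

# Formula interface, with a grouping variable used for colour
p_splom_grp <- splom(~ iris[1:4], groups = Species, data = iris,
                     auto.key = list(columns = 3))

# print(p_splom_grp)   # draw the plot
```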

07:29

Like scatter-plot matrices, parallel coordinates plots are hypervariate in nature, that is, they show relationships between an arbitrary number of variables. Their design is related to univariate scatter plots; in fact, they are basically univariate scatter plots of all variables of interest stacked parallel to each other (vertically in the implementation in lattice), with values that correspond to the same observation linked by line segments.
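
A minimal parallel coordinates sketch, assuming the iris data set and conditioning on Species so that each species gets its own panel:

```r
library(lattice)

# One axis per measurement; each flower is a line linking its four values
p_par <- parallelplot(~ iris[1:4] | Species, data = iris)

# print(p_par)   # draw the plot
```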

Section 3: Trivariate, 3D, and Other Complex Displays
09:01

Trivariate displays encode three primary variables in a panel. There are four high-level functions in lattice that produce trivariate displays: cloud() creates three-dimensional scatter plots of unstructured trivariate data, whereas levelplot(), contourplot(), and wireframe() render surfaces or two dimensional tables evaluated on a systematic rectangular grid. Of these, cloud() and wireframe() are similar in that they both create two-dimensional projections of three-dimensional constructs, and they share several common arguments that control the details of the projection.
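
The grid-based functions can be sketched on a small synthetic surface. The grid, the function used for z, and the name surf are all assumptions made up for this example.

```r
library(lattice)

# A regular rectangular grid of (x, y) points with a value z at each point
surf <- expand.grid(x = seq(-2, 2, length.out = 50),
                    y = seq(-2, 2, length.out = 50))
surf$z <- exp(-(surf$x^2 + surf$y^2))

# False-colour image of z over the grid
p_level <- levelplot(z ~ x * y, data = surf)

# Contour lines of the same surface
p_contour <- contourplot(z ~ x * y, data = surf, cuts = 8)

# print(p_level); print(p_contour)   # draw the plots
```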

09:56

We begin with cloud(), which produces three-dimensional scatter plots. Most of the discussion in this section about projection and how to control it in cloud() applies to wireframe() as well.
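
A minimal cloud() sketch, assuming the iris data set. The screen argument sets the rotation used for the projection; the angles below are arbitrary illustrative values.

```r
library(lattice)

# Three-dimensional scatter plot, coloured by species
p_cloud <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
                 groups = Species, auto.key = list(columns = 3),
                 screen = list(z = 20, x = -70))

# A different viewpoint on the same data, obtained by changing only screen
p_cloud2 <- update(p_cloud, screen = list(z = 60, x = -60))

# print(p_cloud); print(p_cloud2)   # draw the plots
```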

3D Scatter Plots (part 2)
08:38
3D Panel Functions
17:08
Visualizing 3D Surfaces
13:52
More 3D Visualizations
16:50
13:39

The methods we used to plot regression surfaces using wireframe() can be easily adapted to mathematical surfaces.
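
For instance, a mathematical surface can be evaluated on a grid and handed to wireframe() in the same way as fitted values from a regression. The surface below is an arbitrary example, not one taken from the course.

```r
library(lattice)

# Evaluate z = sin(sqrt(x^2 + y^2)) on a regular grid
g <- expand.grid(x = seq(-2 * pi, 2 * pi, length.out = 40),
                 y = seq(-2 * pi, 2 * pi, length.out = 40))
g$z <- sin(sqrt(g$x^2 + g$y^2))

# Shaded surface, viewed from the rotation given in screen
p_wire <- wireframe(z ~ x * y, data = g, shade = TRUE,
                    screen = list(z = 30, x = -60))

# print(p_wire)   # draw the plot
```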

Section 4: Finer Control Graphical Parameters and Other Settings
15:05

Graphical parameters are often critical in determining the effectiveness of a plot. Such parameters include obvious ones such as colors, symbols, line types, and fonts for the various elements of a graph, as well as more subtle ones such as the length of tick marks or the amount of space separating different components of the graph. The parameters used in lattice displays are highly customizable. Many of them can be controlled directly by specifying suitable arguments in a high-level function call. Most derive their default values from a system of common global settings that can also be modified by the user. The latter approach has two primary benefits: it allows good global defaults to be specified, and it provides a consistent “look and feel” to lattice graphics while letting the user retain ultimate control.
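
The two routes described above can be sketched as follows. The parameter values and the name my_theme are illustrative assumptions.

```r
library(lattice)

# Route 1: pass graphical parameters directly in the high-level call
p_direct <- xyplot(mpg ~ wt, data = mtcars, pch = 16, col = "darkred")

# Route 2: override named settings; par.settings attaches them to this
# plot only and leaves the global settings alone
my_theme <- list(plot.symbol = list(pch = 16, col = "steelblue"),
                 axis.line = list(col = "grey40"))
p_theme <- xyplot(mpg ~ wt, data = mtcars, par.settings = my_theme)

# The same list could instead be applied to the active graphics device
# with trellis.par.set(my_theme), and current values inspected with
# trellis.par.get("plot.symbol").
```

Keeping the settings in a list makes it easy to reuse one look across many plots.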

Graphical Parameters Continued
14:15
Plot Coordinates and Axis Annotation
13:19
Labels and Legends
14:49
Data Manipulation (part 1)
13:56
Data Manipulation (part 2)
15:27
Shingles and Related Utilities
14:57
Ordering Categorical Variables
14:59

Instructor Biography

Geoffrey Hubona, Ph.D., Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA. He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010). He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.
