Applied Multivariate Analysis with R

2,226 students enrolled

Learn to use R software to conduct PCAs, MDSs, cluster analyses, EFAs and to estimate SEM models.

Current price: $12
Original price: $50
Discount:
76% off

30-Day Money-Back Guarantee

- 12 hours on-demand video
- 2 Supplemental Resources
- Full lifetime access
- Access on mobile and TV

- Certificate of Completion

What Will I Learn?

- Conceptualize and apply multivariate skills and "hands-on" techniques using R software in analyzing real data.
- Create novel and stunning 2D and 3D multivariate data visualizations with R.
- Set up and estimate a Principal Components Analysis (PCA).
- Formulate and estimate a Multidimensional Scaling (MDS) problem.
- Group similar (or dissimilar) data with Cluster Analysis techniques.
- Estimate and interpret an Exploratory Factor Analysis (EFA).
- Specify and estimate a Structural Equation Model (SEM) using RAM notation in R.
- Be knowledgeable about SEM simulation capabilities from the R SIMSEM package.

Requirements

- No specific knowledge or skills are required, although some interest and aptitude in quantitative or statistical analysis is helpful.
- Students will need to install the popular no-cost R Console and RStudio software (instructions provided).

Description

Applied Multivariate Analysis (MVA) with R is a practical, "hands-on" course that teaches students how to perform specific MVA tasks using real data sets and R software. It is an excellent background course for anyone whose educational or professional work involves data mining or predictive analytics; statistical or quantitative modeling (including linear, GLM and non-linear modeling); covariance-based Structural Equation Modeling (SEM) specification and estimation; or variance-based PLS Path Model specification and estimation.

Students learn about the nature of multivariate data and multivariate analysis. Specifically, they learn how to create and estimate covariance and correlation matrices, Principal Components Analyses (PCA), Multidimensional Scaling (MDS), Cluster Analyses, Exploratory Factor Analyses (EFA), and SEM models. The course also teaches how to create dozens of different 2D and 3D multivariate data visualizations using R software. All software, R scripts, datasets and slides used in the lectures are provided in the course materials.

The course is structured as a series of seven sections, each addressing a specific MVA topic. Each section culminates in one or more "hands-on" exercises, which students complete before proceeding in order to reinforce the MVA concepts and skills presented. The course is an excellent vehicle for acquiring "real-world" predictive analytics skills that are in high demand in the workplace. It is also a fertile source of relevant skills and knowledge for graduate students and faculty who are required to analyze and interpret research data.

Who is the target audience?

- Anyone interested in using multivariate analysis techniques as a basis for data mining, statistical modeling, and structural equation modeling (SEM) estimation.
- Practicing quantitative analysis professionals including college and university faculty seeking to learn new multivariate data analysis skills.
- Undergraduate students looking for jobs in predictive or business analytics fields.
- Graduate students wishing to learn more applied data analysis techniques and approaches.

Curriculum For This Course

75 Lectures

12:13:21
Introduction to Multivariate Data and Analysis
12 Lectures
02:01:21

This video presents an overview of the Applied Multivariate Analysis (MVA) course.

11:40

The materials used in the video lectures for Section 1 Introduction to Multivariate Data and Analysis are briefly explained and then provided as a .zip file download after the short video is presented.

Materials for Section 1 Introduction to MV Data and Analysis

02:25

**Multivariate analysis** (**MVA**) is based on the statistical principle of multivariate statistics, which involves the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Some of the applications include:

- To reduce a large number of variables to a smaller number of factors for data modeling.
- To validate a scale or index by demonstrating that its constituent items load on the same factor, and to drop proposed scale items which cross-load on more than one factor.
- To select a subset of variables from a larger set, based on which original variables have the highest correlations with some other factors.
- To create a set of factors to be treated as uncorrelated variables as one approach to handling multi-collinearity in such procedures as multiple regression.

In this "hands-on" course on applied multivariate analysis, we focus on how to actually use and conduct MVA analyses, using dozens of real data sets and R software. We examine the techniques and examples of principal components analysis, multidimensional scaling, cluster analysis, exploratory factor analysis, and an introduction to structural equation modeling.

14:15

Missing data is a huge problem in analyzing data sets because many statistical and mathematical functions fail when any individual data observations have even one missing data element. We explain and demonstrate why this is a problem using a 'body measures' dataset that we construct in R, and we show some "quick fixes" to getting around this problem of missing data in multivariate analysis.

Missing Values and the Measure Dataset

08:20
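As a rough illustration of the missing-data problem described above, the following sketch builds a tiny body-measures data frame and compares the default behavior of `cov()` with two common "quick fixes". The values and the name `measure` are made up for this example and are not the course dataset.

```r
# Small illustrative "body measures" data frame with two missing values.
# The numbers are invented for this sketch; they are not the course data.
measure <- data.frame(
  chest = c(34, 37, 38, NA, 40),
  waist = c(30, 32, 30, 33, NA),
  hips  = c(32, 37, 36, 39, 38)
)

# Default behavior: any pair of variables touched by an NA yields NA.
cov_default <- cov(measure)

# Quick fix 1: listwise deletion, use only rows with no missing values.
cov_complete <- cov(measure, use = "complete.obs")

# Quick fix 2: pairwise deletion, use every row available for each pair.
cov_pairwise <- cov(measure, use = "pairwise.complete.obs")

# The same result as quick fix 1, with the rows dropped explicitly.
cov_omit <- cov(na.omit(measure))

print(cov_default)
print(cov_complete)
```

Listwise deletion keeps the matrix internally consistent but discards data; pairwise deletion keeps more data but can produce a matrix that is not positive semidefinite.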

We create several multivariate data sets using R software. We use these data sets and others in the rest of the course.

Other Multivariate Datasets

10:11

In probability theory and statistics, a **covariance matrix** (also known as **dispersion matrix** or **variance–covariance matrix**) is a matrix whose element in the *i*, *j* position is the covariance between the *i*th and *j*th elements of a random vector (that is, of a vector of random variables). Each element of the vector is a scalar random variable, either with a finite number of observed empirical values or with a finite or infinite number of potential values specified by a theoretical joint probability distribution of all the random variables.

The **correlation matrix** of *n* random variables *X*_{1}, ..., *X*_{n} is the *n* × *n* matrix whose *i*,*j* entry is corr(*X*_{i}, *X*_{j}). If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables *X*_{i} / σ (*X*_{i}) for *i* = 1, ..., *n*. This applies to both the matrix of population correlations (in which case "σ" is the population standard deviation), and to the matrix of sample correlations (in which case "σ" denotes the sample standard deviation). Consequently, each is necessarily a positive-semidefinite matrix.

The correlation matrix is symmetric because the correlation between *X*_{i} and *X*_{j} is the same as the correlation between *X*_{j} and *X*_{i}.

11:36
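The relationship between the two matrices described above can be checked numerically. This is a minimal sketch on simulated data (the variable names and the seed are arbitrary choices for illustration): the correlation matrix equals the covariance matrix of the standardized variables, and it is symmetric and positive semidefinite.

```r
set.seed(1)

# Simulated data: 200 observations on three variables, two of them correlated.
x <- matrix(rnorm(200 * 3), ncol = 3,
            dimnames = list(NULL, c("x1", "x2", "x3")))
x[, 2] <- x[, 1] + 0.5 * x[, 2]

S <- cov(x)   # sample covariance matrix
R <- cor(x)   # sample (product-moment) correlation matrix

# Covariance of the standardized variables; scale() centers each column
# and divides it by its sample standard deviation.
R_from_std <- cov(scale(x))

# Direct conversion of a covariance matrix to a correlation matrix.
R_from_S <- cov2cor(S)

print(round(S, 3))
print(round(R, 3))

# All eigenvalues of a correlation matrix are non-negative.
eigenvalues <- eigen(R, symmetric = TRUE)$values
print(eigenvalues)
```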

We continue our discussion of creating, estimating and using both covariance and correlation matrices in multivariate analysis using R software. We also introduce the concept of "distance" for finding similarities / differences among sets of variables.

Covariance, Correlation and Distance (part 2)

10:21
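To make the idea of "distance" concrete, here is a small sketch using the base R `dist()` function on three made-up observations. The data are purely illustrative; the point is that the choice of metric, and whether the variables are standardized first, changes which observations look similar.

```r
# Three made-up observations measured on two variables.
x <- matrix(c(1, 2,
              4, 6,
              1, 3),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("v1", "v2")))

# Euclidean distance is the default metric.
d_euclid <- dist(x)

# Manhattan (city-block) distance sums absolute differences instead.
d_manhattan <- dist(x, method = "manhattan")

# Standardizing first stops variables on large scales from dominating.
d_scaled <- dist(scale(x))

print(as.matrix(d_euclid))
print(as.matrix(d_manhattan))
```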

We continue our discussion of creating, estimating and using both covariance and correlation matrices in multivariate analysis using R software. We also introduce the concept of "distance" for finding similarities / differences among sets of variables.

Covariance, Correlation and Distance (part 3)

10:12

We describe, create (with simulation), demonstrate and visualize a multivariate normal (MVN) density function using R. In probability theory and statistics, the **multivariate normal distribution** or **multivariate Gaussian distribution**, is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One possible definition is that a random vector is said to be *k*-variate normally distributed if every linear combination of its *k* components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

The Multivariate Normal Density Function

11:28
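One way to simulate from a multivariate normal distribution using only base R is the Cholesky approach sketched below. The mean vector, covariance matrix and the helper name `dmvn` are assumptions made for this illustration, not material taken from the lecture.

```r
set.seed(42)

# Assumed parameters for a bivariate normal distribution.
mu    <- c(0, 10)
Sigma <- matrix(c(4, 3,
                  3, 9), nrow = 2, byrow = TRUE)
n <- 5000

# chol() returns an upper-triangular U with t(U) %*% U equal to Sigma.
U <- chol(Sigma)

# Independent standard normals, transformed to have covariance Sigma,
# then shifted column by column to have mean mu.
z <- matrix(rnorm(n * 2), ncol = 2)
x <- sweep(z %*% U, 2, mu, "+")

print(colMeans(x))  # should be close to mu
print(cov(x))       # should be close to Sigma

# Multivariate normal density (hypothetical helper for this sketch).
dmvn <- function(x, mu, Sigma) {
  k  <- length(mu)
  d2 <- mahalanobis(x, center = mu, cov = Sigma)  # squared distance
  exp(-0.5 * d2) / sqrt((2 * pi)^k * det(Sigma))
}

# The density is highest at the mean.
peak <- dmvn(mu, mu, Sigma)
print(peak)
```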

We demonstrate several R software graphical approaches to test for univariate and multivariate normality.

Setting Up Normality Plots

10:03

We continue our illustrative cases and examples of creating normality plots in R software.

Drawing Normality Plots

13:41
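A common pair of graphical normality checks can be sketched as follows, assuming simulated data in place of the course datasets: a normal Q-Q plot for each variable on its own, and a chi-square plot of squared Mahalanobis distances for the variables jointly. The plotting calls run only in an interactive session.

```r
set.seed(7)

# Simulated data standing in for a real dataset: 100 rows, 3 variables.
x <- matrix(rnorm(100 * 3), ncol = 3)
n <- nrow(x)
p <- ncol(x)

# Univariate check: coordinates of a normal Q-Q plot for variable 1.
qq <- qqnorm(x[, 1], plot.it = FALSE)

# Multivariate check: under multivariate normality the squared Mahalanobis
# distances are approximately chi-square distributed with p degrees of freedom.
d2    <- mahalanobis(x, center = colMeans(x), cov = cov(x))
chi_q <- qchisq(ppoints(n), df = p)

if (interactive()) {
  plot(qq, xlab = "Theoretical quantiles", ylab = "Sample quantiles",
       main = "Normal Q-Q plot, variable 1")
  qqline(x[, 1])

  plot(chi_q, sort(d2), xlab = "Chi-square quantiles",
       ylab = "Ordered squared distances", main = "Chi-square plot")
  abline(0, 1)
}
```

Points that fall close to the reference line in both plots are consistent with normality; points far above the line in the chi-square plot are candidate multivariate outliers.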

This video lecture explains the three covariance, correlation and normality exercises for the first section of the applied MVA course.

Covariance, Correlation and Normality Exercises

07:09

Visualizing Multivariate Data
13 Lectures
02:26:57

The materials used in the video lectures for Section 2 Visualizing Multivariate Data are briefly explained and then provided as a .zip file download after the short video is presented.

Materials and Exercises for Visualizing Multivariate Data Section

02:46

Covariance and Correlation Matrices with Missing Data (part 2)

10:45

Univariate and Multivariate QQPlots of Pottery Data

09:38

Converting Covariance to Correlation Matrices

15:20

Plots for Marginal Distributions

15:35

Outlier Identification

16:29

Chi, Bubble, and other Glyph Plots

14:12

Scatterplot Matrix

07:56

Kernel Density Estimators

13:41

3-Dimensional and Trellis (Lattice Package) Graphics

15:26

More Trellis (Lattice Package) Graphics

14:04

Bivariate Boxplot and ChiPlot Visualizations Exercises

02:16

Principal Components Analysis (PCA)
12 Lectures
01:40:14

The materials used in the video lectures for Section 3 Principal Components Analysis (PCA) are briefly explained and then provided as a .zip file download after the short video is presented.

Materials for Principal Components Analysis (PCA) Section

00:44

Bivariate Boxplot Visualization Exercise Solution

14:48

ChiPlot Visualization Exercise Solution

03:40

PCA Basics with R: Blood Data (part 1)

09:18

PCA Basics with R: Blood Data (part 2)

10:51

PCA with Head Size Data (part 1)

08:01

PCA with Head Size Data (part 2)

09:31

PCA with Heptathlon Data (part 1)

07:40

PCA with Heptathlon Data (part 2)

10:03

PCA with Heptathlon Data (part 3)

13:04

PCA Criminal Convictions Exercise

01:21
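For readers who want a feel for what a PCA in R involves before starting this section, here is a minimal sketch using `prcomp()`. It uses the built-in `USArrests` data as a stand-in, which is an assumption of this example; the lectures use the blood, head size and heptathlon datasets from the course materials.

```r
# PCA on the correlation matrix: scale. = TRUE standardizes each variable.
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance explained by each component.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
print(round(prop_var, 3))

# Loadings (one column per component) and the first few component scores.
print(round(pca$rotation, 3))
print(head(pca$x, 3))

# The component variances equal the eigenvalues of the correlation matrix.
eig <- eigen(cor(USArrests), symmetric = TRUE)
print(eig$values)
```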

Multidimensional Scaling (MDS)
9 Lectures
01:36:39

The materials used in the video lectures for Section 4 Multidimensional Scaling (MDS) are briefly explained and then provided as a .zip file download after the short video is presented.

Materials for Multidimensional Scaling Section

00:56

PCA Criminal Convictions Exercise Solution

14:29

Classical Multidimensional Scaling (part 1)

14:50

Classical Multidimensional Scaling (part 2)

08:47

Classical Multidimensional Scaling: Skulls Data

17:46

Non-Metric Multidimensional Scaling Example: Voting Behavior

14:24

Non-Metric Multidimensional Scaling Example: WW II Leaders

09:08

Multidimensional Scaling Exercise: Water Voles

02:48
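Classical MDS takes a matrix of pairwise distances and recovers coordinates that reproduce those distances as closely as possible in a few dimensions. The sketch below uses base R's `cmdscale()` with the built-in `eurodist` road distances, chosen here only for illustration; it is not one of the course datasets.

```r
# Classical (metric) MDS of road distances between 21 European cities.
mds <- cmdscale(eurodist, k = 2, eig = TRUE)

# Two-dimensional coordinates, one row per city.
coords <- mds$points
print(head(coords, 5))

# Goodness-of-fit measures for the two-dimensional solution.
print(mds$GOF)

if (interactive()) {
  plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(coords, labels = rownames(coords), cex = 0.7)
}
```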

Cluster Analysis
14 Lectures
02:23:40

The materials used in the video lectures for Section 5 Cluster Analysis are briefly explained and then provided as a .zip file download after the short video is presented.

Materials for Cluster Analysis Section

01:13

MDS Water Voles Exercise Solution

13:55

Hierarchical Clustering Distance Techniques

10:35

Hierarchical Clustering of Measures Data

12:40

Hierarchical Clustering of Fighter Jets

10:15

K-Means Clustering of Crime Data (part 1)

13:06

K-Means Clustering of Crime Data (part 2)

06:36

Clustering of Romano-British Pottery Data

14:50

K-Means Classifying of Exoplanets

13:20

Model-Based Clustering of Exoplanets

12:34

Finite Mixture Model-Based Analysis

13:04

Cluster Analysis Neighborhood and Stripes Plots

10:07

K-Means Cluster Analysis Crime Data Exercise

00:35
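The two main families of methods in this section, hierarchical and k-means clustering, can be sketched in a few lines of base R. The example assumes the built-in `USArrests` data and three clusters; both choices are for illustration only and may differ from the crime data and settings used in the lectures.

```r
# Standardize so that no single variable dominates the distances.
crime <- scale(USArrests)

# Hierarchical clustering with complete linkage, cut into three groups.
hc        <- hclust(dist(crime), method = "complete")
hc_groups <- cutree(hc, k = 3)

# K-means with three centers; nstart reruns from several random starts
# and keeps the best solution.
set.seed(123)
km <- kmeans(crime, centers = 3, nstart = 25)

# Cross-tabulate the two groupings to see how far they agree.
print(table(hierarchical = hc_groups, kmeans = km$cluster))

if (interactive()) {
  plot(hc, cex = 0.6, main = "Complete-linkage dendrogram")
}
```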

Exploratory Factor Analysis (EFA)
8 Lectures
01:14:19

The materials used in the video lectures for Section 6 Exploratory Factor Analysis are briefly explained and then provided as a .zip file download after the short video is presented.

Materials for Exploratory Factor Analysis (EFA) Section

00:40

The solution to the K-Means exercise using the crime data is explained.

K-Means Crime Data Exercise Solution

09:20

In multivariate statistics, **exploratory factor analysis** (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a *scale* is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. *Measured variables* are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.

14:38

The factanal() function in R performs maximum-likelihood factor analysis on a covariance matrix or data matrix.

The factanal() Function Explained

07:34
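A minimal call to `factanal()` on a covariance matrix looks like the sketch below. It uses the built-in `ability.cov` dataset as a stand-in, which is an assumption of this example; the lectures use the life and drug use data from the course materials.

```r
# ability.cov is a covariance list (cov, center, n.obs) that ships with R.
# Fit a two-factor maximum-likelihood model; varimax rotation is the default.
fa <- factanal(factors = 2, covmat = ability.cov)

# Loadings, hiding small values to make the pattern easier to read.
print(fa$loadings, cutoff = 0.3)

# Uniquenesses: the share of each variable's variance not explained
# by the common factors.
print(round(fa$uniquenesses, 3))

# Degrees of freedom of the likelihood-ratio test of model fit.
print(fa$dof)
```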

This lecture is an example of estimating an EFA using R software with the life data provided in the materials.

EFA Life Data Example

14:47

This lecture is an example of estimating an EFA using R software with the drug use data provided in the materials.

EFA Drug Use Data Example

16:20

Both **exploratory factor analysis** (EFA) and **confirmatory factor analysis** (CFA) are employed to understand shared variance of measured variables that is believed to be attributable to a factor or latent construct. Despite this similarity, however, EFA and CFA are conceptually and statistically distinct analyses.

The goal of EFA is to identify factors based on data and to maximize the amount of variance explained. The researcher is not required to have any specific hypotheses about how many factors will emerge, and what items or variables these factors will comprise. If these hypotheses exist, they are not incorporated into and do not affect the results of the statistical analyses. By contrast, CFA evaluates *a priori* hypotheses and is largely driven by theory. CFA analyses require the researcher to hypothesize, in advance, the number of factors, whether or not these factors are correlated, and which items/measures load onto and reflect which factors. As such, in contrast to exploratory factor analysis, where all loadings are free to vary, CFA allows for the explicit constraint of certain loadings to be zero.

EFA is sometimes reported in research when CFA would be a better statistical approach. It has been argued that CFA can be restrictive and inappropriate when used in an exploratory fashion. However, the idea that CFA is solely a “confirmatory” analysis may sometimes be misleading, as modification indices used in CFA are somewhat exploratory in nature. Modification indices show the improvement in model fit if a particular coefficient were to become unconstrained. Likewise, EFA and CFA do not have to be mutually exclusive analyses; EFA has been argued to be a reasonable follow-up to a poor-fitting CFA model.

Comparing EFA with Confirmatory Factor Analysis (CFA)

08:17

The correlation matrix given below represents the grading scores of 220 boys in six school subjects:

(1) French; (2) English; (3) History; (4) Arithmetic; (5) Algebra and (6) Geometry.

Find the two-factor solution from a maximum likelihood factor analysis. Interpret the factor loadings. Then plot these derived loadings and interpret again. Was it easier to interpret the factors by looking at the visualization? Finally, find a non-orthogonal rotation that allows easier interpretation of the results by looking at the factor loadings directly, without the "visual utility" that is afforded by plotting the two-factor solution first.

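| | French | English | History | Arithmetic | Algebra | Geometry |
|---|---|---|---|---|---|---|
| French | 1.00 | | | | | |
| English | 0.44 | 1.00 | | | | |
| History | 0.41 | 0.35 | 1.00 | | | |
| Arithmetic | 0.29 | 0.35 | 0.16 | 1.00 | | |
| Algebra | 0.33 | 0.32 | 0.19 | 0.59 | 1.00 | |
| Geometry | 0.25 | 0.33 | 0.18 | 0.47 | 0.46 | 1.00 |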

EFA Exercise

02:43
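One possible way to set up this exercise in R is sketched below: the lower triangle is entered column by column, mirrored to give a full symmetric matrix, and passed to `factanal()` together with the sample size. The object names are arbitrary, and `promax` is used only as one example of a non-orthogonal rotation; this is not the instructor's solution script.

```r
subjects <- c("French", "English", "History",
              "Arithmetic", "Algebra", "Geometry")

# Start from an identity matrix and fill the lower triangle column by column.
R <- diag(6)
dimnames(R) <- list(subjects, subjects)
R[lower.tri(R)] <- c(0.44, 0.41, 0.29, 0.33, 0.25,  # French with the rest
                     0.35, 0.35, 0.32, 0.33,        # English with the rest
                     0.16, 0.19, 0.18,              # History with the rest
                     0.59, 0.47,                    # Arithmetic with the rest
                     0.46)                          # Algebra with Geometry

# Mirror the lower triangle into the upper triangle.
R <- R + t(R) - diag(6)

# Two-factor maximum-likelihood solution (varimax rotation by default).
fa_varimax <- factanal(covmat = R, factors = 2, n.obs = 220)
print(fa_varimax$loadings)

# The same model with an oblique (non-orthogonal) rotation.
fa_promax <- factanal(covmat = R, factors = 2, n.obs = 220,
                      rotation = "promax")
print(fa_promax$loadings)

if (interactive()) {
  plot(fa_varimax$loadings[, 1:2], type = "n",
       xlab = "Factor 1", ylab = "Factor 2")
  text(fa_varimax$loadings[, 1:2], labels = subjects)
}
```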

Introduction to Structural Equation Modeling (SEM), QGraph, and SIMSEM
7 Lectures
50:11

Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). qgraph is a package that can be used to plot several types of graphs. It is mainly aimed at visualizing relationships in (psychometric) data as networks to create a clear picture of what the data actually looks like. SIMSEM is an R package developed for facilitating simulation and analysis of data within the structural equation modeling (SEM) framework.

02:53

Solutions to the EFA exercise are provided in R scripts.

Exploratory Factor Analysis (EFA) Exercise Solution

09:15

Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). Specification is formulating a statement about a set of parameters and stating a model. A critical principle in model specification and evaluation is the fact that all of the models that we would be interested in specifying and evaluating are wrong to some degree. We must define as an optimal outcome a finding that a particular model fits our observed data closely and yields a highly interpretable solution. Instead of considering all possible models, a finding that a particular model fits observed data well and yields an interpretable solution can be taken to mean only that the model provides one plausible representation of the structure that produced the observed data.

Specify and Estimate Drug Use SEM Model

11:56

Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). Specification is formulating a statement about a set of parameters and stating a model. A critical principle in model specification and evaluation is the fact that all of the models that we would be interested in specifying and evaluating are wrong to some degree. We must define as an optimal outcome a finding that a particular model fits our observed data closely and yields a highly interpretable solution. Instead of considering all possible models, a finding that a particular model fits observed data well and yields an interpretable solution can be taken to mean only that the model provides one plausible representation of the structure that produced the observed data.

Specify and Estimate Alienation SEM Model

05:15

qgraph is a package that can be used to plot several types of graphs. It is mainly aimed at visualizing relationships in (psychometric) data as networks to create a clear picture of what the data actually looks like.

Its most important use is to visualize correlation matrices as a network in which each node represents a variable and each edge a correlation. The color of an edge indicates the sign of the correlation (green for positive correlations and red for negative correlations) and its width indicates the strength of the correlation. Other statistics can also be used in the graph as long as negative and positive values are comparable in strength and zero indicates no relationship.

qgraph also comes with various functions to visualize other statistics and even perform analyses, such as EFA, PCA, CFA and SEM. The stable release of qgraph is available at CRAN and the developmental version is available at GitHub. An article introducing the package in detail is available in the Journal of Statistical Software.

Since qgraph 1.3 the package also contains network model selection and estimation procedures.

QGraph Visualizations

06:43

The SIMSEM R package has been developed for facilitating simulation and analysis of data within the structural equation modeling (SEM) framework. This package aims to help analysts create simulated data from hypotheses or analytic results from obtained data. The simulated data can be used for different purposes, such as power analysis, model fit evaluation, and planned missing design. Students will have an appreciation of how to use SIMSEM for these purposes.

SIMSEM Package Simulation Capabilities (part 1)

09:57

The SIMSEM R package has been developed for facilitating simulation and analysis of data within the structural equation modeling (SEM) framework. This package aims to help analysts create simulated data from hypotheses or analytic results from obtained data. The simulated data can be used for different purposes, such as power analysis, model fit evaluation, and planned missing design. Students will have an appreciation of how to use SIMSEM for these purposes.

SIMSEM Package Simulation Capabilities (part 2)

04:12

About the Instructor