Find online courses made by experts from around the world.
Take your courses with you and learn anywhere, anytime.
Learn and practice real-world skills and achieve your goals.
Applied Multivariate Analysis (MVA) with R is a practical, conceptual, and applied "hands-on" course that teaches students how to perform specific MVA tasks using real data sets and R software. It is an excellent practical background course for anyone engaged with educational or professional tasks and responsibilities in the fields of data mining or predictive analytics, or statistical or quantitative modeling (including linear, GLM, and/or nonlinear modeling; covariance-based Structural Equation Modeling (SEM) specification and estimation; and/or variance-based PLS path model specification and estimation).

Students learn all about the nature of multivariate data and multivariate analysis. They specifically learn how to create and estimate: covariance and correlation matrices; Principal Components Analyses (PCA); Multidimensional Scaling (MDS); Cluster Analysis; Exploratory Factor Analyses (EFA); and SEM model estimation. The course also teaches how to create dozens of different dazzling 2D and 3D multivariate data visualizations using R software. All software, R scripts, data sets, and slides used in the lectures are provided in the course materials.

The course is structured as a series of seven sections, each addressing a specific MVA topic and each culminating with one or more "hands-on" exercises for students to complete before proceeding, to reinforce the presented MVA concepts and skills. The course is an excellent vehicle for acquiring "real-world" predictive analytics skills that are in high demand in today's workplace. It is also a fertile source of relevant skills and knowledge for graduate students and faculty who are required to analyze and interpret research data.
Not for you? No problem.
30-day money-back guarantee.
Forever yours.
Lifetime access.
Learn on the go.
Desktop, iOS and Android.
Get rewarded.
Certificate of completion.
Section 1: Introduction to Multivariate Data and Analysis  

Lecture 1  11:40  
This video presents an overview of the Applied Multivariate Analysis (MVA) course. 

Lecture 2  02:25  
The materials used in the video lectures for Section 1 Introduction to Multivariate Data and Analysis are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 3  14:15  
Multivariate analysis (MVA) is based on the statistical principle of multivariate statistics, which involves the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Some of the applications include:

• Reducing a large number of variables to a smaller number of factors for data modeling.
• Validating a scale or index by demonstrating that its constituent items load on the same factor, and dropping proposed scale items which cross-load on more than one factor.
• Selecting a subset of variables from a larger set, based on which original variables have the highest correlations with some other factors.
• Creating a set of factors to be treated as uncorrelated variables as one approach to handling multicollinearity in procedures such as multiple regression.

In this "hands-on" course on applied multivariate analysis, we focus on how to actually use and conduct MVA analyses, using dozens of real data sets and R software. We examine techniques and examples of principal components analysis, multidimensional scaling, cluster analysis, exploratory factor analysis, and an introduction to structural equation modeling. 
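As a minimal sketch of the first application listed above — reducing many correlated variables to a few components — here is how PCA might look using prcomp() on R's built-in USArrests data (this data set is an illustration, not one of the course data sets):

```r
# Reduce the four correlated crime-rate variables in USArrests to
# principal components; scale. = TRUE standardizes each variable first.
pca <- prcomp(USArrests, scale. = TRUE)
summary(pca)         # proportion of variance explained by each component
head(pca$x[, 1:2])   # scores on the first two components, one row per state
```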

Lecture 4  08:20  
Missing data is a huge problem when analyzing data sets because many statistical and mathematical functions fail when an observation has even one missing data element. We explain and demonstrate why this is a problem using a 'body measures' data set that we construct in R, and we show some "quick fixes" for getting around the problem of missing data in multivariate analysis. 
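A minimal sketch of the problem and two common quick fixes, using a small hypothetical 'body measures'-style data frame (the names and values here are illustrative, not the course's data set):

```r
# Toy data: each row (observation) is missing one measurement.
bodym <- data.frame(
  height = c(170, 165, NA, 180, 175),
  weight = c(68, 59, 72, NA, 80),
  waist  = c(81, 74, 79, 90, NA)
)

cov(bodym)                                 # NA propagates into affected cells
cov(bodym, use = "complete.obs")           # quick fix 1: listwise deletion
cov(bodym, use = "pairwise.complete.obs")  # quick fix 2: pairwise deletion
colMeans(bodym, na.rm = TRUE)              # quick fix 3: ignore NAs per variable
```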

Lecture 5  10:11  
We create several multivariate data sets using R software. We use these data sets and others in the rest of the course. 

Lecture 6  11:36  
In probability theory and statistics, a covariance matrix (also known as a dispersion matrix or variance–covariance matrix) is a matrix whose element in the i, j position is the covariance between the i-th and j-th elements of a random vector (that is, of a vector of random variables). Each element of the vector is a scalar random variable, either with a finite number of observed empirical values or with a finite or infinite number of potential values specified by a theoretical joint probability distribution of all the random variables.

The correlation matrix of n random variables X_1, ..., X_n is the n × n matrix whose i, j entry is corr(X_i, X_j). If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables X_i / σ(X_i) for i = 1, ..., n. This applies both to the matrix of population correlations (in which case σ is the population standard deviation) and to the matrix of sample correlations (in which case σ denotes the sample standard deviation). Consequently, each is necessarily a positive-semidefinite matrix. The correlation matrix is symmetric because the correlation between X_i and X_j is the same as the correlation between X_j and X_i. 
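The identity stated above — that the correlation matrix equals the covariance matrix of the standardized variables — can be verified numerically in a few lines of R (simulated data, for illustration only):

```r
set.seed(1)
X <- matrix(rnorm(200), ncol = 4)   # 50 observations of 4 random variables

R <- cor(X)                          # sample correlation matrix
S <- cov(scale(X))                   # covariance of the standardized columns

all.equal(R, S, check.attributes = FALSE)  # the two matrices match
isSymmetric(R)                             # corr(X_i, X_j) = corr(X_j, X_i)
all(eigen(R)$values >= -1e-8)              # positive semi-definite (numerically)
```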

Lecture 7  10:21  
We continue our discussion of creating, estimating and using both covariance and correlation matrices in multivariate analysis using R software. We also introduce the concept of "distance" for finding similarities / differences among sets of variables. 
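A small sketch of the "distance" idea introduced here, using hypothetical toy data: Euclidean distances between observations, computed on standardized variables so that no single measurement scale dominates.

```r
# Three observations of two variables measured on different scales.
X <- data.frame(height = c(170, 180, 165), weight = c(70, 85, 60))

d_raw    <- dist(X)         # Euclidean distances in raw units
d_scaled <- dist(scale(X))  # distances after standardizing each variable
round(as.matrix(d_scaled), 2)
```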

Lecture 8  10:12  
We continue our discussion of creating, estimating and using both covariance and correlation matrices in multivariate analysis using R software. We also introduce the concept of "distance" for finding similarities / differences among sets of variables. 

Lecture 9  11:28  
We describe, create (with simulation), demonstrate and visualize a multivariate normal (MVN) density function using R. In probability theory and statistics, the multivariate normal distribution, or multivariate Gaussian distribution, is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One possible definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables, each of which clusters around a mean value. 
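A minimal simulation sketch, assuming the MASS package (which ships with R) is used for the draw:

```r
# Simulate 1000 draws from a bivariate normal with correlation 0.8.
library(MASS)
Sigma <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)   # covariance matrix
set.seed(42)
Z <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = Sigma)

colMeans(Z)   # close to (0, 0)
cor(Z)[1, 2]  # close to 0.8
```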

Lecture 10  10:03  
We demonstrate several R software graphical approaches to test for univariate and multivariate normality. 
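One possible sketch of such graphical checks (simulated data; the course's own scripts may use different data and additional plots): a univariate Q-Q plot for a single variable, and a chi-square Q-Q plot of squared Mahalanobis distances for the multivariate case.

```r
set.seed(7)
X <- matrix(rnorm(300), ncol = 3)   # 100 observations of 3 variables

# Univariate check on one variable.
qqnorm(X[, 1]); qqline(X[, 1])

# Multivariate check: under MVN, squared Mahalanobis distances follow
# (approximately) a chi-square distribution with df = number of variables.
d2 <- mahalanobis(X, colMeans(X), cov(X))
qqplot(qchisq(ppoints(nrow(X)), df = ncol(X)), d2,
       xlab = "Chi-square quantiles", ylab = "Ordered squared distances")
```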

Lecture 11  13:41  
We continue our illustrative cases and examples of creating normality plots in R software. 

Lecture 12  07:09  
This video lecture explains the three covariance, correlation and normality exercises for the first section of the applied MVA course. 

Section 2: Visualizing Multivariate Data  
Lecture 13  02:46  
The materials used in the video lectures for Section 2 Visualizing Multivariate Data are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 14 
Covariance and Correlation Matrices with Missing Data (part 1)
Preview

08:49  
Lecture 15 
Covariance and Correlation Matrices with Missing Data (part 2)

10:45  
Lecture 16 
Univariate and Multivariate QQ-Plots of Pottery Data

09:38  
Lecture 17 
Converting Covariance to Correlation Matrices

15:20  
Lecture 18 
Plots for Marginal Distributions

15:35  
Lecture 19 
Outlier Identification

16:29  
Lecture 20 
Chi, Bubble, and other Glyph Plots

14:12  
Lecture 21 
Scatterplot Matrix

07:56  
Lecture 22 
Kernel Density Estimators

13:41  
Lecture 23 
3-Dimensional and Trellis (Lattice Package) Graphics

15:26  
Lecture 24 
More Trellis (Lattice Package) Graphics

14:04  
Lecture 25 
Bivariate Boxplot and Chi-Plot Visualization Exercises

02:16  
Section 3: Principal Components Analysis (PCA)  
Lecture 26  00:44  
The materials used in the video lectures for Section 3 Principal Components Analysis (PCA) are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 27 
Bivariate Boxplot Visualization Exercise Solution

14:48  
Lecture 28 
Chi-Plot Visualization Exercise Solution

03:40  
Lecture 29 
What is a "Principal Components Analysis" (PCA) ?
Preview

11:13  
Lecture 30 
PCA Basics with R: Blood Data (part 1)

09:18  
Lecture 31 
PCA Basics with R: Blood Data (part 2)

10:51  
Lecture 32 
PCA with Head Size Data (part 1)

08:01  
Lecture 33 
PCA with Head Size Data (part 2)

09:31  
Lecture 34 
PCA with Heptathlon Data (part 1)

07:40  
Lecture 35 
PCA with Heptathlon Data (part 2)

10:03  
Lecture 36 
PCA with Heptathlon Data (part 3)

13:04  
Lecture 37 
PCA Criminal Convictions Exercise

01:21  
Section 4: Multidimensional Scaling (MDS)  
Lecture 38  00:56  
The materials used in the video lectures for Section 4 Multidimensional Scaling (MDS) are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 39 
PCA Criminal Convictions Exercise Solution

14:29  
Lecture 40 
Introduction to Multidimensional Scaling
Preview

13:31  
Lecture 41 
Classical Multidimensional Scaling (part 1)

14:50  
Lecture 42 
Classical Multidimensional Scaling (part 2)

08:47  
Lecture 43 
Classical Multidimensional Scaling: Skulls Data

17:46  
Lecture 44 
Non-Metric Multidimensional Scaling Example: Voting Behavior

14:24  
Lecture 45 
Non-Metric Multidimensional Scaling Example: WW II Leaders

09:08  
Lecture 46 
Multidimensional Scaling Exercise: Water Voles

02:48  
Section 5: Cluster Analysis  
Lecture 47  01:13  
The materials used in the video lectures for Section 5 Cluster Analysis are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 48 
MDS Water Voles Exercise Solution

13:55  
Lecture 49 
Introduction to Cluster Analysis
Preview

10:50  
Lecture 50 
Hierarchical Clustering Distance Techniques

10:35  
Lecture 51 
Hierarchical Clustering of Measures Data

12:40  
Lecture 52 
Hierarchical Clustering of Fighter Jets

10:15  
Lecture 53 
K-Means Clustering of Crime Data (part 1)

13:06  
Lecture 54 
K-Means Clustering of Crime Data (part 2)

06:36  
Lecture 55 
Clustering of Romano-British Pottery Data

14:50  
Lecture 56 
K-Means Classification of Exoplanets

13:20  
Lecture 57 
Model-Based Clustering of Exoplanets

12:34  
Lecture 58 
Finite Mixture Model-Based Analysis

13:04  
Lecture 59 
Cluster Analysis Neighborhood and Stripes Plots

10:07  
Lecture 60 
K-Means Cluster Analysis Crime Data Exercise

00:35  
Section 6: Exploratory Factor Analysis (EFA)  
Lecture 61  00:40  
The materials used in the video lectures for Section 6 Exploratory Factor Analysis are briefly explained and then provided as a .zip file download after the short video is presented. 

Lecture 62  09:20  
The solution to the K-Means exercise using the crime data is explained. 

Lecture 63  14:38  
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. 

Lecture 64  07:34  
The factanal() function in R performs maximum-likelihood factor analysis on a covariance matrix or data matrix. 
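A minimal sketch of a factanal() call on simulated data (the variables and factor structure here are hypothetical, chosen so that two clear factors exist; the course's lectures use the data sets provided in the materials):

```r
set.seed(3)
# Two latent factors each driving three of six observed variables.
f1 <- rnorm(200); f2 <- rnorm(200)
X <- cbind(a = f1 + rnorm(200, sd = 0.5), b = f1 + rnorm(200, sd = 0.5),
           c = f1 + rnorm(200, sd = 0.5), d = f2 + rnorm(200, sd = 0.5),
           e = f2 + rnorm(200, sd = 0.5), f = f2 + rnorm(200, sd = 0.5))

fit <- factanal(X, factors = 2, rotation = "varimax")
fit$loadings   # a-c load heavily on one factor, d-f on the other
```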

Lecture 65  14:47  
This lecture presents an example of estimating an EFA using R software with the life data provided in the materials. 

Lecture 66  16:20  
This lecture presents an example of estimating an EFA using R software with the drug use data provided in the materials. 

Lecture 67  08:17  
Both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are employed to understand shared variance of measured variables that is believed to be attributable to a factor or latent construct. Despite this similarity, however, EFA and CFA are conceptually and statistically distinct analyses. The goal of EFA is to identify factors based on data and to maximize the amount of variance explained. The researcher is not required to have any specific hypotheses about how many factors will emerge, or what items or variables these factors will comprise. If such hypotheses exist, they are not incorporated into and do not affect the results of the statistical analyses.

By contrast, CFA evaluates a priori hypotheses and is largely driven by theory. CFA analyses require the researcher to hypothesize, in advance, the number of factors, whether or not these factors are correlated, and which items/measures load onto and reflect which factors. As such, in contrast to exploratory factor analysis, where all loadings are free to vary, CFA allows for the explicit constraint of certain loadings to be zero. EFA is sometimes reported in research when CFA would be a better statistical approach. It has been argued that CFA can be restrictive and inappropriate when used in an exploratory fashion. However, the idea that CFA is solely a "confirmatory" analysis may sometimes be misleading, as modification indices used in CFA are somewhat exploratory in nature. Modification indices show the improvement in model fit if a particular coefficient were to become unconstrained. Likewise, EFA and CFA do not have to be mutually exclusive analyses; EFA has been argued to be a reasonable follow-up to a poor-fitting CFA model. 

Lecture 68  02:43  
The correlation matrix given below represents grading scores of 220 boys in six school subjects: (1) French; (2) English; (3) History; (4) Arithmetic; (5) Algebra; and (6) Geometry. Find the two-factor solution from a maximum likelihood factor analysis. Interpret the factor loadings. Then plot these derived loadings and interpret again. Was it easier to interpret the factors by looking at the visualization? Finally, find a non-orthogonal rotation that allows easier interpretation of the results by looking at the factor loadings directly, without the "visual utility" afforded by plotting the two-factor solution first.

#              French English History Arithmetic Algebra Geometry
# French        1.00
# English       0.44   1.00
# History       0.41   0.35    1.00
# Arithmetic    0.29   0.35    0.16    1.00
# Algebra       0.33   0.32    0.19    0.59    1.00
# Geometry      0.25   0.33    0.18    0.47    0.46    1.00 

Section 7: Introduction to Structural Equation Modeling (SEM), QGraph, and SIMSEM  
Lecture 69  02:53  
Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). qgraph is a package that can be used to plot several types of graphs. It is mainly aimed at visualizing relationships in (psychometric) data as networks to create a clear picture of what the data actually looks like. SIMSEM is an R package developed for facilitating simulation and analysis of data within the structural equation modeling (SEM) framework. 

Lecture 70  09:15  
Solution to the EFA exercises are provided in R scripts. 

Lecture 71  11:56  
Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). Specification is formulating a statement about a set of parameters and stating a model. A critical principle in model specification and evaluation is the fact that all of the models we would be interested in specifying and evaluating are wrong to some degree. We must therefore define as an optimal outcome a finding that a particular model fits our observed data closely and yields a highly interpretable solution. Instead of considering all possible models, a finding that a particular model fits observed data well and yields an interpretable solution can be taken to mean only that the model provides one plausible representation of the structure that produced the observed data. 

Lecture 72  05:15  
Structural equation modeling (SEM) is a methodology for representing, estimating, and testing a network of relationships between variables (measured variables and latent constructs). Specification is formulating a statement about a set of parameters and stating a model. A critical principle in model specification and evaluation is the fact that all of the models we would be interested in specifying and evaluating are wrong to some degree. We must therefore define as an optimal outcome a finding that a particular model fits our observed data closely and yields a highly interpretable solution. Instead of considering all possible models, a finding that a particular model fits observed data well and yields an interpretable solution can be taken to mean only that the model provides one plausible representation of the structure that produced the observed data. 

Lecture 73  06:43  
qgraph is a package that can be used to plot several types of graphs. It is mainly aimed at visualizing relationships in (psychometric) data as networks to create a clear picture of what the data actually look like. Its most important use is to visualize correlation matrices as a network in which each node represents a variable and each edge a correlation. The color of an edge indicates the sign of the correlation (green for positive, red for negative) and its width indicates the strength of the correlation. Other statistics can also be used in the graph, as long as negative and positive values are comparable in strength and zero indicates no relationship. qgraph also comes with various functions to visualize other statistics and even perform analyses, such as EFA, PCA, CFA and SEM. The stable release of qgraph is available on CRAN, the developmental version is available on GitHub, and an article introducing the package in detail is available in the Journal of Statistical Software. Since qgraph 1.3 the package also contains network model selection and estimation procedures. 

Lecture 74  09:57  
The SIMSEM R package has been developed to facilitate simulation and analysis of data within the structural equation modeling (SEM) framework. This package aims to help analysts create simulated data from hypotheses or from analytic results obtained from data. The simulated data can be used for different purposes, such as power analysis, model fit evaluation, and planned missing data designs. Students will gain an appreciation of how to use SIMSEM for these purposes. 

Lecture 75  04:12  
The SIMSEM R package has been developed to facilitate simulation and analysis of data within the structural equation modeling (SEM) framework. This package aims to help analysts create simulated data from hypotheses or from analytic results obtained from data. The simulated data can be used for different purposes, such as power analysis, model fit evaluation, and planned missing data designs. Students will gain an appreciation of how to use SIMSEM for these purposes. 
Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at 3 major state universities in the Eastern United States from 1993-2010. In these positions, he taught dozens of various statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA.

He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010).

He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and nonlinear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.