This hands-on course teaches you how to use the R lavaan package to specify, estimate, and interpret covariance-based structural equation models (SEM) that use latent variables. "lavaan" (note the deliberate lowercase "l") is an acronym for latent variable analysis, and the name reflects the long-term goal of its developer, Yves Rosseel: "to provide a collection of tools that can be used to explore, estimate, and understand a wide family of latent variable models, including factor analysis, structural equation, longitudinal, multilevel, latent class, item response, and missing data models." The course works through many live examples (R scripts and datasets are included) using the no-cost R and RStudio software to demonstrate and teach how to: (1) specify a SEM model in lavaan syntax; (2) fit and then evaluate your model; (3) perform a CFA; (4) impute and replace missing data; (5) estimate mediating and other indirect effects; (6) estimate and evaluate multigroup models while establishing measurement invariance; and (7) specify and estimate latent (growth) curve models, including the use of random (and latent) intercepts and slopes. The R lavaan package is world-class, professional-grade SEM software, used by thousands of SEM experts, graduate students, and college and university faculty around the world.
Section 1: Introduction to R and to SEM using the lavaan package  

Lecture 1  07:04  
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years. R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered in R. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S. R is a GNU project. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and precompiled binary versions are provided for various operating systems. R uses a command line interface; there are also several graphical front-ends for it.

Lecture 2  09:31  
Structural equation modeling (SEM) is a family of statistical methods designed to test a conceptual or theoretical model. Some common SEM methods include confirmatory factor analysis, path analysis, and latent growth modeling. The term "structural equation model" most commonly refers to a combination of two things: a "measurement model" that defines latent variables using one or more observed variables, and a "structural regression model" that links latent variables together. The parts of a structural equation model are linked to one another using a system of simultaneous regression equations. SEM is widely used in the social sciences because of its ability to isolate observational error from measurement of latent variables. To provide a simple example, the concept of human intelligence cannot be measured directly as one could measure height or weight. Instead, psychologists develop theories of intelligence and write measurement instruments with items (questions) designed to measure intelligence according to their theory. They would then use SEM to test their theory using data gathered from people who took their intelligence test. With SEM, "intelligence" would be the latent variable and the test items would be the observed variables.

Lecture 3  09:30  
In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA). In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models. 
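
As a rough illustration of the single-indicator case described above, the sketch below applies the standard tracing rules to a three-variable path model with standardized variables (x -> m, m -> y, and a direct path x -> y). The function name and the coefficient values are hypothetical and chosen only for illustration; this is plain Python, not lavaan code.

```python
def implied_correlations(a, b, c):
    """Model-implied correlations for a standardized path model.

    a: path coefficient for x -> m
    b: path coefficient for m -> y
    c: path coefficient for the direct path x -> y
    """
    return {
        ("x", "m"): a,          # single direct path
        ("x", "y"): c + a * b,  # direct path plus the path through m
        ("m", "y"): b + a * c,  # direct path plus the path through the common cause x
    }


# Hypothetical coefficients, for illustration only.
for pair, r in implied_correlations(a=0.5, b=0.4, c=0.2).items():
    print(pair, round(r, 3))
```

Comparing these implied correlations with the observed ones is the basic idea behind evaluating whether a path model is consistent with the data.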

Lecture 4 
Input and Output into R

11:08  
Lecture 5 
Useful Data Summary Statistics

13:28  
Lecture 6 
JSS Reading and Exercise #1

02:27  
Lecture 7 
What is lavaan (up to syntax) ?

14:09  
Lecture 8  08:43  
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959). In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., "Depression" being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others.
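
To make the two-factor example above concrete, here is a minimal sketch of the model-implied covariance matrix of a factor model (loadings times factor covariance times transposed loadings, plus unique variances). The loadings, the factor correlation of 0.5, and the function name are hypothetical values picked for illustration; this is plain Python rather than lavaan syntax.

```python
def implied_covariance(loadings, factor_cov, unique_var):
    """Return Sigma = Lambda * Phi * Lambda' + Theta as a list of lists.

    loadings:   p x k matrix (items by factors)
    factor_cov: k x k factor covariance matrix Phi
    unique_var: length-p list of unique (error) variances, the diagonal of Theta
    """
    p, k = len(loadings), len(factor_cov)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            total = 0.0
            for r in range(k):
                for s in range(k):
                    total += loadings[i][r] * factor_cov[r][s] * loadings[j][s]
            if i == j:
                total += unique_var[i]
            sigma[i][j] = total
    return sigma


# Hypothetical standardized loadings: items 1-2 load on factor A, items 3-4 on factor B.
loadings = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
unique_var = [1 - sum(l * l for l in row) for row in loadings]

orthogonal = implied_covariance(loadings, [[1.0, 0.0], [0.0, 1.0]], unique_var)
correlated = implied_covariance(loadings, [[1.0, 0.5], [0.5, 1.0]], unique_var)

# Under the zero-correlation constraint, items on different factors
# are implied to be uncorrelated; relaxing the constraint changes that.
print(orthogonal[0][2], correlated[0][2])
```

If the sample shows clearly non-zero covariances between items of different factors, the constrained model cannot reproduce them, which is what a poor fit reflects.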

Lecture 9 
Other Useful lavaan Fitted Results Functions

06:38  
Section 2: Confirmatory Factor Analysis (CFA) with lavaan  
Lecture 10 
Exercise Solutions from Section 1

07:20  
Lecture 11 
SEM Review (slides, part 1)

10:06  
Lecture 12 
SEM Review (slides, part 2)

10:14  
Lecture 13 
SEM Review (slides, part 3)

11:39  
Lecture 14 
Run CFA in R Script (part 1)

12:50  
Lecture 15 
Run CFA in R Script (part 2)

14:19  
Lecture 16 
Run CFA in R Script (part 3)

09:41  
Lecture 17 
Run CFA in R Script (part 4)

12:09  
Lecture 18 
CFA Exercise

00:31  
Section 3: Full SEM Models  
Lecture 19 
Solution to CFA Exercise from Section 2 (part 1)

16:05  
Lecture 20 
Solution to CFA Exercise from Section 2 (part 2)

12:00  
Lecture 21 
Full SEM Political Democracy Model Example (part 1)

10:40  
Lecture 22 
Full SEM Political Democracy Model Example (part 2)

09:40  
Lecture 23 
Full SEM Political Democracy Model Example (part 3)

14:45  
Lecture 24 
Full SEM Quantitative Attitudes Example (part 1)

11:17  
Lecture 25 
Full SEM Quantitative Attitudes Example (part 2)

05:55  
Lecture 26 
Setting Inequalities Full SEM Exercise

01:20  
Section 4: Factor Analysis  
Lecture 27  09:40  
Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Computationally this technique is equivalent to low-rank approximation of the matrix of observed variables. Factor analysis originated in psychometrics and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other fields that deal with data sets where there are large numbers of observed variables that are thought to reflect a small number of latent variables. Factor analysis is related to principal component analysis (PCA), but the two are not identical. Latent variable models, including factor analysis, use regression modelling techniques to test hypotheses producing error terms, while PCA is a descriptive statistical technique. There has been significant controversy in the field over the equivalence or otherwise of the two techniques (see exploratory factor analysis versus principal components analysis).
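
The "four observed variables, two unobserved variables" example can be sketched with a small simulation: each observed variable is built as a loading times a factor plus an error term, and variables that share a factor end up strongly correlated while variables on different factors do not. The loading of 0.9, the sample size, the seed, and the function names are all assumptions made for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_two_factor_data(n, loading=0.9, seed=1):
    """Simulate four observed variables driven by two independent factors.

    Columns 0 and 1 load on factor 1; columns 2 and 3 load on factor 2.
    Each observed value is loading * factor + error, scaled to unit variance.
    """
    rng = random.Random(seed)
    error_sd = math.sqrt(1 - loading ** 2)
    columns = [[], [], [], []]
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        for idx, factor in enumerate((f1, f1, f2, f2)):
            columns[idx].append(loading * factor + error_sd * rng.gauss(0, 1))
    return columns


cols = simulate_two_factor_data(5000)
print("same factor:     ", round(pearson(cols[0], cols[1]), 2))
print("different factor:", round(pearson(cols[0], cols[2]), 2))
```

Factor analysis works in the opposite direction: it starts from correlations like these and tries to recover the factors and loadings that could have produced them.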

Lecture 28 
Set Up Data for Factor Analysis

10:17  
Lecture 29  08:56  
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis. EFA is based on the common factor model. Within the common factor model, measured variables are expressed as a function of common factors, unique factors, and errors of measurement. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables. EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.

Lecture 30 
Continue Performing Various EFAs

09:15  
Lecture 31 
Perform a Confirmatory Factor Analysis (CFA)

08:27  
Section 5: Missing Data and Imputation  
Lecture 32  01:41  
In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. 
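
The contrast between listwise deletion and imputation can be sketched as follows. Mean imputation is used here only because it is the simplest possible "probable value"; it is an assumption of this sketch, not necessarily the method used in the course, and it is known to understate variability. Missing values are represented as `None`, and the function names are hypothetical.

```python
def listwise_delete(rows):
    """Keep only the cases (rows) that have no missing values."""
    return [row for row in rows if all(value is not None for value in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical data: four cases, two variables, two missing values.
data = [[1.0, 2.0], [None, 4.0], [3.0, None], [5.0, 6.0]]

print(len(listwise_delete(data)), "cases survive listwise deletion")
print(len(mean_impute(data)), "cases survive mean imputation")
```

Listwise deletion discards half of this small dataset, while imputation keeps every case so that standard complete-data techniques can be applied afterwards.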

Lecture 33 
Solution to Setting Inequalities Exercise from Section 3 (part 1)

10:46  
Lecture 34 
Solution to Setting Inequalities Exercise from Section 3 (part 2)

07:29  
Lecture 35 
Solution to Setting Inequalities Exercise from Section 3 (part 3)

05:36  
Lecture 36 
Missing Data Problems and Issues (part 1)

12:42  
Lecture 37 
Missing Data Problems and Issues (part 2)

08:41  
Lecture 38 
Missing Data Imputation R Scripts Examples (part 1)

08:35  
Lecture 39 
Missing Data Imputation R Scripts Examples (part 2)

11:20  
Lecture 40 
More R Scripts with Modification Indices and FIML

10:01  
Lecture 41 
Missing Data Exercise and an Audience Question

02:30  
Section 6: Mediation and Indirect Effects  
Lecture 42  04:38  
In statistics, a mediation model is one that seeks to identify and explicate the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediational model hypothesizes that the independent variable influences the mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables. In other words, mediating relationships occur when a third variable plays an important role in governing the relationship between the other two variables. 
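
A common way to quantify this chain with observed variables is the product-of-coefficients approach: regress the mediator on the independent variable (path a), regress the dependent variable on both (path b and the direct effect), and take a times b as the indirect effect. The sketch below does this with ordinary least squares in plain Python; it is an illustration under that assumption, not the lavaan workflow, and the data and function names are hypothetical.

```python
def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mediation_effects(x, m, y):
    """Estimate mediation paths by ordinary least squares.

    a:        slope of m regressed on x
    b:        partial slope of m when y is regressed on x and m
    direct:   partial slope of x in that same regression
    indirect: a * b
    total:    slope of y regressed on x alone (equals direct + indirect)
    """
    cx, cm, cy = _centered(x), _centered(m), _centered(y)
    s_xx, s_mm = _dot(cx, cx), _dot(cm, cm)
    s_xm, s_xy, s_my = _dot(cx, cm), _dot(cx, cy), _dot(cm, cy)

    a = s_xm / s_xx
    denom = s_xx * s_mm - s_xm ** 2  # zero if x and m are perfectly collinear
    b = (s_my * s_xx - s_xy * s_xm) / denom
    direct = (s_xy * s_mm - s_my * s_xm) / denom
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": s_xy / s_xx}


# Hypothetical data, constructed so that y depends on both x and m.
x = [1.0, 2.0, 3.0, 4.0]
m = [3.0, 3.0, 5.0, 9.0]
y = [1.8, 2.1, 3.4, 5.7]
for name, value in mediation_effects(x, m, y).items():
    print(name, round(value, 3))
```

With least squares the total effect decomposes exactly into the direct effect plus the indirect effect, which is why the product a times b is read as the part of the relationship that runs through the mediator.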

Lecture 43 
Solution to Missing Data Exercise from Section 5 (part 1)

08:30  
Lecture 44 
Solution to Missing Data Exercise from Section 5 (part 2)

08:00  
Lecture 45 
Mediation Concepts (slides, part 1)

11:02  
Lecture 46 
Mediation Concepts (slides, part 2)

06:08  
Lecture 47 
First R Script Mediation Example

09:24  
Lecture 48 
Second R Script Mediation Example (part 1)

10:22  
Lecture 49 
Second R Script Mediation Example (part 2)

12:50  
Lecture 50 
Mediation Exercise

00:32  
Lecture 51 
More on the Complexity of Mediation

06:24  
Section 7: Estimating Group Effects  
Lecture 52 
Introduction to Estimating Group Effects and Moderation

00:47  
Lecture 53 
Solution to Mediation Exercise from Section 6

12:29  
Lecture 54 
Introduction to Mean Structures; Group Effects Slides

10:53  
Lecture 55 
Group Analysis Functions in lavaan

06:54  
Lecture 56 
Groups R Script CFA Example

08:49  
Lecture 57 
Constraining Parameters Across Groups

09:25  
Lecture 58  09:08  
Measurement invariance or measurement equivalence is a statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds. Violations of measurement invariance may preclude meaningful interpretation of measurement data. Tests of measurement invariance are increasingly used in fields such as psychology to supplement evaluation of measurement quality rooted in classical test theory. Measurement invariance is often tested in the framework of multiple-group confirmatory factor analysis (CFA).

Lecture 59 
Multi-Group Analysis Measurement Invariance Example (part 2)

09:57  
Section 8: Latent (Growth) Curve Models  
Lecture 60  10:06  
Latent growth modeling is a statistical technique used in the structural equation modeling (SEM) framework to estimate growth trajectory. It is a longitudinal analysis technique to estimate growth over a period of time. It is widely used in the field of behavioral science, education and social science. It is also called latent growth curve analysis. The latent growth model was derived from theories of SEM. General purpose SEM software, such as OpenMx, lavaan (both open source packages based in R), AMOS, Mplus, LISREL, or EQS among others may be used to estimate the trajectory of growth. Latent Growth Models represent repeated measures of dependent variables as a function of time and other measures. Such longitudinal data share the features that the same subjects are observed repeatedly over time, and on the same tests (or parallel versions), and at known times. In latent growth modeling, the relative standing of an individual at each time is modeled as a function of an underlying growth process, with the best parameter values for that growth process being fitted to each individual. These models have grown in use in social and behavioral research since it was shown that they can be fitted as a restricted common factor model in the structural equation modeling framework. 
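
The "restricted common factor model" view can be sketched for the linear case: every repeated measure loads 1 on a latent intercept and loads its time score on a latent slope, so the means and variances of the repeated measures follow directly from the intercept and slope parameters. The parameter values, the time scores 0 to 3, and the function names below are hypothetical; this is plain Python, not lavaan syntax.

```python
def growth_loadings(time_scores):
    """Fixed factor loadings of a linear growth model: [intercept, slope] per occasion."""
    return [[1.0, float(t)] for t in time_scores]


def implied_means(mean_intercept, mean_slope, time_scores):
    """Model-implied mean of the outcome at each occasion."""
    return [mean_intercept + t * mean_slope for t in time_scores]


def implied_variances(var_intercept, var_slope, cov_is, residual_var, time_scores):
    """Model-implied variance of the outcome at each occasion.

    Assumes the same residual variance at every occasion.
    """
    return [
        var_intercept + 2 * t * cov_is + t * t * var_slope + residual_var
        for t in time_scores
    ]


times = [0, 1, 2, 3]  # hypothetical, equally spaced measurement occasions
print(growth_loadings(times))
print(implied_means(10.0, 2.0, times))
print(implied_variances(4.0, 1.0, 0.5, 2.0, times))
```

Individuals differ in their own intercepts and slopes; the intercept and slope variances in this sketch describe how much those individual trajectories spread around the average trajectory.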

Lecture 61 
First LCM Example (part 1)

11:16  
Lecture 62 
Add Covariates to First LCM Example

09:33  
Lecture 63 
Crime Data LCM Example (part 1)

11:45  
Lecture 64 
Crime Data LCM Example (part 2)

12:20  
Lecture 65 
Crime Data LCM Example (part 3)

11:06  
Lecture 66 
Crime Data LCM Example (part 4)

07:47  
Lecture 67 
Latent Curve Models Review

10:11  
Lecture 68 
Review LCM Crime Models 1 and 2

11:56  
Lecture 69 
Review LCM Crime Models 3, 4 and 5

08:46  
Lecture 70 
Adding Covariates and MIMIC

12:05  
Lecture 71 
More on Covariates and Interactions

11:23  
Lecture 72 
Alternative Specifications of Latent Intercepts and Slopes

10:01  
Lecture 73 
Alternative Specification of Multigroup Models (exercise solution)

12:37 
Dr. Geoffrey Hubona held full-time tenure-track, and tenured, assistant and associate professor faculty positions at 3 major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL (1993); an MA in Economics (1990), also from USF; an MBA in Finance (1979) from George Mason University in Fairfax, VA; and a BA in Psychology (1972) from the University of Virginia in Charlottesville, VA. He was a full-time assistant professor at the University of Maryland Baltimore County (1993-1996) in Catonsville, MD; a tenured associate professor in the department of Information Systems in the Business College at Virginia Commonwealth University (1996-2001) in Richmond, VA; and an associate professor in the CIS department of the Robinson College of Business at Georgia State University (2001-2010). He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, and structural equation modeling and partial least squares (PLS) path modeling. Dr. Hubona is an expert in the analytical, open-source R software suite and in various PLS path modeling software packages, including SmartPLS. He has published dozens of research articles that explain and use these techniques for the analysis of data, and, with software co-development partner Dean Lim, has created a popular cloud-based PLS software application, PLS-GUI.