Structural equation modeling (SEM) with lavaan

Name: Structural equation modeling (SEM) with lavaan
Rating: 4.2 (405 reviews)

Learn how to specify, estimate and interpret SEM models with no-cost professional R software used by experts worldwide.

Created by Geoffrey Hubona, Ph.D.

Last updated 5/2021

English

What you'll learn

Specify and estimate parameters in a structural equation model using the R lavaan package and interpret and report on the SEM model results.
Perform exploratory and confirmatory factor analyses (EFAs and CFAs) using your own datasets.
Use a variety of multiple imputation techniques to "fill in," and correct for, missing data.
Specify and estimate mediated and other indirect SEM effects using traditional parametric confidence intervals, as well as using bootstrapped and/or bias-corrected and accelerated non-parametric approaches.
Specify and estimate the fit of multi-group SEM models, as well as determine levels of measurement invariance (configural, metric, scalar).
Output beautiful multi-color plots of fitted SEM models for use in reports and publications.
Understand how to set up, specify, estimate and interpret a latent (growth) curve model, using alternate random intercept and slope model specifications.

Course content

8 sections • 73 lectures • 11h 16m total length

Introduction to Course and to R (slides) • 7:04
R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are some important differences between the two languages, but much of the code written for S runs unaltered in R.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.

R is a GNU project. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; there are also several graphical front-ends for it.
Introduction to Path Modeling and SEM (slides, part 1) • 9:31
Structural equation modeling (SEM) is a family of statistical methods designed to test a conceptual or theoretical model. Some common SEM methods include confirmatory factor analysis, path analysis, and latent growth modeling. The term "structural equation model" most commonly refers to a combination of two things: a "measurement model" that defines latent variables using one or more observed variables, and a "structural regression model" that links latent variables together. The parts of a structural equation model are linked to one another using a system of simultaneous regression equations.

SEM is widely used in the social sciences because of its ability to isolate observational error from measurement of latent variables. To provide a simple example, the concept of human intelligence cannot be measured directly as one could measure height or weight. Instead, psychologists develop theories of intelligence and write measurement instruments with items (questions) designed to measure intelligence according to their theory. They would then use SEM to test their theory using data gathered from people who took their intelligence test. With SEM, "intelligence" would be the latent variable and the test items would be the observed variables.
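As a rough illustration of how a measurement model and a structural regression model fit together, the sketch below writes both parts in lavaan model syntax and fits them to the PoliticalDemocracy dataset that ships with lavaan. The latent variable names (ind60, dem60, dem65) are labels chosen for this example, and the specification is deliberately simplified: it leaves out the residual covariances that fuller versions of this model often include.

```r
library(lavaan)

model <- '
  # measurement model: each latent variable is defined by observed indicators
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # structural regression model: links among the latent variables
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
'

# Fit the full SEM (maximum likelihood is the default estimator)
fit <- sem(model, data = PoliticalDemocracy)

# Parameter estimates, standardized solution, and global fit measures
summary(fit, standardized = TRUE, fit.measures = TRUE)
```

The `=~` operator reads "is measured by" and the `~` operator reads "is regressed on", which mirrors the two components described above.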
Introduction to Path Modeling and SEM (slides, part 2) • 9:30
In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA).

In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models.
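A path model in this sense can be sketched in lavaan by using only observed variables, so that there is a structural part but no measurement part. The example below reuses a few observed columns of lavaan's built-in PoliticalDemocracy data; the particular choice of predictors and outcomes is an assumption made purely for illustration, not a substantive causal claim.

```r
library(lavaan)

# Path model: every variable is a single observed indicator (no =~ lines)
path_model <- '
  y1 ~ x1 + x2   # y1 regressed on two observed predictors
  y5 ~ y1        # y5 regressed on y1
'

fit_path <- sem(path_model, data = PoliticalDemocracy)

# Regression (path) coefficients with standardized estimates
summary(fit_path, standardized = TRUE)
```

Because no latent variables are defined, `lavNames(fit_path, "lv")` returns an empty vector for this model.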
Input and Output into R • 11:08
Useful Data Summary Statistics • 13:28
JSS Reading and Exercise #1 • 2:27
What is lavaan (up to syntax)? • 14:09
Estimate an Example Confirmatory Factor Analysis (CFA) • 8:43
In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959).

In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., "Depression" being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others.
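The zero-correlation constraint described above can be sketched in lavaan as follows. This is a minimal example using the HolzingerSwineford1939 data bundled with lavaan; the factor names (visual, textual) and the two-factor structure are assumptions chosen for illustration.

```r
library(lavaan)

# Two-factor CFA with the factor correlation freely estimated
model_free <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# Same model, but the factor covariance is constrained to zero
model_orth <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  visual ~~ 0 * textual
'

fit_free <- cfa(model_free, data = HolzingerSwineford1939)
fit_orth <- cfa(model_orth, data = HolzingerSwineford1939)

# Global fit of the constrained model
fitMeasures(fit_orth, c("chisq", "df", "pvalue", "cfi", "rmsea"))

# Likelihood ratio test: does the zero-correlation constraint worsen fit?
lavTestLRT(fit_orth, fit_free)
```

The constrained model has exactly one more degree of freedom than the free model, so the likelihood ratio test is a one-degree-of-freedom test of the hypothesis that the two factors are uncorrelated.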
Other Useful lavaan Fitted Results Functions • 6:38

Solution to CFA Exercise from Section 2 (part 1) • 16:05
Solution to CFA Exercise from Section 2 (part 2) • 12:00
Full SEM Political Democracy Model Example (part 1) • 10:40
Full SEM Political Democracy Model Example (part 2) • 9:40
Full SEM Political Democracy Model Example (part 3) • 14:45
Full SEM Quantitative Attitudes Example (part 1) • 11:17
Full SEM Quantitative Attitudes Example (part 2) • 5:55
Setting Inequalities Full SEM Exercise • 1:20

What is Factor Analysis? • 9:40
Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Computationally this technique is equivalent to low-rank approximation of the matrix of observed variables. Factor analysis originated in psychometrics and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other fields that deal with data sets where there are large numbers of observed variables that are thought to reflect a small number of latent variables.

Factor analysis is related to principal component analysis (PCA), but the two are not identical. Latent variable models, including factor analysis, use regression modelling techniques to test hypotheses producing error terms, while PCA is a descriptive statistical technique. There has been significant controversy in the field over the equivalence or otherwise of the two techniques (see exploratory factor analysis versus principal components analysis).
Set Up Data for Factor Analysis • 10:17
Begin Performing Exploratory Factor Analysis (EFA) • 8:56
In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.

EFA is based on the common factor model. Within the common factor model, each measured variable is expressed as a function of common factors, unique factors, and errors of measurement. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables.

EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
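One way to try an exploratory factor analysis in R is the base `factanal()` function, which fits the common factor model by maximum likelihood. The sketch below is only one of several possible approaches (the course may well use different functions or packages); it borrows the nine test scores from lavaan's HolzingerSwineford1939 data, and the choices of three factors and an oblique promax rotation are assumptions made for illustration.

```r
library(lavaan)  # loaded here only for the HolzingerSwineford1939 dataset

# Keep the nine observed test scores x1 ... x9
items <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Maximum likelihood EFA: every item is allowed to load on every factor
efa3 <- factanal(items, factors = 3, rotation = "promax")

# Show the loading matrix, hiding small loadings to make the pattern readable
print(efa3$loadings, cutoff = 0.3)

# Uniquenesses: the share of each item's variance not explained by the factors
round(efa3$uniquenesses, 2)
```

Decisions such as the number of factors, the extraction method, and the rotation are exactly the analyst choices the paragraph above refers to, so it is worth rerunning the call with different settings and comparing the loading patterns.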
Continue Performing Various EFAs • 9:15
Perform a Confirmatory Factor Analysis (CFA) • 8:27

Introduction to Missing Data and Imputation • 1:41
In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data.
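The contrast between listwise deletion and imputation can be sketched with the mice package, which is an assumption here: the course materials may use other imputation tools. The example deletes some values from lavaan's HolzingerSwineford1939 data at random (artificial missingness, for illustration only), counts how many cases listwise deletion would retain, and then creates five imputed datasets with predictive mean matching.

```r
library(lavaan)  # for the HolzingerSwineford1939 dataset
library(mice)    # assumed to be installed; provides multiple imputation

set.seed(2021)
dat <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Artificially remove 30 values from each of two items (illustration only)
dat$x1[sample(nrow(dat), 30)] <- NA
dat$x5[sample(nrow(dat), 30)] <- NA

# Listwise deletion would keep only the fully observed cases
n_complete <- sum(complete.cases(dat))
c(total = nrow(dat), kept_by_listwise = n_complete)

# Multiple imputation: m = 5 completed datasets via predictive mean matching
imp <- mice(dat, m = 5, method = "pmm", seed = 2021, printFlag = FALSE)

# Extract the first completed dataset; it has all rows and no missing values
completed1 <- mice::complete(imp, 1)
anyNA(completed1)
```

A single completed dataset understates uncertainty, so in practice the model would be fitted to every imputed dataset and the results pooled. That pooling step is omitted from this sketch.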
Solution to Setting Inequalities Exercise from Section 3 (part 1) • 10:46
Solution to Setting Inequalities Exercise from Section 3 (part 2) • 7:29
Solution to Setting Inequalities Exercise from Section 3 (part 3) • 5:36
Missing Data Problems and Issues (part 1) • 12:42
Missing Data Problems and Issues (part 2) • 8:41
Missing Data Imputation R Scripts Examples (part 1) • 8:35
Missing Data Imputation R Scripts Examples (part 2) • 11:20
More R Scripts with Modification Indices and FIML • 10:01
Missing Data Exercise and an Audience Question • 2:30

Introduction to Mediation and Indirect Effects • 4:38
In statistics, a mediation model is one that seeks to identify and explicate the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediational model hypothesizes that the independent variable influences the mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables. In other words, mediating relationships occur when a third variable plays an important role in governing the relationship between the other two variables.
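A simple mediation model of this kind can be sketched in lavaan by labeling the paths and defining the indirect effect with the `:=` operator. The data below are simulated, and the variable names (X, M, Y), path labels (a, b, c), and effect sizes are all assumptions made for the example. Bootstrapped standard errors are requested so that a bias-corrected confidence interval can be reported for the indirect effect.

```r
library(lavaan)

# Simulated data: X influences M, and M in turn influences Y
set.seed(1234)
n <- 300
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.7 * M + rnorm(n)
dat <- data.frame(X, M, Y)

model <- '
  M ~ a * X            # path from independent variable to mediator
  Y ~ b * M + c * X    # path from mediator to outcome, plus direct effect

  indirect := a * b        # mediated (indirect) effect
  total    := c + a * b    # total effect
'

# Bootstrap standard errors (200 draws keeps the example quick)
fit <- sem(model, data = dat, se = "bootstrap", bootstrap = 200)

# Bias-corrected bootstrap confidence intervals for the defined effects
pe <- parameterEstimates(fit, boot.ci.type = "bca.simple")
pe[pe$op == ":=", c("lhs", "est", "ci.lower", "ci.upper")]
```

For a real analysis a larger number of bootstrap draws (for example 1000 or more) is usually preferred. The small value here only keeps the run time short.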
Solution to Missing Data Exercise from Section 5 (part 1) • 8:30
Solution to Missing Data Exercise from Section 5 (part 2) • 8:00
Mediation Concepts (slides, part 1) • 11:02
Mediation Concepts (slides, part 2) • 6:08
First R Script Mediation Example • 9:24
Second R Script Mediation Example (part 1) • 10:22
Second R Script Mediation Example (part 2) • 12:50
Mediation Exercise • 0:32
More on the Complexity of Mediation • 6:24

Introduction to Estimating Group Effects and Moderation • 0:47
Solution to Mediation Exercise from Section 6 • 12:29
Introduction to Meanstructures; Group Effects Slides • 10:53
Group Analysis Functions in lavaan • 6:54
Groups R Script CFA Example • 8:49
Constraining Parameters Across Groups • 9:25
Multi-Group Analysis Measurement Invariance Example (part 1) • 9:08
Measurement invariance or measurement equivalence is a statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds. Violations of measurement invariance may preclude meaningful interpretation of measurement data. Tests of measurement invariance are increasingly used in fields such as psychology to supplement evaluation of measurement quality rooted in classical test theory. Measurement invariance is often tested in the framework of multiple-group confirmatory factor analysis (CFA).
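Testing measurement invariance within multiple-group CFA can be sketched as a sequence of increasingly constrained models. The example below uses lavaan's HolzingerSwineford1939 data with `school` as the grouping variable. The three-factor model and the choice of grouping variable are assumptions for illustration, and the course examples may differ.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Configural invariance: same model structure, all parameters free per group
fit_configural <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Metric invariance: factor loadings constrained equal across groups
fit_metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

# Scalar invariance: loadings and item intercepts constrained equal
fit_scalar <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

# Nested model comparison: a significant test suggests the added
# constraints do not hold across groups
lavTestLRT(fit_configural, fit_metric, fit_scalar)
```

Each step adds equality constraints, so degrees of freedom increase from the configural to the scalar model, and each row of the comparison tests whether the extra constraints significantly worsen fit.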
Multi-Group Analysis Measurement Invariance Example (part 2) • 9:57

Introduction to Latent (Growth) Curve Models • 10:06
Latent growth modeling is a statistical technique used in the structural equation modeling (SEM) framework to estimate growth trajectory. It is a longitudinal analysis technique to estimate growth over a period of time. It is widely used in the field of behavioral science, education and social science. It is also called latent growth curve analysis. The latent growth model was derived from theories of SEM. General purpose SEM software, such as OpenMx, lavaan (both open source packages based in R), AMOS, Mplus, LISREL, or EQS among others may be used to estimate the trajectory of growth.

Latent Growth Models represent repeated measures of dependent variables as a function of time and other measures. Such longitudinal data share the features that the same subjects are observed repeatedly over time, and on the same tests (or parallel versions), and at known times. In latent growth modeling, the relative standing of an individual at each time is modeled as a function of an underlying growth process, with the best parameter values for that growth process being fitted to each individual.

These models have grown in use in social and behavioral research since it was shown that they can be fitted as a restricted common factor model in the structural equation modeling framework.
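The idea that a growth model is a restricted common factor model can be made concrete in lavaan: the intercept and slope are latent factors whose loadings are fixed rather than estimated. The sketch below uses the Demo.growth dataset shipped with lavaan, with four repeated measures t1 to t4. The factor names `i` and `s` and the assumption of equally spaced, linear time scores (0, 1, 2, 3) are choices made for this example.

```r
library(lavaan)

growth_model <- '
  # latent intercept: all loadings fixed to 1
  i =~ 1 * t1 + 1 * t2 + 1 * t3 + 1 * t4

  # latent slope: loadings fixed to the time scores (linear growth)
  s =~ 0 * t1 + 1 * t2 + 2 * t3 + 3 * t4
'

# growth() fixes the observed intercepts to zero and estimates the
# means and (co)variances of the latent intercept and slope
fit_growth <- growth(growth_model, data = Demo.growth)

summary(fit_growth, fit.measures = TRUE)
```

In the output, the means of `i` and `s` describe the average starting level and average rate of change, while their variances describe how much individuals differ in those two quantities.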
First LCM Example (part 1) • 11:16
Add Covariates to First LCM Example • 9:33
Crime Data LCM Example (part 1) • 11:45
Crime Data LCM Example (part 2) • 12:20
Crime Data LCM Example (part 3) • 11:06
Crime Data LCM Example (part 4) • 7:47
Latent Curve Models Review • 10:11
Review LCM Crime Models 1 and 2 • 11:56
Review LCM Crime Models 3, 4 and 5 • 8:46
Adding Covariates and MIMIC • 12:05
More on Covariates and Interactions • 11:23
Alternative Specifications of Latent Intercepts and Slopes • 10:01
Alternative Specification of Multigroup Models (exercise solution) • 12:37

Requirements

Students will be required to install no-cost R and RStudio software (instructions are provided).
Students who are new to R software will need to use and practice with the "introduction to R" scripts and exercises that are provided with the course's videos and materials.

Description

This "hands-on" course teaches one how to use the R software lavaan package to specify, estimate the parameters of, and interpret covariance-based structural equation models (SEM) that use latent variables. "lavaan" (note the purposeful use of lowercase "L" in "lavaan") is an acronym for latent variable analysis, and the name suggests the long-term goal of the developer, Yves Rosseel: "to provide a collection of tools that can be used to explore, estimate, and understand a wide family of latent variable models, including factor analysis, structural equation, longitudinal, multilevel, latent class, item response, and missing data models." The course uses and executes many "live" examples (with included R scripts and datasets) using no-cost R and RStudio software to demonstrate and teach how to: (1) specify a SEM model in lavaan syntax; (2) fit and then evaluate your model; (3) perform a CFA; (4) impute and replace missing data; (5) estimate mediating and other indirect effects; (6) estimate and evaluate multigroup models, simultaneously establishing measurement invariance; and (7) specify and estimate latent (growth) curve models, including the use of random (and latent) intercepts and slopes. The R lavaan package is world-class "professional-grade" SEM software, used by thousands of SEM experts, graduate students, and college and university faculty around the world.

Who this course is for:

Course participants may be "brand-new" (inexperienced) to R software, to SEM model estimation, or to both, or they may be more experienced in one or both.
This course is very useful for graduate students, quantitative-analysis professionals, and/or for college and university faculty who analyze research data using path models characterized by latent variables.
This course is appropriate for anyone wishing to learn more about specifying, estimating and interpreting covariance-based SEM models using the no-cost professional-grade SEM modeling features in the lavaan (and other) packages in R software.
The course is appropriate for anyone who wishes to learn how to use a no-cost, professional SEM software suite regarded as an alternative to Mplus.

Course content

Introduction to R and to SEM using the lavaan package • 9 lectures • 1hr 23min

Confirmatory Factor Analysis (CFA) with lavaan • 9 lectures • 1hr 29min

Full SEM Models • 8 lectures • 1hr 22min

Factor Analysis • 5 lectures • 47min

Missing Data and Imputation • 10 lectures • 1hr 19min

Mediation and Indirect Effects • 10 lectures • 1hr 18min

Estimating Group Effects • 8 lectures • 1hr 8min

Latent (Growth) Curve Models • 14 lectures • 2hr 31min
