Structural equation modeling (SEM) with lavaan
4.2 (77 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
1,193 students enrolled
Learn how to specify, estimate and interpret SEM models with no-cost professional R software used by experts worldwide.
Last updated 8/2015
  • 11.5 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Specify and estimate parameters in a structural equation model using the R lavaan package and interpret and report on the SEM model results.
  • Perform exploratory and confirmatory factor analyses (EFAs and CFAs) using their own datasets.
  • Use a variety of multiple imputation techniques to "fill in," and correct for, missing data.
  • Specify and estimate mediated and other indirect SEM effects using traditional parametric confidence intervals, as well as using bootstrapped and/or bias-corrected and accelerated non-parametric approaches.
  • Specify and estimate the fit of multi-group SEM models, as well as determine levels of measurement invariance (metric, scalar, configural).
  • Output beautiful multi-color plots of fitted SEM models for use in reports and publications.
  • Understand how to set up, specify, estimate and interpret a latent (growth) curve model, using alternate random intercept and slope model specifications.
Requirements
  • Students will be required to install no-cost R and RStudio software (instructions are provided).
  • Students who are new to R software will need to use and practice with the "introduction to R" scripts and exercises that are provided with the course's videos and materials.

This hands-on course teaches how to use the R lavaan package to specify, estimate the parameters of, and interpret covariance-based structural equation models (SEM) that use latent variables. "lavaan" (note the deliberate lowercase "l") is an acronym for latent variable analysis, and the name reflects the long-term goal of its developer, Yves Rosseel: "to provide a collection of tools that can be used to explore, estimate, and understand a wide family of latent variable models, including factor analysis, structural equation, longitudinal, multilevel, latent class, item response, and missing data models."

The course works through many "live" examples (R scripts and datasets are included) in no-cost R and RStudio software to demonstrate and teach how to: (1) specify a SEM model in lavaan syntax; (2) fit and then evaluate your model; (3) perform a CFA; (4) impute and replace missing data; (5) estimate mediating and other indirect effects; (6) estimate and evaluate multigroup models while establishing measurement invariance; and (7) specify and estimate latent (growth) curve models, including the use of random (and latent) intercepts and slopes.

The R lavaan package is world-class, professional-grade SEM software used by thousands of SEM experts, graduate students, and college and university faculty around the world.

Who is the target audience?
  • Course participants may be brand-new to R software, to SEM model estimation, or to both, or they may already have experience with one or both.
  • This course is very useful for graduate students, quantitative-analysis professionals, and college and university faculty who analyze research data using path models characterized by latent variables.
  • This course is appropriate for anyone wishing to learn more about specifying, estimating and interpreting covariance-based SEM models using the no-cost professional-grade SEM modeling features in the lavaan (and other) packages in R software.
  • The course is appropriate for anyone who wishes to learn how to use a no-cost, professional SEM software suite regarded as an alternative to Mplus.
Curriculum For This Course
73 Lectures
Introduction to R and to SEM using the lavaan package
9 Lectures 01:22:38

R is a programming language and software environment for statistical computing and graphics. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, surveys of data miners, and studies of scholarly literature databases show that R's popularity has increased substantially in recent years.

R is an implementation of the S programming language combined with lexical scoping semantics inspired by Scheme. S was created by John Chambers while at Bell Labs. There are some important differences, but much of the code written for S runs unaltered.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. R is named partly after the first names of the first two R authors and partly as a play on the name of S.

R is a GNU project. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. R uses a command line interface; there are also several graphical front-ends for it.

Structural equation modeling (SEM) is a family of statistical methods designed to test a conceptual or theoretical model. Some common SEM methods include confirmatory factor analysis, path analysis, and latent growth modeling. The term "structural equation model" most commonly refers to a combination of two things: a "measurement model" that defines latent variables using one or more observed variables, and a "structural regression model" that links latent variables together. The parts of a structural equation model are linked to one another using a system of simultaneous regression equations.

SEM is widely used in the social sciences because of its ability to isolate observational error from measurement of latent variables. To provide a simple example, the concept of human intelligence cannot be measured directly as one could measure height or weight. Instead, psychologists develop theories of intelligence and write measurement instruments with items (questions) designed to measure intelligence according to their theory. They would then use SEM to test their theory using data gathered from people who took their intelligence test. With SEM, "intelligence" would be the latent variable and the test items would be the observed variables.
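As a rough illustration of the two parts described above, the sketch below writes a measurement model (the `=~` lines) and a structural regression model (the `~` lines) in lavaan syntax. It assumes the lavaan package is installed and uses its bundled PoliticalDemocracy dataset. The latent variable names (ind60, dem60, dem65) follow the lavaan documentation, not this course description, and the model is deliberately simplified.

```r
library(lavaan)

# Assumption: simplified version of the Political Democracy model from the
# lavaan documentation (no correlated residuals), for illustration only.
model <- '
  # measurement model: each latent variable is defined by observed indicators
  ind60 =~ x1 + x2 + x3
  dem60 =~ y1 + y2 + y3 + y4
  dem65 =~ y5 + y6 + y7 + y8

  # structural regression model: links among the latent variables
  dem60 ~ ind60
  dem65 ~ ind60 + dem60
'

fit <- sem(model, data = PoliticalDemocracy)

# Parameter estimates, standardized solution, and global fit measures
summary(fit, standardized = TRUE, fit.measures = TRUE)
```

In this syntax, `=~` reads as "is measured by" and `~` reads as "is regressed on".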

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA).

In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models.

Introduction to Path Modeling and SEM (slides, part 2)

Useful Data Summary Statistics

JSS Reading and Exercise #1

In statistics, confirmatory factor analysis (CFA) is a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher's understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research. CFA was first developed by Jöreskog and has built upon and replaced older methods of analyzing construct validity such as the MTMM Matrix as described in Campbell & Fiske (1959).

In confirmatory factor analysis, the researcher first develops a hypothesis about what factors s/he believes are underlying the measures s/he has used (e.g., "Depression" being the factor underlying the Beck Depression Inventory and the Hamilton Rating Scale for Depression) and may impose constraints on the model based on these a priori hypotheses. By imposing these constraints, the researcher is forcing the model to be consistent with his/her theory. For example, if it is posited that there are two factors accounting for the covariance in the measures, and that these factors are unrelated to one another, the researcher can create a model where the correlation between factor A and factor B is constrained to zero. Model fit measures could then be obtained to assess how well the proposed model captured the covariance between all the items or measures in the model. If the constraints the researcher has imposed on the model are inconsistent with the sample data, then the results of statistical tests of model fit will indicate a poor fit, and the model will be rejected. If the fit is poor, it may be due to some items measuring multiple factors. It might also be that some items within a factor are more related to each other than others.
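The constraint described above (two factors whose correlation is fixed to zero) might be sketched in lavaan as follows. This is only an illustration: it assumes lavaan's bundled HolzingerSwineford1939 dataset, and the factor names "visual" and "textual" come from the lavaan documentation, not from this course description.

```r
library(lavaan)

# Two-factor model with the factor covariance freely estimated
model_corr <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# Same model, but the factor covariance is constrained to zero
model_orth <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  visual ~~ 0 * textual
'

fit_corr <- cfa(model_corr, data = HolzingerSwineford1939)
fit_orth <- cfa(model_orth, data = HolzingerSwineford1939)

# Fit measures for the constrained model
fitMeasures(fit_orth, c("chisq", "df", "pvalue", "cfi", "rmsea"))

# Likelihood ratio test: does the zero-covariance constraint worsen fit?
lavTestLRT(fit_corr, fit_orth)
```

A significant difference in the likelihood ratio test would suggest that the constraint is inconsistent with the sample data.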

Estimate an Example Confirmatory Factor Analysis (CFA)

Other Useful lavaan Fitted Results Functions
Confirmatory Factor Analysis (CFA) with lavaan
9 Lectures 01:28:49
Exercise Solutions from Section 1

SEM Review (slides, part 2)

SEM Review (slides, part 3)

Run CFA in R Script (part 2)

Run CFA in R Script (part 3)

Run CFA in R Script (part 4)

CFA Exercise
Full SEM Models
8 Lectures 01:21:42
Solution to CFA Exercise from Section 2 (part 1)

Solution to CFA Exercise from Section 2 (part 2)

Full SEM Political Democracy Model Example (part 2)

Full SEM Political Democracy Model Example (part 3)

Full SEM Quantitative Attitudes Example (part 1)

Full SEM Quantitative Attitudes Example (part 2)

Setting Inequalities Full SEM Exercise
Factor Analysis
5 Lectures 46:35

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in four observed variables mainly reflect the variations in two unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Computationally this technique is equivalent to low-rank approximation of the matrix of observed variables. Factor analysis originated in psychometrics and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other fields that deal with data sets where there are large numbers of observed variables that are thought to reflect a small number of latent variables.

Factor analysis is related to principal component analysis (PCA), but the two are not identical. Latent variable models, including factor analysis, use regression modelling techniques to test hypotheses producing error terms, while PCA is a descriptive statistical technique. There has been significant controversy in the field over the equivalence or otherwise of the two techniques (see exploratory factor analysis versus principal components analysis).

In multivariate statistics, exploratory factor analysis (EFA) is a statistical method used to uncover the underlying structure of a relatively large set of variables. EFA is a technique within factor analysis whose overarching goal is to identify the underlying relationships between measured variables. It is commonly used by researchers when developing a scale (a scale is a collection of questions used to measure a particular research topic) and serves to identify a set of latent constructs underlying a battery of measured variables. It should be used when the researcher has no a priori hypothesis about factors or patterns of measured variables. Measured variables are any one of several attributes of people that may be observed and measured. An example of a measured variable would be the physical height of a human being. Researchers must carefully consider the number of measured variables to include in the analysis. EFA procedures are more accurate when each factor is represented by multiple measured variables in the analysis.

EFA is based on the common factor model. Within the common factor model, a function of common factors, unique factors, and errors of measurements expresses measured variables. Common factors influence two or more measured variables, while each unique factor influences only one measured variable and does not explain correlations among measured variables.

EFA assumes that any indicator/measured variable may be associated with any factor. When developing a scale, researchers should use EFA first before moving on to confirmatory factor analysis (CFA). EFA requires the researcher to make a number of important decisions about how to conduct the analysis because there is no one set method.
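A minimal sketch of an exploratory factor analysis is shown below. It uses base R's `factanal()` function, which is one of several ways to run an EFA in R and is not necessarily the function used in the course videos. The choice of three factors, the promax rotation, and the HolzingerSwineford1939 example data (shipped with lavaan) are all assumptions made for illustration.

```r
library(lavaan)  # loaded only for the bundled example dataset

# Nine observed test scores; no factor structure is imposed in advance
items <- HolzingerSwineford1939[, paste0("x", 1:9)]

# Maximum-likelihood EFA: every item is allowed to load on every factor
efa_fit <- factanal(x = items, factors = 3, rotation = "promax")

# Show the loadings, hiding small values to make the pattern easier to read
print(efa_fit$loadings, cutoff = 0.3)
```

The number of factors and the rotation method are among the "important decisions" the researcher has to make, so it is usually worth comparing several solutions.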

Begin Performing Exploratory Factor Analysis (EFA)

Continue Performing Various EFAs

Missing Data and Imputation
10 Lectures 01:19:21

In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data.
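The cost of listwise deletion can be seen in a small sketch. Note that the alternative shown here is full information maximum likelihood (FIML), which this section's lectures also mention; FIML is not imputation, but like imputation it retains cases with partially missing values. The missing values are introduced artificially, and the dataset and factor names come from the lavaan documentation.

```r
library(lavaan)

set.seed(123)
dat <- HolzingerSwineford1939[, paste0("x", 1:6)]

# Artificially delete 30 values completely at random in two variables
dat$x1[sample(nrow(dat), 30)] <- NA
dat$x5[sample(nrow(dat), 30)] <- NA

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
'

# Default behaviour: cases with any missing value are dropped (listwise)
fit_listwise <- cfa(model, data = dat)

# FIML: all cases with at least some observed data contribute to estimation
fit_fiml <- cfa(model, data = dat, missing = "fiml")

# Number of cases actually used by each approach
c(listwise = lavInspect(fit_listwise, "nobs"),
  fiml     = lavInspect(fit_fiml, "nobs"))
```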

Solution to Setting Inequalities Exercise from Section 3 (part 2)

Solution to Setting Inequalities Exercise from Section 3 (part 3)

Missing Data Problems and Issues (part 1)

Missing Data Problems and Issues (part 2)

Missing Data Imputation R Scripts Examples (part 1)

Missing Data Imputation R Scripts Examples (part 2)

More R Scripts with Modification Indices and FIML

Missing Data Exercise and an Audience Question
Mediation and Indirect Effects
10 Lectures 01:17:50

In statistics, a mediation model is one that seeks to identify and explicate the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Rather than hypothesizing a direct causal relationship between the independent variable and the dependent variable, a mediational model hypothesizes that the independent variable influences the mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables. In other words, mediating relationships occur when a third variable plays an important role in governing the relationship between the other two variables.
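The mediation structure described above might be sketched in lavaan as follows. The data are simulated, so the variable names (X, M, Y), the path labels (a, b, c), and the effect sizes are purely hypothetical. The indirect effect is defined with the `:=` operator, and standard errors are bootstrapped; 500 draws is an arbitrary choice to keep the example quick.

```r
library(lavaan)

# Simulated data (hypothetical): X influences M, which in turn influences Y
set.seed(42)
n <- 250
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.6 * M + 0.1 * X + rnorm(n)
dat <- data.frame(X, M, Y)

model <- '
  M ~ a * X            # independent variable -> mediator
  Y ~ b * M + c * X    # mediator -> outcome, plus the direct effect

  ab    := a * b       # indirect (mediated) effect
  total := c + a * b   # total effect
'

# Bootstrapped standard errors
fit <- sem(model, data = dat, se = "bootstrap", bootstrap = 500)

# Bias-corrected bootstrap confidence intervals for all parameters
parameterEstimates(fit, boot.ci.type = "bca.simple")
```

The rows labelled `ab` and `total` in the output hold the indirect and total effects with their bootstrap confidence intervals.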

Solution to Missing Data Exercise from Section 5 (part 1)

Solution to Missing Data Exercise from Section 5 (part 2)

Mediation Concepts (slides, part 1)

Mediation Concepts (slides, part 2)

Second R Script Mediation Example (part 2)

Mediation Exercise

More on the Complexity of Mediation
Estimating Group Effects
8 Lectures 01:08:22
Introduction to Estimating Group Effects and Moderation

Solution to Mediation Exercise from Section 6

Group Analysis Functions in lavaan

Constraining Parameters Across Groups

Measurement invariance or measurement equivalence is a statistical property of measurement that indicates that the same construct is being measured across some specified groups. For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds. Violations of measurement invariance may preclude meaningful interpretation of measurement data. Tests of measurement invariance are increasingly used in fields such as psychology to supplement evaluation of measurement quality rooted in classical test theory. Measurement invariance is often tested in the framework of multiple-group confirmatory factor analysis (CFA).
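One common way to test measurement invariance with a multiple-group CFA is to fit a sequence of increasingly constrained models and compare them. The sketch below assumes lavaan's bundled HolzingerSwineford1939 data, with "school" as the grouping variable; the three-factor model is taken from the lavaan documentation and is not necessarily the example used in the course.

```r
library(lavaan)

model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Configural: same model structure in each group, all parameters free
fit_configural <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Metric: factor loadings constrained to be equal across groups
fit_metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = "loadings")

# Scalar: loadings and intercepts constrained to be equal across groups
fit_scalar <- cfa(model, data = HolzingerSwineford1939, group = "school",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between successive models
lavTestLRT(fit_configural, fit_metric, fit_scalar)
```

A non-significant difference between two successive models is usually read as support for the more constrained level of invariance.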

Multi-Group Analysis Measurement Invariance Example (part 1)

Multi-Group Analysis Measurement Invariance Example (part 2)
Latent (Growth) Curve Models
14 Lectures 02:30:52

Latent growth modeling is a statistical technique used in the structural equation modeling (SEM) framework to estimate growth trajectories. It is a longitudinal analysis technique for estimating growth over a period of time, and it is widely used in the behavioral sciences, education, and the social sciences. It is also called latent growth curve analysis. The latent growth model was derived from theories of SEM. General purpose SEM software, such as OpenMx, lavaan (both open source packages based in R), AMOS, Mplus, LISREL, or EQS among others may be used to estimate the trajectory of growth.

Latent Growth Models represent repeated measures of dependent variables as a function of time and other measures. Such longitudinal data share the features that the same subjects are observed repeatedly over time, and on the same tests (or parallel versions), and at known times. In latent growth modeling, the relative standing of an individual at each time is modeled as a function of an underlying growth process, with the best parameter values for that growth process being fitted to each individual.

These models have grown in use in social and behavioral research since it was shown that they can be fitted as a restricted common factor model in the structural equation modeling framework.
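The "restricted common factor model" idea can be sketched in lavaan: a latent intercept and a latent slope are defined as factors whose loadings are fixed rather than estimated. The example assumes lavaan's bundled Demo.growth dataset (four repeated measures, t1 to t4) and a linear trajectory; the factor names i and s follow the lavaan documentation.

```r
library(lavaan)

# Linear growth model for four equally spaced measurement occasions
model <- '
  # latent intercept: all loadings fixed to 1
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4

  # latent slope: loadings fixed to the time scores 0, 1, 2, 3
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
'

# growth() estimates the latent means and fixes observed intercepts to zero
fit <- growth(model, data = Demo.growth)

summary(fit, fit.measures = TRUE)
```

The estimated means of i and s describe the average starting level and rate of change, while their variances describe how much individuals differ in each.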

Introduction to Latent (Growth) Curve Models

First LCM Example (part 1)

Add Covariates to First LCM Example

Crime Data LCM Example (part 1)

Crime Data LCM Example (part 2)

Crime Data LCM Example (part 3)

Crime Data LCM Example (part 4)

Review LCM Crime Models 1 and 2

Adding Covariates and MIMIC

Alternative Specification of Multigroup Models (exercise solution)
About the Instructor
Geoffrey Hubona, Ph.D.
4.0 Average rating
1,411 Reviews
12,017 Students
28 Courses
Professor of Information Systems

Dr. Geoffrey Hubona held full-time tenure-track and tenured assistant and associate professor faculty positions at three major state universities in the Eastern United States from 1993 to 2010. In these positions, he taught dozens of statistics, business information systems, and computer science courses to undergraduate, master's and Ph.D. students. He earned a Ph.D. in Business Administration (Information Systems and Computer Science) from the University of South Florida (USF) in Tampa, FL; an MA in Economics, also from USF; an MBA in Finance from George Mason University in Fairfax, VA; and a BA in Psychology from the University of Virginia in Charlottesville, VA. He is the founder of the Georgia R School (2010-2014) and of R-Courseware (2014-Present), online educational organizations that teach research methods and quantitative analysis techniques. These techniques include linear and non-linear modeling, multivariate methods, data mining, programming and simulation, structural equation modeling, and partial least squares (PLS) path modeling.