twostage: Fit a lavaan model using 2-Stage Maximum Likelihood (TSML)...


twostage    R Documentation

Fit a lavaan model using 2-Stage Maximum Likelihood (TSML) estimation for missing data.

Description

This function automates 2-Stage Maximum Likelihood (TSML) estimation, optionally with auxiliary variables. Step 1 involves fitting a saturated model to the partially observed data set (to variables in the hypothesized model as well as auxiliary variables related to missingness). Step 2 involves fitting the hypothesized model to the model-implied means and covariance matrix (also called the "EM" means and covariance matrix) as if they were complete data. Step 3 involves correcting the Step-2 standard errors (SEs) and chi-squared statistic to account for additional uncertainty due to missing data (using information from Step 1; see References section for sources with formulas).
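Steps 1 and 2 can be sketched by hand with lavaan. This is a minimal illustration under stated assumptions, not the function's internal code: the object names (`sat.syntax`, `fitSat`, `fitStep2`) are hypothetical, and the SEs and chi-squared statistic reported by the Step-2 fit are the uncorrected ones that Step 3 is meant to adjust.

```r
library(lavaan)

## impose missing data on x5 and x9 (same pattern as this page's example)
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9), "ageyr", "agemo")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## Step 1: saturated model for all variables (model + auxiliary), fit by FIML
vars  <- colnames(HSMiss)
pairs <- combn(vars, 2)
sat.syntax <- paste(c(paste(vars, "~ 1"),                  # all means
                      paste(vars, "~~", vars),             # all variances
                      paste(pairs[1, ], "~~", pairs[2, ])  # all covariances
                      ), collapse = "\n")
fitSat <- lavaan(sat.syntax, data = HSMiss, missing = "fiml",
                 meanstructure = TRUE, fixed.x = FALSE)
EM.cov  <- lavInspect(fitSat, "cov.ov")   # model-implied ("EM") covariances
EM.mean <- lavInspect(fitSat, "mean.ov")  # model-implied ("EM") means

## Step 2: fit the hypothesized model to the EM moments as if complete data
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
ov <- paste0("x", 1:9)  # drop the auxiliary variables for Step 2
fitStep2 <- cfa(HS.model, sample.cov = EM.cov[ov, ov],
                sample.mean = EM.mean[ov], sample.nobs = nrow(HSMiss),
                sample.cov.rescale = FALSE,  # EM covariance is already an ML estimate
                meanstructure = TRUE)
coef(fitStep2)  # point estimates only; SEs and test still need the Step-3 correction
```

The auxiliary variables enter only the saturated model, which is how they inform the EM moments without appearing in the hypothesized model.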

Usage

twostage(..., aux, fun, baseline.model = NULL)

lavaan.2stage(..., aux = NULL, baseline.model = NULL)

cfa.2stage(..., aux = NULL, baseline.model = NULL)

sem.2stage(..., aux = NULL, baseline.model = NULL)

growth.2stage(..., aux = NULL, baseline.model = NULL)



Arguments

...
    Arguments passed to the lavaan function specified in the fun argument. See also lavOptions. At a minimum, the user must supply the first two named arguments to lavaan (i.e., model and data).

aux
    An optional character vector naming auxiliary variable(s) in data.

fun
    The character string naming the lavaan function used to fit the Step-2 hypothesized model ("cfa", "sem", "growth", or "lavaan").

baseline.model
    An optional character string, specifying the lavaan model.syntax for a user-specified baseline model. Interested users can use the fitted baseline model to calculate incremental fit indices (e.g., CFI and TLI) using the corrected chi-squared values (see the anova method in twostage). If NULL, the default "independence model" (i.e., freely estimated means and variances, but all covariances constrained to zero) will be specified internally.
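As a brief sketch of how these arguments fit together, the call below uses the generic twostage() with fun = "cfa", which, going by the usage above, should correspond to calling the cfa.2stage() wrapper. The object name `out.generic` is hypothetical, and the data preparation simply mirrors this page's example.

```r
library(semTools)  # also attaches lavaan and its HolzingerSwineford1939 data

## impose missing data on x5 and x9
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9), "ageyr", "agemo")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## model and data are passed through ... to the function named in fun
out.generic <- twostage(model = HS.model, data = HSMiss,
                        aux = c("ageyr", "agemo"), fun = "cfa")
summary(out.generic)
```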

Details

All variables (including auxiliary variables) are treated as endogenous variables in the Step-1 saturated model (fixed.x = FALSE), so data are assumed continuous, although not necessarily multivariate normal (dummy-coded auxiliary variables may be included in Step 1, but categorical endogenous variables in the Step-2 hypothesized model are not allowed). To avoid assuming multivariate normality, request se = "robust.huber.white". CAUTION: In addition to setting fixed.x = FALSE and conditional.x = FALSE in lavaan, this function will automatically set meanstructure = TRUE, estimator = "ML", missing = "fiml", and test = "standard". lavaan's se option can only be set to "standard" (which assumes multivariate normality) or to "robust.huber.white" (which relaxes that assumption).
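For instance, the robust variant described above could be requested as follows. This is only a sketch: the object name `out.robust` is hypothetical, and the data preparation mirrors this page's example.

```r
library(semTools)  # also attaches lavaan and its HolzingerSwineford1939 data

## impose missing data on x5 and x9
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9), "ageyr", "agemo")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## se is passed through ... to lavaan; "robust.huber.white" relaxes the
## multivariate-normality assumption made by se = "standard"
out.robust <- cfa.2stage(model = HS.model, data = HSMiss,
                         aux = c("ageyr", "agemo"),
                         se = "robust.huber.white")
summary(out.robust)
```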

Value

The twostage object contains 3 fitted lavaan models (saturated, target/hypothesized, and baseline) as well as the names of auxiliary variables. None of the individual models provide the correct model results (except the point estimates in the target model are unbiased). Use the methods in twostage to extract corrected SEs and test statistics.
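A short sketch of pulling corrected results out of the returned object rather than from its component models. The summary() and anova() methods appear in this page's examples; the coef() and vcov() methods used here are assumed to be defined for the twostage class (see the class documentation to confirm), and `corrected.SE` is a hypothetical name.

```r
library(semTools)  # also attaches lavaan and its HolzingerSwineford1939 data

## impose missing data on x5 and x9
HSMiss <- HolzingerSwineford1939[ , c(paste0("x", 1:9), "ageyr", "agemo")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
out <- cfa.2stage(model = HS.model, data = HSMiss, aux = c("ageyr", "agemo"))

## corrected test statistics for the single fitted model
anova(out)

## assumed methods: point estimates and the corrected sampling covariance matrix
est <- coef(out)
corrected.SE <- sqrt(diag(vcov(out)))
cbind(est, corrected.SE)
```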

Author(s)

Terrence D. Jorgensen (University of Amsterdam)

References

Savalei, V., & Bentler, P. M. (2009). A two-stage approach to missing data: Theory and application to auxiliary variables. Structural Equation Modeling, 16(3), 477–497. doi: 10.1080/10705510903008238

Savalei, V., & Falk, C. F. (2014). Robust two-stage approach outperforms robust full information maximum likelihood with incomplete nonnormal data. Structural Equation Modeling, 21(2), 280–302. doi: 10.1080/10705511.2014.882692

See Also

Examples

## impose missing data for example
HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr", "agemo")]
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## use ageyr and agemo as auxiliary variables
out <- cfa.2stage(model = HS.model, data = HSMiss, aux = c("ageyr","agemo"))

## two versions of the corrected chi-squared test result are shown
out
## see Savalei & Bentler (2009) and Savalei & Falk (2014) for details

## the summary additionally provides the parameter estimates with corrected
## standard errors, test statistics, and confidence intervals, along with
## any other options that can be passed to parameterEstimates()
summary(out, standardized = TRUE)

## use parameter labels to fit a more constrained model
modc <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + a*x8 + a*x9
'
outc <- cfa.2stage(model = modc, data = HSMiss, aux = c("ageyr","agemo"))

## use the anova() method to test this constraint
anova(out, outc)
## like for a single model, two corrected statistics are provided

semTools documentation built on May 10, 2022, 9:05 a.m.