poolSat: Fit a Saturated 'lavaan' Model to Multiple Imputed Data Sets

poolSat (R Documentation)

Description

This function fits a saturated model to a list of imputed data sets, and returns a list of pooled summary statistics to treat as data.

Usage

poolSat(
  data,
  ...,
  return.fit = FALSE,
  scale.W = TRUE,
  omit.imps = c("no.conv", "no.se")
)

Arguments

data

A list of imputed data sets, or an object of a class from which imputed data can be extracted. Recognized classes are lavaan.mi (list of imputations stored in the @DataList slot), amelia (created by the Amelia package), or mids (created by the mice package).

...

Additional arguments passed to lavaan::lavCor() or to lavaan.mi().

return.fit

logical indicating whether to return a lavaan.mi object containing the results of fitting the saturated model to multiply imputed data. This can be useful for diagnostic purposes.

scale.W

logical. If TRUE (default), the within- and between-imputation components will be pooled by scaling the within-imputation component by the average relative increase in variance (ARIV; see Enders, 2010, p. 235, for definition and formula). Otherwise, the pooled matrix is calculated as the weighted sum of the within-imputation and between-imputation components (see Enders, 2010, ch. 8, for details).
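
The two pooling rules can be sketched in base R. This is only an illustration of the general formulas in Enders (2010) applied to made-up numbers; it is not the code poolSat() runs internally, and every object name below (est, W.list, Tot.sum, Tot.scaled) is hypothetical.

```r
## Illustrative sketch only: toy numbers, not poolSat() internals.
set.seed(1)
m <- 5   # number of imputations (toy value)
k <- 3   # number of pooled statistics (toy value)

## hypothetical per-imputation estimates (rows = imputations)
est <- matrix(rnorm(m * k), nrow = m)
## hypothetical per-imputation sampling covariance matrices
W.list <- replicate(m, diag(runif(k, 0.5, 1)), simplify = FALSE)

W <- Reduce(`+`, W.list) / m   # within-imputation component
B <- cov(est)                  # between-imputation component

## scale.W = FALSE: weighted sum of the two components
Tot.sum <- W + (1 + 1/m) * B

## scale.W = TRUE: inflate W by (1 + ARIV)
ARIV <- (1 + 1/m) * sum(diag(B %*% solve(W))) / k
Tot.scaled <- (1 + ARIV) * W
```

Under these formulas, both versions imply the same average inflation relative to W, but the scaled version keeps the correlation structure of W, whereas the weighted sum also reflects the (often noisily estimated) off-diagonal elements of B.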

omit.imps

character vector specifying criteria for omitting imputations from pooled results of the saturated model. Can include any of c("no.conv", "no.se", "no.npd"). The first 2 are the default setting, which excludes any imputations that did not converge or for which standard errors could not be computed. The last option ("no.npd") would exclude any imputations that yielded a nonpositive definite (NPD) covariance matrix for observed or latent variables, which would include any "improper solutions" such as Heywood cases. NPD solutions are not excluded by default because they are likely to occur due to sampling error, especially in small samples. However, gross model misspecification could also cause NPD solutions, so users can compare pooled results with and without this setting as a sensitivity analysis to see whether some imputations warrant further investigation. Specific imputation numbers can also be included in this argument, in case users want to apply their own custom omission criteria (or so that simulation studies can use different numbers of imputations without redundantly refitting the model).
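
The sensitivity analysis described above might look like the following sketch. It assumes the lavaan.mi package and its HS20imps example data are available; the object names (imps, sat.default, sat.strict) are hypothetical.

```r
library(lavaan.mi)
data(HS20imps)   # list of 20 imputed data sets shipped with lavaan.mi

## keep only three indicators to keep the saturated model small
imps <- lapply(HS20imps, "[", i = paste0("x", 1:3))

## default criteria: drop nonconverged imputations and those without SEs
sat.default <- poolSat(imps)
## stricter criteria: additionally drop imputations with NPD solutions
sat.strict <- poolSat(imps, omit.imps = c("no.conv", "no.se", "no.npd"))

## compare the pooled covariance matrices under both settings
all.equal(sat.default$sample.cov, sat.strict$sample.cov)
```

If the two sets of pooled statistics differ noticeably, the imputations flagged by "no.npd" may deserve a closer look.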

Value

If return.fit=TRUE, a lavaan.mi object. Otherwise, an object of class lavMoments, which is a list that contains at least $sample.cov and $sample.nobs, potentially also $sample.mean, $sample.th, $NACOV, and $WLS.V. Also contains $lavOptions that will be passed to lavaan(...).
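
A quick way to see which of these elements a particular call returned is to inspect the list directly. The sketch below assumes the lavaan.mi package and its HS20imps example data are available; imps and pooled are hypothetical names.

```r
library(lavaan.mi)
data(HS20imps)   # list of 20 imputed data sets shipped with lavaan.mi

imps <- lapply(HS20imps, "[", i = paste0("x", 1:3))
pooled <- poolSat(imps, meanstructure = TRUE)

class(pooled)                # class of the returned object
names(pooled)                # which pooled moments were returned
str(pooled, max.level = 1)   # compact overview of each element
```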

Note

The $lavOptions list will always set fixed.x=FALSE and conditional.x=FALSE. Users should not override those options when calling lavaan::lavaan() because doing so would yield incorrect SEs and test statistics. Computing the correct $NACOV argument would depend on which specific variables are treated as fixed, which would require an argument to poolSat() for users to declare names of exogenous variables. This has not yet been programmed, but that feature may be added in the future in order to reduce the number of parameters to estimate. However, if "exogenous" predictors were incomplete and imputed, then they are not truly fixed (i.e., unvarying across samples), so treating them as fixed would be illogical and yield biased SEs and test statistics.

The information returned by poolSat() must assume that any fitted SEM will include all the variables in $sample.cov and (more importantly) in $NACOV. Although lavaan can drop unused rows/columns from $sample.cov, it cannot be expected to drop the corresponding sampling variances of those eliminated (co)variances from $NACOV. Thus, poolSat() must be rerun on exactly the variables in each particular SEM to obtain the appropriate summary statistics (see Examples).
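
To see why the bookkeeping differs between the two matrices, consider how many unique (co)variances a set of variables has. The base R sketch below assumes continuous variables without a mean structure, so that each row of $NACOV corresponds to one unique element of the covariance matrix; the labels in vech.names are hypothetical and only serve the illustration. It is not a recommended workaround for subsetting $NACOV by hand.

```r
vars <- paste0("x", 1:9)
p <- length(vars)

## row/column positions of the unique (lower-triangle) covariance elements
idx <- which(lower.tri(diag(p), diag = TRUE), arr.ind = TRUE)
vech.names <- paste(vars[idx[, "row"]], vars[idx[, "col"]], sep = "~~")

length(vech.names)   # 45 unique (co)variances among 9 variables

## elements that involve only x1, x2, and x3
keep <- idx[, "row"] <= 3 & idx[, "col"] <= 3
sum(keep)            # 6 unique (co)variances among 3 variables
```

Dropping 6 variables from a 9-variable covariance matrix is a simple row/column deletion, but the matching sampling covariance matrix shrinks from 45 x 45 to 6 x 6, and the retained rows are scattered rather than contiguous.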

Author(s)

Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

References

Lee, T., & Cai, L. (2012). Alternative multiple imputation inference for mean and covariance structure modeling. Journal of Educational and Behavioral Statistics, 37(6), 675–702. doi:10.3102/1076998612458320

Chung, S., & Cai, L. (2019). Alternative multiple imputation inference for categorical structural equation modeling. Multivariate Behavioral Research, 54(3), 323–337. doi:10.1080/00273171.2018.1523000

See Also

lavaan.mi() for the traditional method (fit the SEM to each imputation, then pool results afterward).

Examples


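library(lavaan)    # provides cfa(), lavTest()
library(lavaan.mi) # provides poolSat() and the HS20imps data
data(HS20imps) # import a list of 20 imputed data sets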

## fit saturated model to imputations, pool those results
impSubset1 <- lapply(HS20imps, "[", i = paste0("x", 1:9)) # only modeled variables
(prePooledData <- poolSat(impSubset1))

## Note: no means were returned above (the lavOptions() default is meanstructure = FALSE)
(prePooledData <- poolSat(impSubset1, meanstructure = TRUE))

## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

## fit model to summary statistics in "prePooledData"
fit <- cfa(HS.model, data = prePooledData, std.lv = TRUE)
## By default, the "Scaled" column provides a "scaled.shifted" test
## statistic that maintains an approximately nominal Type I error rate.
summary(fit, fit.measures = TRUE, standardized = "std.all")
## Note that this scaled statistic does NOT account for deviations from
## normality, because the default normal-theory standard errors were
## requested when running poolSat().  See below about non-normality.

## Alternatively, "Browne's residual-based (ADF) test" is also available:
lavTest(fit, test = "browne.residual.adf", output = "text")

## Optionally, save the saturated-model lavaan.mi object, which
## could be helpful for diagnosing convergence problems per imputation.
satFit <- poolSat(impSubset1, return.fit = TRUE)


## FITTING MODELS TO DIFFERENT (SUBSETS OF) VARIABLES

## If you only want to analyze a subset of these variables,
mod.vis <- 'visual  =~ x1 + x2 + x3'
## you will get an error:
try(
  fit.vis <- cfa(mod.vis, data = prePooledData) # error
)

## As explained in the "Note" section, you must use poolSat() again for
## this subset of variables
impSubset3 <- lapply(HS20imps, "[", i = paste0("x", 1:3)) # only modeled variables
visData <- poolSat(impSubset3)
fit.vis <- cfa(mod.vis, data = visData) # no problem


## OTHER lavaan OPTIONS


## fit saturated MULTIPLE-GROUP model to imputations
impSubset2 <- lapply(HS20imps, "[", i = c(paste0("x", 1:9), "school"))
(prePooledData2 <- poolSat(impSubset2, group = "school",
                           ## request standard errors that are ROBUST
                           ## to violations of the normality assumption:
                           se = "robust.sem"))
## Nonnormality-robust standard errors are implicitly incorporated into the
## pooled weight matrix (NACOV= argument), so they are
## AUTOMATICALLY applied when fitting the model:
fit.config <- cfa(HS.model, data = prePooledData2, group = "school",
                  std.lv = TRUE)
## standard errors and chi-squared test of fit both robust to nonnormality
summary(fit.config)


## CATEGORICAL OUTCOMES

## discretize the imputed data, for an example of 3-category data
HS3cat <- lapply(impSubset1, function(x) {
  as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})
## pool polychoric correlations and thresholds
(prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9)))

fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
summary(fitc)

## Optionally, use unweighted least-squares estimation.  However,
## you must first REMOVE the pooled weight matrix (WLS.V= argument)
## or replace it with an identity matrix of the same dimensions:
prePooledData4 <- prePooledData3
prePooledData4$WLS.V <- NULL
## or prePooledData4$WLS.V <- diag(nrow(prePooledData3$WLS.V))
fitcu <- cfa(HS.model, data = prePooledData4, std.lv = TRUE, estimator = "ULS")
## Note that the SEs and test were still appropriately corrected:
summary(fitcu)



lavaan.mi documentation built on April 3, 2025, 9:36 p.m.