View source: R/pool-saturated.R
poolSat | R Documentation |
Fit a Saturated lavaan Model to Multiple Imputed Data Sets
This function fits a saturated model to a list of imputed data sets, and returns a list of pooled summary statistics to treat as data.
poolSat(
  data,
  ...,
  return.fit = FALSE,
  scale.W = TRUE,
  omit.imps = c("no.conv", "no.se")
)
data |
A |
... |
Additional arguments passed to |
return.fit |
|
scale.W |
|
omit.imps |
|
If return.fit=TRUE, a lavaan.mi object.
Otherwise, an object of class lavMoments, which is a list that contains at least $sample.cov and $sample.nobs, potentially also $sample.mean, $sample.th, $NACOV, and $WLS.V. Also contains $lavOptions that will be passed to lavaan(...).
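To make "pooled summary statistics" concrete, the following base-R sketch pools sample covariance matrices across imputations using Rubin's rules. It is an illustration under stated assumptions, not the internals of poolSat(): the simulated data frames are hypothetical stand-ins for real imputations, the per-imputation "saturated model" is simply cov(), and the within-imputation variance uses the asymptotic normal-theory formula for a single covariance.

```r
## Hypothetical stand-ins for m = 5 imputed data sets (NOT real imputations)
set.seed(123)
m <- 5
n <- 200
imps <- lapply(seq_len(m), function(i) {
  data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
})

## Saturated-model estimates per imputation: here, the sample covariances
covList <- lapply(imps, cov)

## Pooled point estimate: average of the per-imputation estimates
pooledCov <- Reduce(`+`, covList) / m

## Between-imputation variance of one element, cov(x1, x2)
est12 <- sapply(covList, function(S) S["x1", "x2"])
B <- var(est12)

## Within-imputation sampling variance of that element, averaged over
## imputations; normal-theory approximation: (s_jj * s_kk + s_jk^2) / n
W <- mean(sapply(covList, function(S) {
  (S["x1", "x1"] * S["x2", "x2"] + S["x1", "x2"]^2) / n
}))

## Rubin's rules: total sampling variance of the pooled estimate
Tvar <- W + (1 + 1/m) * B
```

In this sketch, pooledCov plays the role of $sample.cov, and Tvar corresponds to a single diagonal element of the kind of matrix stored in $NACOV (up to scaling by the sample size).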
The $lavOptions list will always set fixed.x=FALSE and conditional.x=FALSE. Users should not override those options when calling lavaan::lavaan() because doing so would yield incorrect SEs and test statistics. Computing the correct $NACOV argument would depend on which specific variables are treated as fixed, which would require an argument to poolSat() for users to declare names of exogenous variables. This has not yet been programmed, but that feature may be added in the future in order to reduce the number of parameters to estimate.
However, if "exogenous" predictors were incomplete and imputed, then they
are not truly fixed (i.e., unvarying across samples), so treating them as
fixed would be illogical and yield biased SEs and test statistics.
The information returned by poolSat() must assume that any fitted SEM will include all the variables in $sample.cov and (more importantly) in $NACOV. Although lavaan can drop unused rows/columns from $sample.cov, it cannot be expected to drop the corresponding sampling variances of those eliminated (co)variances from $NACOV. Thus, it is necessary to use poolSat() to obtain the appropriate summary statistics for any particular SEM (see Examples).
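The bookkeeping behind this restriction can be sketched in base R. With p variables and no mean structure, the saturated model has p(p+1)/2 unique (co)variances, and the sampling covariance matrix has one row and column per such element. The helper vechNames() and its "~~" labels are hypothetical, and the column-wise lower-triangle ordering is an assumption made for illustration, not something this page confirms about lavaan's internals.

```r
## Hypothetical helper: label the unique (co)variance elements of p variables
## in column-wise lower-triangle order (an assumed ordering)
vechNames <- function(vars) {
  idx <- which(lower.tri(diag(length(vars)), diag = TRUE), arr.ind = TRUE)
  paste0(vars[idx[, "row"]], "~~", vars[idx[, "col"]])
}

all9 <- vechNames(paste0("x", 1:9))  # elements pooled for x1-x9
vis3 <- vechNames(paste0("x", 1:3))  # elements needed for x1-x3 only

length(all9)       # 9 * 10 / 2 = 45 rows/columns
length(vis3)       # 3 * 4 / 2  = 6 rows/columns
match(vis3, all9)  # positions of the 6 elements among the 45
```

Under this ordering, the six elements for x1 to x3 are scattered non-contiguously among the 45, and adding means or thresholds changes the layout further, which suggests why rerunning poolSat() on the modeled variables is the safe route.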
Terrence D. Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)
Lee, T., & Cai, L. (2012). Alternative multiple imputation inference for mean and covariance structure modeling. Journal of Educational and Behavioral Statistics, 37(6), 675–702. doi:10.3102/1076998612458320
Chung, S., & Cai, L. (2019). Alternative multiple imputation inference for categorical structural equation modeling. Multivariate Behavioral Research, 54(3), 323–337. doi:10.1080/00273171.2018.1523000
lavaan.mi() for the traditional method (fit the SEM to each imputation, then pool the results afterward).
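library(lavaan)    # cfa(), lavTest()
library(lavaan.mi) # assumed to be the package providing poolSat() and HS20imps
data(HS20imps) # import a list of 20 imputed data sets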
## fit saturated model to imputations, pool those results
impSubset1 <- lapply(HS20imps, "[", i = paste0("x", 1:9)) # only modeled variables
(prePooledData <- poolSat(impSubset1))
## Note: no means were returned (the default in lavOptions() is meanstructure=FALSE)
(prePooledData <- poolSat(impSubset1, meanstructure = TRUE))
## specify CFA model from lavaan's ?cfa help page
HS.model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
## fit model to summary statistics in "prePooledData"
fit <- cfa(HS.model, data = prePooledData, std.lv = TRUE)
## By default, the "Scaled" column provides a "scaled.shifted" test
## statistic that maintains an approximately nominal Type I error rate.
summary(fit, fit.measures = TRUE, standardized = "std.all")
## Note that this scaled statistic does NOT account for deviations from
## normality, because the default normal-theory standard errors were
## requested when running poolSat(). See below about non-normality.
## Alternatively, "Browne's residual-based (ADF) test" is also available:
lavTest(fit, test = "browne.residual.adf", output = "text")
## Optionally, save the saturated-model lavaan.mi object, which
## could be helpful for diagnosing convergence problems per imputation.
satFit <- poolSat(impSubset1, return.fit = TRUE)
## FITTING MODELS TO DIFFERENT (SUBSETS OF) VARIABLES
## If you only want to analyze a subset of these variables,
mod.vis <- 'visual =~ x1 + x2 + x3'
## you will get an error:
try(
  fit.vis <- cfa(mod.vis, data = prePooledData) # error
)
## As explained in the "Note" section, you must use poolSat() again for
## this subset of variables
impSubset3 <- lapply(HS20imps, "[", i = paste0("x", 1:3)) # only modeled variables
visData <- poolSat(impSubset3)
fit.vis <- cfa(mod.vis, data = visData) # no problem
## OTHER lavaan OPTIONS
## fit saturated MULTIPLE-GROUP model to imputations
impSubset2 <- lapply(HS20imps, "[", i = c(paste0("x", 1:9), "school"))
(prePooledData2 <- poolSat(impSubset2, group = "school",
                           ## request standard errors that are ROBUST
                           ## to violations of the normality assumption:
                           se = "robust.sem"))
## Nonnormality-robust standard errors are implicitly incorporated into the
## pooled weight matrix (NACOV= argument), so they are
## AUTOMATICALLY applied when fitting the model:
fit.config <- cfa(HS.model, data = prePooledData2, group = "school",
                  std.lv = TRUE)
## standard errors and chi-squared test of fit both robust to nonnormality
summary(fit.config)
## CATEGORICAL OUTCOMES
## discretize the imputed data, for an example of 3-category data
HS3cat <- lapply(impSubset1, function(x) {
  as.data.frame( lapply(x, cut, breaks = 3, labels = FALSE) )
})
## pool polychoric correlations and thresholds
(prePooledData3 <- poolSat(HS3cat, ordered = paste0("x", 1:9)))
fitc <- cfa(HS.model, data = prePooledData3, std.lv = TRUE)
summary(fitc)
## Optionally, use unweighted least-squares estimation. However,
## you must first REMOVE the pooled weight matrix (WLS.V= argument)
## or replace it with an identity matrix of the same dimensions:
prePooledData4 <- prePooledData3
prePooledData4$WLS.V <- NULL
## or prePooledData4$WLS.V <- diag(nrow(prePooledData3$WLS.V))
fitcu <- cfa(HS.model, data = prePooledData4, std.lv = TRUE, estimator = "ULS")
## Note that the SEs and test were still appropriately corrected:
summary(fitcu)
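Because each model needs its own call to poolSat() on exactly the modeled variables, the column-subsetting idiom used in these examples can be wrapped in a small helper. This is a minimal base-R sketch; subsetImps() is a hypothetical name (not part of any package), and toyImps is a made-up stand-in for a real list of imputed data sets such as HS20imps.

```r
## Hypothetical helper: keep only the modeled variables in every imputation,
## and stop early if any requested variable is absent
subsetImps <- function(imps, vars) {
  stopifnot(is.list(imps), is.character(vars))
  lapply(imps, function(d) {
    missingVars <- setdiff(vars, names(d))
    if (length(missingVars) > 0L) {
      stop("variables not found: ", paste(missingVars, collapse = ", "))
    }
    d[vars]  # select columns by name; result is still a data.frame
  })
}

## Toy stand-in for a list of 3 imputed data sets
set.seed(1)
toyImps <- replicate(3, data.frame(x1 = rnorm(10), x2 = rnorm(10),
                                   x3 = rnorm(10), school = "A"),
                     simplify = FALSE)

toySubset <- subsetImps(toyImps, c("x1", "x2"))
```

The result could then be passed to poolSat() in place of the lapply(..., "[", i = ...) calls; the early check turns a misspelled variable name into an informative error before any model is fitted.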