```r
options(digits = 3)
options(width = 100)
```
In this example, we show how to use `lslx` to conduct semi-confirmatory factor analysis with missing data. The example uses the `HolzingerSwineford1939` data set from the `lavaan` package, so `lavaan` must be installed. Because `HolzingerSwineford1939` contains no missing values, we create missing values (`NA`) with code adapted from the example of the `twostage()` function in `semTools`.
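```r
# Start from the complete data and delete values to create missingness
data_miss <- lavaan::HolzingerSwineford1939

# x5 is missing whenever x1 is at or below its 30th percentile
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3), NA, data_miss$x5)

# age (in years) combines ageyr and agemo; x9 is missing when age is at or below its 30th percentile
data_miss$age <- data_miss$ageyr + data_miss$agemo/12
data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3), NA, data_miss$x9)
```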
By construction, the missingness of `x5` depends on the value of `x1`, and the missingness of `x9` depends on `age`. Note that `age` is computed from `ageyr` and `agemo`. Both mechanisms are missing at random (MAR), but for `x9` only conditional on `ageyr` and `agemo`. Since these two variables are not of substantive interest, they are treated as auxiliary variables in the later analysis.
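To see why this construction is MAR and why ignoring it matters, the base-R sketch below applies the same deletion rule to simulated data. The variables `x1` and `x5` here are simulated stand-ins, not the `HolzingerSwineford1939` columns, and the slope 0.6 is an arbitrary assumption. The code checks that the missingness of `x5` is fully determined by the observed `x1` and shows that the complete-case mean of `x5` is shifted.

```r
# Simulated stand-ins for x1 and x5 (assumption: normal data, arbitrary slope 0.6)
set.seed(1)
n <- 1000
x1 <- rnorm(n)
x5 <- 0.6 * x1 + rnorm(n, sd = 0.8)

# Same deletion rule as above: drop x5 when x1 is at or below its 30th percentile
cut_x1 <- quantile(x1, .3)
x5_miss <- ifelse(x1 <= cut_x1, NA, x5)

# Missingness is a deterministic function of the observed x1, hence MAR
all(is.na(x5_miss) == (x1 <= cut_x1))  # TRUE
sum(is.na(x5_miss))                    # 300

# Dropping incomplete cases removes mostly low-x1 (and hence low-x5) cases,
# so the complete-case mean of x5 overestimates the full-data mean
c(full_data = mean(x5), complete_case = mean(x5_miss, na.rm = TRUE))
```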
We specify a standard confirmatory factor analysis (CFA) model.

```r
# Three factors, each measured by three indicators; factor variances fixed at 1
model_miss <- "visual  :=> x1 + x2 + x3
               textual :=> x4 + x5 + x6
               speed   :=> x7 + x8 + x9
               visual  <=> 1 * visual
               textual <=> 1 * textual
               speed   <=> 1 * speed"
```
Here, the `1` before `*` is interpreted as `fix(1)`, so the variance of each factor is fixed at 1.
To initialize an `lslx` object with auxiliary variables, we specify the `auxiliary_variable` argument. The `auxiliary_variable` argument accepts only numeric variables. If a categorical variable should serve as an auxiliary variable, users should first transform it into a set of dummy variables, for example with the `model.matrix()` function.
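As a sketch of that transformation, the base-R code below dummy-codes a hypothetical two-level factor `school` with `model.matrix()`. The data frame and its values are made up for illustration, and the commented `lslx$new()` call only indicates where the resulting column names would go.

```r
# Hypothetical data: one numeric indicator and one categorical variable
df <- data.frame(
  x1 = c(4.2, 5.1, 3.8, 6.0),
  school = factor(c("Pasteur", "Grant-White", "Pasteur", "Grant-White"))
)

# model.matrix() expands the factor into 0/1 columns; dropping the intercept column
# leaves k - 1 dummies for a factor with k levels (the first level is the reference).
# By default model.matrix() drops rows with NA, so this assumes the factor has no NA.
dummies <- model.matrix(~ school, data = df)[, -1, drop = FALSE]
colnames(dummies)  # "schoolPasteur"

# Bind the numeric dummies back to the data
df_aux <- cbind(df, dummies)

# The dummy names could then be passed on, e.g. (not run):
# lslx$new(model = ..., data = df_aux, auxiliary_variable = colnames(dummies))
```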
```r
library(lslx)
# ageyr and agemo enter only as auxiliary variables, not as model variables
lslx_miss <- lslx$new(model = model_miss,
                      data = data_miss,
                      auxiliary_variable = c("ageyr", "agemo"))
```
Because the specified CFA model might not fit the data well, we add a correlated residual structure to the model with `$penalize_block()`:
```r
lslx_miss$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)
```
The code penalizes all coefficients of the fixed type in the `y<->y` block, that is, the residual covariances among the observed variables, which the CFA model fixes at zero. Note that this model is not identified under the usual SEM framework. The penalized likelihood (PL) method can still estimate it because the penalty function introduces additional constraints on the parameters. However, we do not recommend this type of model because it is difficult to interpret.
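A quick count makes the identification claim concrete. The sketch below compares the number of covariance-structure parameters with the number of distinct variances and covariances of the nine indicators. It assumes the three factor covariances are free (the factor variances are fixed at 1 by the model syntax) and ignores the mean structure. Having more parameters than moments violates the counting rule, which is a necessary condition for identification.

```r
# Counting rule for the 9-indicator CFA with all residual covariances added
p <- 9                              # observed indicators x1-x9
n_moments <- p * (p + 1) / 2        # distinct variances and covariances: 45

n_loadings   <- 9                   # each indicator loads on exactly one factor
n_factor_cov <- 3                   # visual-textual, visual-speed, textual-speed (assumed free)
n_resid_var  <- p                   # residual variances (diagonal of the y<->y block)
n_resid_cov  <- p * (p - 1) / 2     # residual covariances (off-diagonal of y<->y): 36
n_params <- n_loadings + n_factor_cov + n_resid_var + n_resid_cov  # 57

n_params - n_moments                # 12 more parameters than moments
```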
So far, the specified auxiliary variables are only stored in the `lslx` object. They are actually used only when a `$fit()`-related method is called.
```r
lslx_miss$fit_lasso()
```
By default, the fit-related methods handle missing values with the two-step method (possibly with auxiliary variables). Users can specify the missing-data method explicitly via the `missing_method` argument. The other missing-data method available in the current version is listwise deletion. However, listwise deletion has no theoretical advantage over the two-step method: it generally requires the data to be missing completely at random (MCAR), whereas the two-step method remains valid under MAR.
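The base-R sketch below illustrates this point with simulated data rather than with `lslx` itself. It assumes a normal pair where `x1` is always observed and `x5` is missing when `x1` is low (MAR), and it compares the listwise-deletion mean of `x5` with a maximum likelihood estimate that uses all cases. For this monotone pattern the ML estimate has a closed form (regress `x5` on `x1` among complete cases, then evaluate at the all-case mean of `x1`). This is meant only to convey the idea of estimating saturated moments by ML under MAR, which is what a two-step approach starts from; it is not how `lslx` implements the method.

```r
# Simulated stand-ins (assumption: normal data, true mean of x5 is 0)
set.seed(2)
n <- 2000
x1 <- rnorm(n)
x5 <- 0.6 * x1 + rnorm(n, sd = 0.8)

# x5 is observed only when x1 is above its 30th percentile (MAR given x1)
obs <- x1 > quantile(x1, .3)

# Listwise deletion: mean of x5 over complete cases only
mu5_listwise <- mean(x5[obs])

# ML under MAR for this monotone pattern (factored likelihood):
# regress x5 on x1 among complete cases, then plug in the all-case mean of x1
x1_cc <- x1[obs]
x5_cc <- x5[obs]
fit <- lm(x5_cc ~ x1_cc)
mu5_ml <- unname(coef(fit)[1] + coef(fit)[2] * mean(x1))

# Listwise is pulled upward; the ML estimate stays close to the true value 0
c(true = 0, listwise = mu5_listwise, ml = mu5_ml)
```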
The following code summarizes the fitting result under the penalty level selected by a robust version of the Akaike information criterion (RAIC).

In the output, the number of missing patterns shows how many missing-data patterns are present in the data set (including the complete pattern).

If the `lslx` object is initialized with raw data, a corrected sandwich standard error is used by default for coefficient tests. The correction is based on the asymptotic covariance of the saturated moments derived by full information maximum likelihood (FIML). The mean-adjusted likelihood ratio test is also based on this quantity. For details, see the Missing Data section in `?lslx`.
```r
lslx_miss$summarize(selector = "raic")
```
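To make the "number of missing patterns" concrete, here is a small base-R sketch that counts distinct missingness patterns in a made-up data frame shaped like `data_miss` (only `x1`, `x5`, and `x9`). It assumes a pattern is simply a distinct row of missingness indicators; how `lslx` treats auxiliary variables when counting patterns is not shown here.

```r
# Made-up data with NAs in x5 and x9, mimicking the construction in this example
df <- data.frame(
  x1 = c(1, 2, 3, 4, 5),
  x5 = c(NA, 2, 3, NA, 5),
  x9 = c(1, NA, 3, NA, 5)
)

# Each row of is.na(df) is that case's missingness pattern
na_mat <- is.na(df)
patterns <- unique(na_mat)
nrow(patterns)  # 4: complete, x5 missing, x9 missing, both missing

# Label each case by the variables it is missing, then count cases per pattern
pattern_label <- apply(na_mat, 1, function(r) {
  if (any(r)) paste(colnames(na_mat)[r], collapse = "+") else "complete"
})
table(pattern_label)
```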