Factor Analysis with Missing Data

options(digits = 3)
options(width = 100)

This example shows how to use lslx to conduct semi-confirmatory factor analysis with missing data. It uses the HolzingerSwineford1939 data set from the lavaan package, so lavaan must be installed.

Missing Data Construction

Because HolzingerSwineford1939 doesn't contain missing values, we follow the code in semTools to introduce NA values (see the example for the twostage() function in semTools).

data_miss <- lavaan::HolzingerSwineford1939
data_miss$x5 <- ifelse(data_miss$x1 <= quantile(data_miss$x1, .3), 
                       NA, data_miss$x5)
data_miss$age <- data_miss$ageyr + data_miss$agemo/12
data_miss$x9 <- ifelse(data_miss$age <= quantile(data_miss$age, .3), 
                       NA, data_miss$x9)

By construction, the missingness of x5 depends on the value of x1, and the missingness of x9 depends on the age variable. Note that age is computed from ageyr and agemo. Since ageyr and agemo are not variables of substantive interest, they are treated as auxiliary variables in the later analysis.
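
The dependence described above can be checked directly. The following is a minimal sketch on simulated data (the data frame toy and its values are made up for illustration and are not part of HolzingerSwineford1939). It applies the same ifelse() rule and then compares x1 between rows where x5 is missing and rows where it is observed.

```r
# Toy stand-in for data_miss: values are simulated, not real test scores.
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x5 = rnorm(200))

# Same rule as above: x5 becomes NA when x1 falls in its lowest 30%.
cutoff <- quantile(toy$x1, .3)
toy$x5 <- ifelse(toy$x1 <= cutoff, NA, toy$x5)

# Proportion of missing x5, and the mean of x1 by missingness status of x5.
prop_missing <- mean(is.na(toy$x5))
x1_by_missing <- tapply(toy$x1, is.na(toy$x5), mean)
print(prop_missing)
print(x1_by_missing)
```

Running the last two computations on data_miss (with x1 and x5, or with age and x9) gives the corresponding check for the real data.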

Model Specification and Object Initialization

A usual confirmatory factor analysis (CFA) model is specified.

model_miss <- "visual  :=> x1 + x2 + x3
               textual :=> x4 + x5 + x6
               speed   :=> x7 + x8 + x9
               visual  <=> 1 * visual
               textual <=> 1 * textual
               speed   <=> 1 * speed"

Here, the 1 before * is interpreted as fix(1), which fixes each factor variance at one. To initialize an lslx object with auxiliary variables, we need to specify the auxiliary_variable argument. This argument accepts only numeric variables. If a categorical variable is to be used as an auxiliary variable, the user should first transform it into a set of dummy variables. One possible method is to use the model.matrix function.

lslx_miss <- lslx$new(model = model_miss, data = data_miss,
                      auxiliary_variable = c("ageyr", "agemo"))
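
For a categorical auxiliary variable, the dummy coding mentioned above might look like the following sketch. The data frame toy and the factor region are hypothetical and only illustrate the use of model.matrix(); they are not part of the example data.

```r
# Hypothetical data: 'region' is a made-up categorical auxiliary variable.
toy <- data.frame(
  x1 = c(3.2, 4.1, 5.0, 4.4, 3.9, 5.3),
  region = factor(c("north", "south", "west", "south", "north", "west"))
)

# model.matrix() expands the factor into 0/1 columns; the intercept column
# is dropped so that only the dummy variables remain.
dummies <- model.matrix(~ region, data = toy)[, -1, drop = FALSE]

# Append the numeric dummy columns to the data.
toy <- cbind(toy, dummies)
print(colnames(dummies))
```

The resulting column names (here regionsouth and regionwest) could then be passed to auxiliary_variable in place of the original factor. Note that model.matrix() drops rows with missing factor values by default, so a factor that contains NA needs extra care.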

Because the specified CFA might not fit the data well, we add a correlated residual structure to the model with $penalize_block():

lslx_miss$penalize_block(block = "y<->y", type = "fixed", verbose = FALSE)

The code penalizes all coefficients in the y<->y block that are of the fixed parameter type. Note that this model is not identified under the usual SEM framework. The penalized likelihood (PL) method can still estimate it because the penalty function introduces additional constraints on the parameters. However, we don't recommend using this type of model because it is difficult to interpret.

Model Fitting

So far, the specified auxiliary variables are only stored in the lslx object. They are actually used only when one of the $fit() related methods is called.
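
The fitting call itself is not shown here. As a sketch, and assuming lslx_miss has been initialized and penalized as above, one of the fit-related methods such as $fit_lasso() could be used. The choice of the lasso penalty and of the default penalty levels is an assumption made for illustration; see ?lslx for the available methods and arguments.

```r
# Assumes lslx_miss was created with lslx$new() and modified by
# $penalize_block() as shown above. The lasso penalty is an illustrative
# choice; the auxiliary variables stored in the object are used at this step.
lslx_miss$fit_lasso(verbose = FALSE)
```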


By default, the fit-related methods use the two-step method (possibly with auxiliary variables) to handle missing values. The user can specify the missing-data method explicitly via the missing_method argument. The other method available in the current version is listwise deletion. However, listwise deletion has no theoretical advantage over the two-step method.
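
To see what listwise deletion gives up, the following sketch counts the rows it would keep. The small data frame toy is made up for illustration and is not part of the example data.

```r
# Made-up data with missing values in x5 and x9.
toy <- data.frame(
  x1 = c(0.2, 1.5, 0.9, 2.1, 1.1, 0.4),
  x5 = c(NA, 3.0, 2.5, 4.1, 2.8, NA),
  x9 = c(5.0, NA, 4.7, 5.9, NA, 4.2)
)

# Listwise deletion keeps only rows without any NA.
n_total <- nrow(toy)
n_complete <- sum(complete.cases(toy))
print(c(total = n_total, complete = n_complete, dropped = n_total - n_complete))
```

Here only two of the six rows are complete, so listwise deletion would discard four rows, even though each of them still carries information on two of the three variables.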

Model Summarizing

The following code summarizes the fitting result under the penalty level selected by a robust version of the Akaike information criterion (RAIC). The number of missing patterns shows how many missing patterns are present in the data set (including the complete pattern). If the lslx object is initialized with raw data, a corrected sandwich standard error is used by default for coefficient tests. The correction is based on the asymptotic covariance of the saturated moments derived by full information maximum likelihood. The mean-adjusted likelihood ratio test is also based on this quantity. For references, please see the Missing Data section in ?lslx.

lslx_miss$summarize(selector = "raic")
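
The number of missing patterns reported in the summary can be cross-checked in base R. The sketch below uses a small made-up data frame; each distinct row of is.na() is one pattern, and the all-observed row counts as the complete pattern.

```r
# Made-up data: x5 and x9 are missing for some rows.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x5 = c(NA, NA, 3, 4, 5, 6),
  x9 = c(1, NA, NA, 4, 5, 6)
)

# Logical matrix marking missing cells; its unique rows are the patterns.
na_map <- is.na(toy)
n_pattern <- nrow(unique(na_map))
print(n_pattern)
```

The same two lines can be applied to the variables of data_miss to obtain a count for comparison with the summary output. Which columns lslx includes in its own count (for example, whether auxiliary variables are counted) is not verified here.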

lslx documentation built on April 28, 2020, 1:09 a.m.