In ecmerkle/blavaan: Bayesian Latent Variable Analysis

library(blavaan, quietly=TRUE)
library(lavaan, quietly=TRUE)

Introduction

In SEM, one of the first steps is to evaluate the model's global fit. After global fit, we need to evaluate the local fit of a model, meaning how the model reproduces specific correlations between observed variables.

There are a couple of common methods for this, (a) testing for high residual correlations, or (b) modification indices. This tutorial focuses on the second. Modification indices test the likely change in the model fit if a single parameter is added to the model that was not originally included. This test can be carried out for every possible parameter that was not included [@bentler_fit_1990].

Modification Indices

Modification indices present different indices to quantify the effect of each parameter, and we will focus on two here. These are (a) the modification index (MI) or Lagrange multiplier, which estimates the extent to which the model’s chi-square ($\chi^2$) test statistic would decrease if a parameter were added to the model and freely estimated, and (b) standardized expected parameter change (SEPC), which is the approximated standardized value of the parameter if it were to be estimated in the model [@whittaker_using_2012; @garniervillarreal_evaluating_2024].

MI presents the possible effect on the overall model, and SEPC presents the effect size for the missed parameter.

We will show an example with the @holswi39 model. You first estimate your SEM/CFA model as usual

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE, seed=866)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE)

Then we would need to write a discrepancy function to collect the modification indices. The list below contains two functions that estimate and save the MI and SEPC.

discFUN <- list(
  mod.ind_mi = function(object){
    temp <- modificationindices(object, free.remove = F)
    mods <- temp$mi
    names(mods) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(mods)
  },
  mod.ind_sepc.all = function(object){
    temp <- modificationindices(object, free.remove = F)
    sepc.all <- temp$sepc.all
    names(sepc.all) <- paste0(temp$lhs, temp$op, temp$rhs)
    return(sepc.all)
  }
)

Then we will pass this function to the ppmc() function of blavaan. With this function, the MI and SEPC are computed for each posterior sample, leading to posterior distributions for each of them.

out <- ppmc(fit, discFUN = discFUN)

out <- ppmc(fit, discFUN = discFUN)

Then we view the top 5 parameters arrange by the posterior mean (EAP) MI, which in this case shows that the parameter having the highest impact in overall model fit (according to EAP) is visual=~x9, the cross-loading from the Visual factor to item x9.

summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="EAP", decreasing=T)[1:5,]

But according to the posterior median, the parameter that would have the highest impact would be the residual correlation between indicators x7 and x8

summary(out, prob=.9, discFUN = "mod.ind_mi", sort.by="Median", decreasing=T)[1:5,]

The MI is still recommended as the best metric to indicate which parameter is best to include next, and we can use the SEPC to evaluate the likely effect size for the respective parameters.

summary(out, prob=.9, discFUN = "mod.ind_sepc.all", sort.by="EAP", decreasing=T)[1:5,]

tmptab <- summary(out, prob=.9, discFUN = "mod.ind_sepc.all", sort.by="EAP", decreasing=T)[1:5,]

Here we see that for the 2 highest parameters, the likely SEPC is r paste0('x7~~x8 = ', tmptab['x7~~x8', 'EAP']) and r paste0('visual=~x9 = ', tmptab['visual=~x9', 'EAP']). With this information we can decide to include one of these new parameters in the model (one at the time). For this example, because factor loadings have a larger impact on the model-implied covariance matrix, I would choose visual=~x9

HS.model <- ' visual  =~ x1 + x2 + x3 + x9
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit2 <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE)

HS.model <- ' visual  =~ x1 + x2 + x3 + x9
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit2 <- bcfa(HS.model, data=HolzingerSwineford1939, std.lv=TRUE)

And you can check if the added parameter has the expected impact on overall fit with the blavFitIndices() and the summary() functions.

It is important to consider also the theoretical relevance of the suggested parameters, and to ensure that they make sense, instead of just adding parameters until having good fit.

Summary

You can see more details about the application an test of these indices in Bayesian SEM in @garniervillarreal_evaluating_2024.

In this tutorial we show how to calculate the MI and SEPC across posterior distributions, and evaluate which parameters can be added.

With the ppmc() function we are able to calculate relevant information after model estimation, and build posterior distributions of them.

The general recommendations are to use MI to identify the most likely parameter to add, and SEPC as the effect size of the new parameter [@garniervillarreal_evaluating_2024].