In vankesteren/efast: Exploratory Factor Analysis with Structured Residuals

Introduction

options(width = 70)

library(efast)
library(corrplot)

In this tutorial, we show how to run an exploratory factor analysis with structured residuals (EFAST) model on a dataset with estimated volume on 68 regions of interest (ROIs) in both hemispheres. Data of 647 participants from the Cam-CAN cohort[^camcan] is included in the efast package -- synthesized through the synthpop package[^synthpop] to preserve individual subject's privacy. The data can be loaded as follows:

data("roi_volume")

Upon plotting the correlation matrix of this dataset, we can see that there are secondary diagonals. These indicate symmetry structure: the first 34 variables are the left-hemisphere ROIs, and the last 34 variables are their contralateral homologues. The strong secondary diagonals show that the contralateral homologues correlate strongly in their amount of volume. The correlation matrix can be plotted using the cplot() function as follows (cplot() is based on the corrplot package[^corrplot]):

cplot(cor(roi_volume))

The goal of the EFAST routine is to extract $M$ factors that explain the correlations among the ROIs as well as possible, while taking into account the symmetry. The resulting factors can then be used for further analysis, for example to compare different groups in their structural volume.

Estimating an EFAST model

The efast package is a wrapper around the lavaan package[^lavaan] with a convenient interface for EFAST models. In addition, it adds some detailed constraints on the parameters, specifically for the EFAST model, that aid in estimation. The parameters that need to be entered by the researcher are:

data The dataset to be analyzed
M The number of factors to extract
lh_idx The column indices of the left-hemisphere variables in the data
rh_idx The column indices of the right-hemisphere variables in the data

An EFAST model for the volume ROI data with 6 factors can thus be computed as follows:

efast_fit <- efast_hemi(
  data   = roi_volume,
  M      = 6,
  lh_idx = 1:34,
  rh_idx = 35:68
)

efast_fit <- readRDS("efast_fit.rds")

The function allows for other arguments to be passed on to the lavaan function as well, to change many aspects of model estimation (see ?lavoptions for more information). Most notably, the rotation can be changed here as well, if a different rotation than the default oblique geomin rotation is desired. The resulting object also inherits from the lavaan class, meaning methods such as summary() and print() are, by default, implemented in this object.

Model fit and model comparison

Since an EFAST model is a structural equation model, the entire range of fit metrics is available (e.g, $\chi^2$, AIC, BIC, CFI, TLI, RMSEA). They can be investigated using the fitmeasures() command from lavaan:

fitmeasures(efast_fit)

For example, we can see in the RMSEA value of r fitmeasures(efast_fit)["rmsea"] that the model fit is quite good -- the model approximates the elements of the observed covariance matrix well. Additionally, the fit of different models with different numbers of factors can also be investigated to determine the optimal number of factors. Another model comparison that can be made is to a model without symmetry structure. This can be done in several ways, for example through the relative fit indices such as BIC and AIC, or through likelihood ratio tests.

Below we compare the EFAST model to an identically estimated EFA model with 6 factors.

efa_fit <- efast_efa(data = roi_volume, M = 6)

efa_fit <- readRDS("efa_fit.rds")

A likelihood ratio test can be performed via the anova() or the lavTestLRT() function from the lavaan package:

lavTestLRT(efast_fit, efa_fit)

The resulting table shows that the EFAST model with 6 factors fits significantly better than the pure EFA model with 6 factors: $\chi^2$(34) = 2840.7, p < .001.

Dimension reduction

Two relevant methods for performing dimension reduction through EFAST are predict() for extracting the per-participant factor values for further analysis, and the efast_loadings() function, for extracting and displaying the factor loadings to give an interpretation to each of the factors. The loadings can optionally be displayed in a convenient symmetric way, with the loadings of the left and right hemispheres for each ROI side-by-side (note that while this aids interpretation of the loadings themselves, such as easily showing that factor 2 is relatively assymmetric, the bottom table with the proportions of explained variance now displays wrong values):

lds <- efast_loadings(efast_fit, symmetry = TRUE)

lds <- structure(efast_loadings(efast_fit, symmetry = TRUE)[,1:8], class = "loadings")

# we only show the loadings for the first 4 factors to ensure it fits
print(lds, cutoff = 0.3)

Below, the factors scores for each person are stored. The predict() function generates values (factor scores) for all latent variables in the model, and the first 6 of these are the EFA factors. As an example of further analysis, these scores can be used to generate 2 latent brain volume subgroups using k-means clustering: one group with relatively low volume, and one with relatively high volume.

pred <- predict(efast_fit)[,1:6]
clus <- kmeans(pred, 2)
clus

We can then plot the scores of the participants on the first two factors and show the clustering by using colours:

plot(pred[,c(1, 2)], bg = c("magenta", "orange")[clus$cluster],
     pch = 21, asp = 1, bty = "L")

Lateralization

The ROI method factors themselves can be used to determine per ROI the amount of symmetry and, conversely, the amount of lateralization.

Through including lateralization index coefficients in the model code (see extending the EFAST model below), the efast package provides a nice convenience function for showing how lateralized each ROI is, with automatic standard errors based on the delta method. These standard errors can also be based on bootstrapping, for example, by changing the efast_hemi() call to include se = "bootstrap". These lateralization indices -- indicating how much residual variance in the observed variables cannot be explained by the symmetry structure -- are computed for each ROI as \begin{equation} \label{eq:li} LI_i = 1 - \text{cor}(u^{lh}_i, u^{rh}_i) \end{equation} where $u^{lh}_i$ and $u^{rh}_i$ are residuals given the trait factors of interest of the $i^{th}$ ROI in the left and right hemisphere, respectively. The correlation $\text{cor}(\cdot\,, \cdot)$ between these residuals represents the amount of symmetry, so the $LI_i$ represents the \textit{residual dissimilarity} of the $i^{th}$ ROI in the two hemispheres.

lateralization(efast_fit)

[^synthpop]: Beata Nowok, Gillian M. Raab, Chris Dibben (2016). synthpop: Bespoke Creation of Synthetic Data in R. Journal of Statistical Software, 74(11), 1-26. doi:10.18637/jss.v074.i11 [^camcan]: The Cam-CAN cohort (2015). https://www.cam-can.org/ [^corrplot]: Taiyun Wei and Viliam Simko (2017). R package "corrplot": Visualization of a Correlation Matrix (Version 0.84). Available from https://github.com/taiyun/corrplot [^lavaan]: Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.

vankesteren/efast documentation built on March 5, 2024, 9:41 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com