The cvsem package provides cross-validation (CV) of structural equation models (SEMs) across a user-defined number of folds. CV is based on the discrepancy between the covariance matrix of the held-out test sample and the model-implied covariance matrix estimated from the training sample. This approach to cross-validating SEMs is described in @Cudeck1983 and @BrowneCudeck1992. Each model is fitted with the lavaan package [@Rosseel2012lavaan] to obtain its model-implied covariance matrix. The discrepancy between the implied matrix and the test sample covariance matrix is then computed with a pre-specified metric (by default the Kullback-Leibler divergence, also known as the Maximum Likelihood discrepancy). The cvsem() function returns the average discrepancy, together with its standard error, for each tested model.
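For reference, the standard Maximum Likelihood (ML) discrepancy between a test sample covariance matrix S and a model-implied covariance matrix Σ̂ estimated on the training data, for p observed variables, is shown below. It equals twice the KL divergence between zero-mean multivariate normal distributions with these two covariance matrices, and it is zero only when the matrices coincide. Whether cvsem reports exactly this quantity, half of it, or some other rescaling is an assumption not settled here.

$$
F_{\mathrm{ML}}(S, \hat{\Sigma}) = \log\lvert\hat{\Sigma}\rvert - \log\lvert S\rvert + \operatorname{tr}\left(S\hat{\Sigma}^{-1}\right) - p = 2\, D_{\mathrm{KL}}\left(\mathcal{N}(0, S) \,\Vert\, \mathcal{N}(0, \hat{\Sigma})\right)
$$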
Currently, the model code must follow lavaan's model syntax.
cvsem is available on CRAN and can be installed with:
install.packages('cvsem')
You can install the development version of cvsem from GitHub with:
# install.packages("devtools")
devtools::install_github("AnnaWysocki/cvsem")
Cross-validating the HolzingerSwineford1939 dataset
Load the package and read in the data from the lavaan package:
library(cvsem)
example_data <- lavaan::HolzingerSwineford1939
Give the columns descriptive names:
colnames(example_data) <- c("id", "sex", "ageyr", "agemo", "school", "grade", "visualPerception", "cubes", "lozenges", "comprehension", "sentenceCompletion", "wordMeaning", "speededAddition", "speededCounting", "speededDiscrimination")
Define some models to be compared with cvsem, using lavaan notation:
model1 <- 'comprehension ~ sentenceCompletion + wordMeaning'
model2 <- 'comprehension ~ meaning
           ## Add some latent variables:
           meaning =~ wordMeaning + sentenceCompletion
           speed =~ speededAddition + speededDiscrimination + speededCounting
           speed ~~ meaning'
model3 <- 'comprehension ~ wordMeaning + speededAddition'
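To illustrate what a "model-implied covariance matrix" is for one of these models, the sketch below fits the same structure as model1 directly with lavaan and extracts its implied covariance matrix with lavInspect(). It uses the original HolzingerSwineford1939 column names (x4 = comprehension, x5 = sentenceCompletion, x6 = wordMeaning) so it runs on its own, and it fits the full data set rather than a training fold. The assumption is that cvsem does something comparable on each training fold; its internal calls may differ.

```r
# Requires the lavaan package (a dependency of cvsem)
library(lavaan)

# Same structure as model1, using the original column names:
# x4 = comprehension, x5 = sentenceCompletion, x6 = wordMeaning
hs_model <- 'x4 ~ x5 + x6'

# Maximum likelihood fit on the full data (cvsem would use a training fold)
hs_fit <- sem(hs_model, data = HolzingerSwineford1939)

# Model-implied covariance matrix of the three observed variables
implied_cov <- lavInspect(hs_fit, "implied")$cov
print(implied_cov)
```

Because this regression model is saturated (zero degrees of freedom), its implied matrix reproduces the ML sample covariance matrix; a restricted model such as model2 generally does not, which is what the cross-validated discrepancy picks up.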
Gather the models into a named list object with cvgather(). These could also be fitted lavaan objects based on the same data.
models <- cvgather(model1, model2, model3)
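The idea of gathering models into a list whose elements are named after the objects passed in can be sketched in base R as follows. gather_models() is a hypothetical helper written for illustration only; it is not the source of cvgather(), which may also validate its inputs or treat fitted lavaan objects differently.

```r
# Hypothetical helper: collect objects into a list named after the
# expressions passed as arguments (e.g. model1, model2, model3)
gather_models <- function(...) {
  # substitute(list(...)) captures the unevaluated call; [-1] drops `list`
  arg_exprs <- as.list(substitute(list(...)))[-1]
  models <- list(...)
  names(models) <- vapply(arg_exprs, deparse, character(1))
  models
}

model_a <- 'y ~ x1'
model_b <- 'y ~ x1 + x2'
gathered <- gather_models(model_a, model_b)
names(gathered)
```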
Define the number of folds k and call the cvsem() function. Here we use k = 10 folds. CV is based on the discrepancy between the test sample covariance matrix and the model-implied matrix from the training data; which discrepancy is used is set with the discrepancyMetric argument.
Currently three discrepancy metrics are available: KL-Divergence, Generalized Least Squares (GLS), and Frobenius Distance (FD). Here we use KL-Divergence.
fit <- cvsem( data = example_data, Models = models, k = 10, discrepancyMetric = "KL-Divergence")
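The two alternative metrics can be sketched with their textbook definitions. This is not cvsem's source: the GLS function assumes the test sample covariance S serves as the weight matrix, and the names gls_discrepancy() and frobenius_distance() are hypothetical. cvsem may weight or scale these quantities differently.

```r
# Hypothetical helpers; S = test sample covariance,
# Sigma = model-implied covariance from the training data.

# GLS discrepancy with S as weight matrix: 1/2 * tr([(S - Sigma) S^-1]^2)
gls_discrepancy <- function(S, Sigma) {
  residual <- (S - Sigma) %*% solve(S)
  0.5 * sum(diag(residual %*% residual))
}

# Frobenius distance: square root of the sum of squared element-wise differences
frobenius_distance <- function(S, Sigma) {
  norm(S - Sigma, type = "F")
}

S <- diag(2)
Sigma <- diag(c(1, 2))
gls_discrepancy(S, Sigma)     # 0.5
frobenius_distance(S, Sigma)  # 1
```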
Print the fitted cvsem object. Note that the model with the smallest (best) discrepancy is listed first. The reported metric is the average discrepancy across all folds (also known as the expected cross-validation index, ECVI), together with its standard error.
fit
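The procedure behind these numbers can be sketched in base R: split the rows into k folds, estimate each model's implied covariance on the k - 1 training folds, compute the ML discrepancy against the covariance of the held-out fold, and summarise the k discrepancies by their mean and standard error. To stay self-contained, the sketch swaps the lavaan models for two toy covariance structures (a saturated model and an independence model) fitted to simulated data, uses cov() (divisor n - 1) instead of ML estimates, and takes sd/sqrt(k) as the standard error. All of these are assumptions; cvsem's own choices may differ.

```r
# Hypothetical sketch of k-fold CV for covariance structure models;
# not cvsem's implementation. Toy models and simulated data are assumptions.
set.seed(1)

# Simulate 300 observations on 3 variables with pairwise correlation 0.6
n <- 300
p <- 3
true_cov <- matrix(0.6, nrow = p, ncol = p)
diag(true_cov) <- 1
X <- matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(true_cov)

# ML discrepancy: log|Sigma| - log|S| + tr(S Sigma^-1) - p
ml_discrepancy <- function(S, Sigma) {
  log_det_diff <- as.numeric(determinant(Sigma)$modulus - determinant(S)$modulus)
  log_det_diff + sum(diag(S %*% solve(Sigma))) - ncol(S)
}

# Each toy "model" maps training data to an implied covariance matrix
implied <- list(
  saturated    = function(train) cov(train),             # reproduces training covariance
  independence = function(train) diag(diag(cov(train)))  # variances only, covariances fixed at 0
)

k <- 10
# Randomly assign rows to k folds of (nearly) equal size
folds <- sample(rep(seq_len(k), length.out = n))

cv_results <- sapply(implied, function(model) {
  disc <- vapply(seq_len(k), function(i) {
    train <- X[folds != i, , drop = FALSE]
    test  <- X[folds == i, , drop = FALSE]
    # Test-fold covariance vs. implied covariance estimated on the training folds
    ml_discrepancy(cov(test), model(train))
  }, numeric(1))
  # Mean discrepancy across folds and its standard error (sd / sqrt(k))
  c(mean = mean(disc), se = sd(disc) / sqrt(k))
})

# Smallest (best) mean discrepancy first
cv_results <- cv_results[, order(cv_results["mean", ]), drop = FALSE]
print(t(cv_results))
```

With strongly correlated variables the saturated model should be listed first, because the independence model cannot reproduce the covariances in the held-out folds.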