| lav_cv | R Documentation |
Estimate out-of-sample predictive performance for structural relations in a fitted 'lavaan' model using repeated holdout (Monte Carlo cross-validation, leave-group-out CV). At each repetition, the model is refitted on a random training subset and evaluated on a disjoint test subset.
lav_cv(
  fit,
  data = NULL,
  times = "auto",
  train_prop = 0.8,
  seed = 42L,
  quiet = TRUE,
  digits = 3L,
  plot = TRUE,
  tol = 0.001,
  window = 50L,
  max_times = 3000L,
  min_r2_for_pct = 0.05
)
## S3 method for class 'lav_cv'
print(x, digits = x$digits %||% 3L, ...)
## S3 method for class 'lav_cv'
summary(object, ...)
fit |
A fitted 'lavaan' object (required). |
data |
The data frame used to fit the model; if NULL, it is extracted from 'fit' when available (default: NULL). |
times |
Integer indicating the number of random splits, or "auto" for stabilization-based early stopping (default: "auto"). |
train_prop |
Numeric in (0,1). Proportion of cases in the training split for each repetition (default: 0.8). |
seed |
Integer. Random seed for reproducibility of the splits (default: 42). |
quiet |
Logical. Suppress 'lavaan' refit messages when TRUE (default: TRUE). |
digits |
Integer. Number of digits to print in summaries (default: 3). |
plot |
Logical. Show convergence plots of the running mean R^2 per outcome (default: TRUE). |
tol |
Numeric. Tolerance for the auto-stop rule on the running mean (default: 0.001). |
window |
Integer. Trailing window size (number of successful splits) used by the auto-stop rule (default: 50). |
max_times |
Integer. Maximum number of splits when times = "auto" (default: 3000). |
min_r2_for_pct |
Numeric in (0,1). Minimum in-sample R^2 required to compute percent drop; below this, %_drop is set to NA (default: 0.05). |
x |
A 'lav_cv' object. |
... |
Additional arguments; unused. |
object |
A 'lav_cv' object. |
For observed outcomes, R^2 is computed by comparing test-set observed values with predictions obtained by applying the training-set structural coefficients to the test-set predictors.
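The observed-outcome case can be illustrated with a minimal base R sketch. It is not the package's implementation: `lm()` stands in for the structural part of the model, `r2_test()` is a hypothetical helper, and the definition 1 - SSE/SST (using the test-set mean) is an assumption, since R^2 can be defined in several ways.

```r
# Hypothetical helper: test-set R^2 as 1 - SSE/SST.
# This is one of several possible R^2 definitions (assumed here).
r2_test <- function(observed, predicted) {
  sse <- sum((observed - predicted)^2)
  sst <- sum((observed - mean(observed))^2)
  1 - sse / sst
}

set.seed(42)
n <- nrow(mtcars)

# One random 80/20 split into disjoint training and test subsets
train_idx <- sample.int(n, size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Estimate coefficients on the training subset only (lm() as a stand-in)
fit_train <- lm(mpg ~ wt + hp, data = train)

# Apply the training-set coefficients to the test-set predictors
pred_test <- predict(fit_train, newdata = test)

# Compare test-set observed values with the predictions
r2_out <- r2_test(test$mpg, pred_test)
```

With this definition, a split-wise value can be negative when the training-set coefficients predict the test cases worse than the test-set mean does.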
For latent outcomes, the outcome is not directly observed in the test set. Factor scores for the outcome are first computed in the test set using the measurement model learned on the training set; these scores serve as the outcome values. Predictions are then formed by applying the training-set structural coefficients to the test-set predictors (including factor scores for any latent predictors). R^2 is computed by comparing the test-set factor scores of the outcome with these predicted scores.
The in-sample baseline R^2 is computed on the full dataset using the same metric as in cross-validation: observed outcomes use observed-versus-predicted R^2; latent outcomes use score-versus-predicted-score R^2.
By default, repetitions continue until the running mean R^2 for each outcome stabilizes within a specified tolerance over a trailing window of successful splits, or until a maximum number of splits is reached.
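A stopping rule of this kind can be sketched as follows. The exact criterion used by lav_cv is not stated here, so this example assumes that a running mean counts as stable when its range over the trailing window does not exceed `tol`. The helpers `running_mean()` and `is_stable()` are hypothetical, and the split-wise R^2 values are simulated rather than obtained from model refits.

```r
# Hypothetical helper: running mean after each successive split
running_mean <- function(x) cumsum(x) / seq_along(x)

# Assumed rule: stable when the running mean varies by at most `tol`
# over the last `window` successful (non-NA) splits
is_stable <- function(r2, window = 50L, tol = 0.001) {
  r2 <- r2[!is.na(r2)]
  if (length(r2) < window) return(FALSE)
  rm_tail <- tail(running_mean(r2), window)
  diff(range(rm_tail)) <= tol
}

set.seed(42)
max_times <- 3000L
r2 <- numeric(0)

for (i in seq_len(max_times)) {
  # Simulated stand-in for the test-set R^2 of one refit
  r2[i] <- rnorm(1, mean = 0.5, sd = 0.05)
  if (is_stable(r2)) break
}

splits_used <- length(r2)
```

With several outcomes, the same check would be applied to each column of split-wise values, and repetition would stop only once every outcome passes or `max_times` is reached.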
The summary table reports the in-sample baseline R^2, the median cross-validated R^2, its standard deviation, and the percent drop (baseline vs. median CV) with heuristic threshold markers. The percent drop is suppressed when the in-sample R^2 is very small.
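As an illustration of the percent drop, the sketch below assumes it is the relative decrease from the in-sample R^2 to the cross-validated R^2, expressed in percent. Both this formula and the helper name `pct_drop()` are assumptions for illustration, not the package's documented definition.

```r
# Hypothetical helper: relative drop (in percent) from in-sample to
# cross-validated R^2; NA when the in-sample R^2 is below the threshold
pct_drop <- function(r2_in, r2_cv, min_r2_for_pct = 0.05) {
  ifelse(r2_in < min_r2_for_pct,
         NA_real_,
         100 * (r2_in - r2_cv) / r2_in)
}

drop_ok    <- pct_drop(r2_in = 0.40, r2_cv = 0.30)  # about 25 percent
drop_small <- pct_drop(r2_in = 0.03, r2_cv = 0.01)  # NA: baseline too small
```

Suppressing the value for very small baselines avoids ratios that are dominated by a near-zero denominator.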
A list with class 'lav_cv' and elements:
table: Data frame with columns outcome, type ("observed" or "latent"), r2_in, r2_cv_mean, r2_cv_median, r2_cv_sd, drop_mean_pct, drop_med_pct, splits_used.
split_matrix: Matrix of split-wise test-set R^2 values (rows = splits, columns = outcomes).
times: Character or integer indicating the number of splits used (e.g., "auto(534)" or 500).
train_prop: Numeric. Training proportion used in each split.
N: Integer. Number of rows in the input data.
seed: Integer. Random seed used to generate the splits.
tol: Numeric. Tolerance used by the auto-stop rule.
window: Integer. Trailing window size for the auto-stop rule.
min_r2_for_pct: Numeric. Minimum in-sample R^2 required to compute percent drop.
call: match.call() of the function call.
digits: Integer. Default number of digits for printing.
Cudeck, R., & Browne, M. W. (1983). Cross-Validation Of Covariance Structures. Multivariate Behavioral Research, 18(2), 147-167. doi:10.1207/s15327906mbr1802_2
Hastie, T., Friedman, J., & Tibshirani, R. (2001). The Elements of Statistical Learning. In Springer Series in Statistics. Springer New York. doi:10.1007/978-0-387-21606-5
Kvalseth, T. O. (1985). Cautionary Note about R2. The American Statistician, 39(4), 279-285. doi:10.1080/00031305.1985.10479448
Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3). doi:10.1214/10-sts330
Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100-1122. doi:10.1177/1745691617693393
sem, lavPredict, inspect
library("lavaan")
model <- "
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
dem60 ~ ind60
dem65 ~ ind60 + dem60
y1 ~~ y5
y2 ~~ y6
"
fit <- lavaan::sem(
  model = model,
  data = lavaan::PoliticalDemocracy,
  std.lv = TRUE,
  estimator = "MLR",
  meanstructure = TRUE
)
result <- lav_cv(
  fit = fit,
  data = lavaan::PoliticalDemocracy,
  times = 5
)
print(result)