| borg_compare_cv | R Documentation |
Runs both random and blocked cross-validation on the same data and model, providing empirical evidence of metric inflation from ignoring data dependencies.
borg_compare_cv(
data,
formula,
model_fn = NULL,
predict_fn = NULL,
metric = NULL,
diagnosis = NULL,
coords = NULL,
time = NULL,
groups = NULL,
target = NULL,
v = 5,
repeats = 10,
seed = NULL,
verbose = TRUE
)
data |
A data frame containing predictors and response. |
formula |
A formula specifying the model (e.g., |
model_fn |
A function that fits a model. Should accept |
predict_fn |
A function to generate predictions. Should accept |
metric |
A character string specifying the metric to compute. One of
|
diagnosis |
A |
coords |
Character vector of length 2 specifying coordinate column names. |
time |
Character string specifying the time column name. |
groups |
Character string specifying the grouping column name. |
target |
Character string specifying the response variable name. If NULL, extracted from formula. |
v |
Integer. Number of CV folds. Default: 5. |
repeats |
Integer. Number of times to repeat CV. Default: 10 for stable estimates. |
seed |
Integer. Random seed for reproducibility. |
verbose |
Logical. Print progress messages. Default: TRUE. |
This function provides the "smoking gun" evidence for reviewers. It runs cross-validation twice on the same data:
Random CV: Standard k-fold CV ignoring data structure
Blocked CV: Structure-aware CV based on BORG diagnosis
The difference in metrics demonstrates empirically how much random CV inflates performance estimates when data dependencies exist.
For stable estimates, the comparison is repeated multiple times (default: 10) and a paired t-test assesses whether the difference is statistically significant.
A borg_comparison object (S3 class) containing:
Data frame of metrics from random CV (one row per repeat)
Data frame of metrics from blocked CV (one row per repeat)
Summary statistics comparing the two approaches
Estimated metric inflation from using random CV
The BorgDiagnosis object used
P-value from paired t-test comparing approaches
borg_diagnose for dependency detection,
borg_cv for generating blocked CV folds.
# Spatial data example
set.seed(42)
n <- 200
spatial_data <- data.frame(
x = runif(n, 0, 100),
y = runif(n, 0, 100)
)
# Create spatially autocorrelated response
spatial_data$response <- spatial_data$x * 0.5 + rnorm(n, sd = 5)
# Compare CV approaches
comparison <- borg_compare_cv(
spatial_data,
formula = response ~ x + y,
coords = c("x", "y"),
repeats = 5 # Use more repeats in practice
)
print(comparison)
plot(comparison)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.