Bounded Outcome Risk Guard for Model Evaluation
BORG catches data leakage that inflates your model's performance — before you report the wrong number.
```r
library(BORG)

# You scaled the data, then split it. Looks fine?
data_scaled <- scale(iris[, 1:4])
train_idx <- 1:100
test_idx <- 101:150

borg_inspect(data_scaled, train_idx = train_idx, test_idx = test_idx)
#> INVALID — Hard violation: preprocessing_leak
#> "Normalization parameters were computed on data beyond training set"
```
The test set means leaked into the scaler. Your reported accuracy is wrong. BORG finds this automatically — for scaling, PCA, recipes, caret pipelines, and more.
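For contrast, the leak-free version computes the scaling parameters on the training rows only and then applies them to the test rows. A minimal base-R sketch, independent of BORG:

```r
train_idx <- 1:100
test_idx <- 101:150

# Fit the scaler on the training rows only
train_scaled <- scale(iris[train_idx, 1:4])

# Reuse the training means and SDs for the test rows
ctr <- attr(train_scaled, "scaled:center")
scl <- attr(train_scaled, "scaled:scale")
test_scaled <- scale(iris[test_idx, 1:4], center = ctr, scale = scl)
```

Here the test rows never influence the centering or scaling, which is exactly the condition the preprocessing_leak check enforces.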
A model shows 95% accuracy on test data, then drops to 60% in production. The usual cause: data leakage. Information from the test set contaminated training, and the reported metrics were wrong.
A Princeton meta-analysis found leakage errors in 648 published papers across 30 fields. In civil war prediction research, correcting leakage revealed that "complex ML models do not perform substantively better than decades-old Logistic Regression." The reported gains were artifacts.
BORG addresses this problem by automatically detecting six categories of leakage — index overlap, duplicate rows, preprocessing leakage, target leakage, group leakage, and temporal violations — across common R frameworks (base R, caret, tidymodels, mlr3). Beyond detection, BORG diagnoses data dependencies (spatial, temporal, clustered), generates appropriate cross-validation schemes, and produces publication-ready methods paragraphs with test statistics.
Key functions and checks:

- borg(): Main entry point for all validation
- borg_inspect(): Detailed inspection of specific objects, including caret::preProcess, recipes::recipe, prcomp, and rsample resampling objects; fitted models (lm, glm, ranger, etc.); and spatial data (test points too close to training)
- borg_diagnose(): Analyze data for dependency structure
- borg_compare_cv(): Run random and blocked CV side by side on the same data, with plot() for visual comparison
- borg_power(): Estimate power loss from switching to blocked CV
- summary(): Generate publication-ready methods paragraphs, including borg_compare_cv() inflation estimates when available
- borg_certificate() / borg_export(): Machine-readable validation certificates in YAML/JSON for audit trails

Findings fall into two categories:

| Category | Impact | Response |
|----------|--------|----------|
| Hard Violation | Results invalid | Blocks evaluation |
| Soft Inflation | Results biased | Warns, allows with caution |
Hard Violations:
- index_overlap - Same row in train and test
- duplicate_rows - Identical observations across sets
- preprocessing_leak - Scaler/PCA fitted on full data
- target_leakage - Feature with |r| > 0.99 with target
- group_leakage - Same group in train and test
- temporal_leak - Test data predates training
Soft Inflation:
- proxy_leakage - Feature with |r| 0.95-0.99 with target
- spatial_proximity - Test points close to training
- spatial_overlap - Test inside training convex hull
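The two correlation thresholds above can be reproduced with base R's cor(). A hypothetical sketch of the rule (leakage_flag() is an illustration, not a BORG function, and BORG's internal checks may differ):

```r
# Map |r| between a feature and the target onto the two severity tiers
leakage_flag <- function(feature, target) {
  r <- abs(cor(feature, target))
  if (r > 0.99) {
    "target_leakage"   # hard violation
  } else if (r >= 0.95) {
    "proxy_leakage"    # soft inflation
  } else {
    "ok"
  }
}

set.seed(1)
y <- rnorm(100)
leakage_flag(y + rnorm(100, sd = 0.001), y)  # near-copy of the target
#> [1] "target_leakage"
```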
```r
# Install from GitHub
# install.packages("pak")
pak::pak("gcol33/BORG")

# Or using devtools
# install.packages("devtools")
devtools::install_github("gcol33/BORG")
```
```r
library(BORG)

# Clean split — passes validation
result <- borg(iris, train_idx = 1:100, test_idx = 101:150)
result
#> Status: VALID
#> Hard violations: 0
#> Soft inflations: 0

# Overlapping indices — caught immediately
borg(iris, train_idx = 1:100, test_idx = 51:150)
#> INVALID — index_overlap: Train and test indices overlap (50 shared indices)
```
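The overlap count itself is easy to replicate by hand; the base-R equivalent of what BORG reports:

```r
train_idx <- 1:100
test_idx <- 51:150

# Indices 51 through 100 appear in both sets
shared <- intersect(train_idx, test_idx)
length(shared)
#> [1] 50
```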
```r
# caret preProcess fitted on ALL data (common mistake)
library(caret)
pp <- preProcess(mtcars, method = c("center", "scale"))

borg_inspect(pp, train_idx = 1:25, test_idx = 26:32, data = mtcars)
#> Hard violation: preprocessing_leak
#> "preProcess centering parameters were computed on data beyond training set"
```
```r
# Feature highly correlated with outcome
leaky_data <- data.frame(
  x = rnorm(100),
  outcome = rnorm(100)
)
leaky_data$leaked <- leaky_data$outcome + rnorm(100, sd = 0.01)

borg_inspect(leaky_data, train_idx = 1:70, test_idx = 71:100, target = "outcome")
#> Hard violation: target_leakage_direct
```
```r
# Clinical data with patient IDs
clinical <- data.frame(
  patient_id = rep(1:10, each = 10),
  measurement = rnorm(100)
)

# Random split ignoring patients
set.seed(123)
idx <- sample(100)
train_idx <- idx[1:70]
test_idx <- idx[71:100]

borg_inspect(clinical, train_idx, test_idx, groups = "patient_id")
#> Hard violation: group_leakage
```
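A leak-free alternative splits by patient rather than by row, so no patient contributes observations to both sets. A base-R sketch (not a BORG function):

```r
clinical <- data.frame(
  patient_id = rep(1:10, each = 10),
  measurement = rnorm(100)
)

# Assign whole patients, not individual rows, to the test set
set.seed(123)
test_patients <- sample(unique(clinical$patient_id), 3)
test_idx <- which(clinical$patient_id %in% test_patients)
train_idx <- which(!clinical$patient_id %in% test_patients)

# No patient appears in both sets
intersect(clinical$patient_id[train_idx], clinical$patient_id[test_idx])
#> integer(0)
```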
```r
spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10),
  response = rnorm(200)
)

# Let BORG diagnose and generate appropriate CV folds
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response", v = 5)
result$diagnosis@recommended_cv
#> "spatial_block"
```
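Spatial block CV assigns contiguous regions, not individual points, to folds, so test points are never immediate neighbors of training points. A minimal sketch of the idea via grid binning in base R (BORG's actual fold generator may differ):

```r
set.seed(42)
spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10)
)

# Cut the map into a 5 x 5 grid; each point belongs to one grid cell
block_x <- cut(spatial_data$lon, breaks = 5, labels = FALSE)
block_y <- cut(spatial_data$lat, breaks = 5, labels = FALSE)
cell <- interaction(block_x, block_y, drop = TRUE)

# Assign whole cells, not points, to the 5 folds
fold_of_cell <- sample(rep_len(1:5, nlevels(cell)))
folds <- fold_of_cell[as.integer(cell)]
```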
```r
# Prove to reviewers that random CV inflates metrics
comparison <- borg_compare_cv(
  spatial_data,
  formula = response ~ lon + lat,
  coords = c("lon", "lat"),
  repeats = 10
)
print(comparison)
plot(comparison)
```
```r
# summary() writes a publication-ready methods paragraph
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
summary(result)
#> Model performance was evaluated using spatial block cross-validation
#> (k = 5 folds). Spatial autocorrelation was detected in the data
#> (Moran's I = 0.12, p < 0.001)...

# Alternative citation styles
summary(result, style = "nature")
summary(result, style = "ecology")
```
BORG works with common ML frameworks:
```r
# caret
library(caret)
pp <- preProcess(mtcars[, -1], method = c("center", "scale"))
borg_inspect(pp, train_idx = 1:25, test_idx = 26:32, data = mtcars)

# tidymodels
library(recipes)
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep()
borg_inspect(rec, train_idx = 1:25, test_idx = 26:32, data = mtcars)
```
| Function | Purpose |
|----------|---------|
| borg() | Main entry point — diagnose data or validate splits |
| borg_inspect() | Detailed inspection of objects |
| borg_diagnose() | Analyze data dependencies |
| borg_validate() | Validate complete workflow |
| borg_assimilate() | Assimilate leaky pipelines into compliance |
| borg_compare_cv() | Empirical random vs blocked CV comparison |
| borg_power() | Power analysis after blocking |
| plot() | Visualize results |
| summary() | Generate methods text for papers |
| borg_certificate() | Create validation certificate |
| borg_export() | Export certificate to YAML/JSON |
"Software is like sex: it's better when it's free." — Linus Torvalds
I'm a PhD student who builds R packages in my free time because I believe good tools should be free and open. I started these projects for my own work and figured others might find them useful too.
If this package saved you some time, buying me a coffee is a nice way to say thanks. It helps with my coffee addiction.
MIT (see the LICENSE.md file)
```bibtex
@software{BORG,
  author = {Colling, Gilles},
  title  = {BORG: Bounded Outcome Risk Guard for Model Evaluation},
  year   = {2026},
  url    = {https://github.com/gcol33/BORG}
}
```