```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
library(BORG)
```
A model shows 95% accuracy on test data, then drops to 60% in production. The usual culprit: data leakage.
Leakage happens when information from your test set contaminates training. Common causes:
- Preprocessing (scaling, PCA) fitted on all data before splitting
- Features derived from the outcome variable
- The same patient/site appearing in both train and test
- Random CV on spatially autocorrelated data
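The first cause is easy to reproduce in a few lines of base R: when scaling parameters are estimated on the full data, test-set statistics leak into the training rows. A minimal sketch (plain R, no BORG functions involved):

```r
set.seed(1)
x <- c(rnorm(70), rnorm(30, mean = 5))  # test rows drawn from a shifted distribution
train <- 1:70
test  <- 71:100

# Leaky: mean and sd computed on ALL rows, test set included
x_leaky <- (x - mean(x)) / sd(x)

# Correct: scaling parameters estimated on the training rows only
mu <- mean(x[train])
s  <- sd(x[train])
x_clean <- (x - mu) / s

mean(x_leaky[train])  # shifted away from 0 by the test rows
mean(x_clean[train])  # ~0, as scaling should give on the training data
```

The leaky version silently centers the training data toward the test distribution, which is exactly the kind of contamination BORG's `preprocessing_leakage` check targets.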
BORG checks for these problems before you compute metrics.
```r
# Create sample data
set.seed(42)
data <- data.frame(
  x1 = rnorm(100),
  x2 = rnorm(100),
  y = rnorm(100)
)

# Define a split
train_idx <- 1:70
test_idx <- 71:100

# Inspect the split
result <- borg_inspect(data, train_idx = train_idx, test_idx = test_idx)
result
```
No violations are detected here. But what if we had made a mistake?
```r
# Accidental overlap in indices
bad_result <- borg_inspect(data, train_idx = 1:60, test_idx = 51:100)
bad_result
```
BORG caught the overlap immediately.
## borg()

For most workflows, `borg()` is all you need. It handles two modes:
When you have structured data (spatial coordinates, time column, or groups), BORG diagnoses dependencies and generates appropriate CV folds:
```r
# Spatial data with coordinates
set.seed(42)
spatial_data <- data.frame(
  lon = runif(200, -10, 10),
  lat = runif(200, -10, 10),
  elevation = rnorm(200, 500, 100),
  response = rnorm(200)
)

# Let BORG diagnose and create CV folds
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
result
```
BORG detected spatial structure and recommended spatial block CV instead of random CV.
When you have your own train/test indices, BORG validates them:
```r
# Validate a manual split
risk <- borg(spatial_data, train_idx = 1:150, test_idx = 151:200)
risk
```
Use the standard R generics `plot()` and `summary()`:
```r
# Plot the risk assessment
plot(risk)
```
```r
# Generate methods text for publications
summary(result)
```
BORG handles three types of data dependencies:
Points close together tend to have similar values. Random CV underestimates error because train and test points are intermixed.
```r
result_spatial <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
result_spatial$diagnosis@recommended_cv
```
Sequential observations are correlated. Future data must not leak into past predictions.
```r
temporal_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  value = cumsum(rnorm(200))
)
result_temporal <- borg(temporal_data, time = "date", target = "value")
result_temporal$diagnosis@recommended_cv
```
Observations within groups (patients, sites, species) are more similar than between groups.
```r
grouped_data <- data.frame(
  site = rep(1:20, each = 10),
  measurement = rnorm(200)
)
result_grouped <- borg(grouped_data, groups = "site", target = "measurement")
result_grouped$diagnosis@recommended_cv
```
BORG classifies risks into two categories:
These invalidate your results completely:
| Risk | Description |
|------|-------------|
| index_overlap | Same row in both train and test |
| duplicate_rows | Identical observations in train and test |
| target_leakage | Feature with \|r\| > 0.99 with the target |
| group_leakage | Same group in train and test |
| temporal_leakage | Test data predates training data |
| preprocessing_leakage | Scaler/PCA fitted on full data |
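Temporal leakage, for example, shows up when a manual split puts earlier observations in the test set. A hypothetical sketch, assuming `borg_inspect()` accepts a `time` argument the way `borg()` does:

```r
# Time series data
ts_data <- data.frame(
  date  = seq(as.Date("2021-01-01"), by = "day", length.out = 100),
  value = cumsum(rnorm(100))
)

# Train on days 31-100, "test" on days 1-30: the past sits in the test set
result <- borg_inspect(ts_data, train_idx = 31:100, test_idx = 1:30,
                       time = "date")
result
```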
These inflate metrics but do not completely invalidate your results:
| Risk | Description |
|------|-------------|
| proxy_leakage | Feature with \|r\| between 0.95 and 0.99 with the target |
| spatial_proximity | Test points too close to train |
| random_cv_inflation | Random CV on dependent data |
Features derived from the outcome:
```r
# Simulate target leakage
leaky_data <- data.frame(
  x = rnorm(100),
  leaked_feature = rnorm(100),  # Will be made leaky below
  outcome = rnorm(100)
)

# Make leaked_feature highly correlated with outcome
leaky_data$leaked_feature <- leaky_data$outcome + rnorm(100, sd = 0.05)

result <- borg_inspect(leaky_data,
                       train_idx = 1:70, test_idx = 71:100,
                       target = "outcome")
result
```
The same entity appearing in both train and test:
```r
# Simulate clinical data with patient IDs
clinical_data <- data.frame(
  patient_id = rep(1:10, each = 10),
  visit = rep(1:10, times = 10),
  measurement = rnorm(100)
)

# Random split ignoring patients (BAD)
set.seed(123)
all_idx <- sample(100)
train_idx <- all_idx[1:70]
test_idx <- all_idx[71:100]

# Check for group leakage
result <- borg_inspect(clinical_data,
                       train_idx = train_idx, test_idx = test_idx,
                       groups = "patient_id")
result
```
Access the generated folds directly:
```r
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response", v = 5)

# Number of folds
length(result$folds)

# First fold's train/test sizes
cat("Fold 1 - Train:", length(result$folds[[1]]$train),
    "Test:", length(result$folds[[1]]$test), "\n")
```
For reproducibility, export validation certificates:
```r
# Create a certificate
cert <- borg_certificate(result$diagnosis, data = spatial_data)
cert
```
```r
# Export to file
borg_export(result$diagnosis, spatial_data, "validation.yaml")
borg_export(result$diagnosis, spatial_data, "validation.json")
```
`summary()` generates publication-ready methods paragraphs that include the statistical tests BORG ran, the dependency type detected, and the CV strategy chosen. Three citation styles are supported:
```r
# Default APA style
result <- borg(spatial_data, coords = c("lon", "lat"), target = "response")
methods_text <- summary(result)
```
```r
# Nature style
summary(result, style = "nature")

# Ecology style
summary(result, style = "ecology")
```
The returned text is a character string you can paste directly into a manuscript.
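Because it is an ordinary character string, you can also archive it alongside the analysis; for instance (the file name here is illustrative):

```r
# Save the generated methods paragraph next to the analysis outputs
writeLines(methods_text, "methods.txt")
```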
If you also ran borg_compare_cv(), pass the comparison object to include
empirical inflation estimates:
```r
comparison <- borg_compare_cv(spatial_data, response ~ lon + lat,
                              coords = c("lon", "lat"))
summary(result, comparison = comparison)
```
When reviewers ask "does it really matter?", borg_compare_cv() runs both
random and blocked CV on the same data and model, then tests whether the
difference is statistically significant:
```r
comparison <- borg_compare_cv(
  spatial_data,
  formula = response ~ lon + lat,
  coords = c("lon", "lat"),
  v = 5,
  repeats = 5  # Use more repeats in practice
)
print(comparison)
```
```r
plot(comparison)
```
Switching from random to blocked CV reduces effective sample size. Before committing to blocked CV, check whether your dataset is large enough:
```r
# Clustered data: 20 sites, 10 observations each
clustered_data <- data.frame(
  site = rep(1:20, each = 10),
  value = rep(rnorm(20, sd = 2), each = 10) + rnorm(200, sd = 0.5)
)

pw <- borg_power(clustered_data, groups = "site", target = "value")
print(pw)
summary(pw)
```
| Function | Purpose |
|----------|---------|
| borg() | Main entry point — diagnose data or validate splits |
| borg_inspect() | Detailed inspection of train/test split |
| borg_diagnose() | Analyze data dependencies only |
| borg_compare_cv() | Empirical random vs blocked CV comparison |
| borg_power() | Power analysis after blocking |
| plot() | Visualize results |
| summary() | Generate methods text for papers |
| borg_certificate() | Create validation certificate |
| borg_export() | Export certificate to YAML/JSON |
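Putting the pieces together, a typical session chains these functions end to end. A sketch built only from the documented calls above (the model-fitting step is up to you; BORG supplies the indices, not the model):

```r
library(BORG)

# 1. Diagnose dependencies and generate blocked CV folds
result <- borg(spatial_data, coords = c("lon", "lat"),
               target = "response", v = 5)

# 2. Fit your model on each fold using result$folds[[i]]$train / $test

# 3. Document the validation: methods text plus an exported certificate
methods_text <- summary(result)
borg_export(result$diagnosis, spatial_data, "validation.yaml")
```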
- `vignette("risk-taxonomy")`: Complete catalog of detectable risks
- `vignette("frameworks")`: Integration with caret, tidymodels, mlr3