borg: BORG: Guard Your Model Evaluation
In BORG: Bounded Outcome Risk Guard for Model Evaluation

borg	R Documentation

BORG: Guard Your Model Evaluation

Description

The main entry point for BORG. Diagnoses data dependencies, generates valid cross-validation schemes, and validates evaluation workflows.

Usage

borg(
  data,
  coords = NULL,
  time = NULL,
  groups = NULL,
  target = NULL,
  v = 5,
  train_idx = NULL,
  test_idx = NULL,
  output = c("list", "rsample", "caret", "mlr3"),
  ...
)

Arguments

`data`	A data frame to diagnose and to generate CV folds from.
`coords`	Character vector of length 2 specifying coordinate column names (e.g., `c("lon", "lat")`). Triggers spatial autocorrelation detection.
`time`	Character string specifying the time column name. Triggers temporal autocorrelation detection.
`groups`	Character string specifying the grouping column name (e.g., "site_id", "patient_id"). Triggers clustered structure detection.
`target`	Character string specifying the response variable column name. Used for more accurate autocorrelation diagnostics.
`v`	Integer. Number of CV folds. Default: 5.
`train_idx`	Integer vector of training indices. If provided along with `test_idx`, validates an existing split instead of generating one.
`test_idx`	Integer vector of test indices. Required if `train_idx` is provided.
`output`	Character. CV output format: "list" (default), "rsample", "caret", "mlr3". Ignored when validating an existing split.
`...`	Additional arguments passed to underlying functions.

Details

borg() operates in two modes:

Diagnosis Mode (Recommended)

When called with structure hints (`coords`, `time`, `groups`) but without `train_idx`/`test_idx`, BORG:

Diagnoses data dependencies (spatial, temporal, clustered)
Estimates how much random CV would inflate metrics
Generates appropriate CV folds that respect the dependency structure
Returns everything needed to proceed with valid evaluation

This is the recommended workflow. Let BORG tell you how to split your data.
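
To make "folds that respect the dependency structure" concrete for the clustered case, here is a minimal base R sketch. It is not BORG's implementation: `group_folds_sketch()` is a hypothetical helper, and the `train`/`test` field names are assumptions chosen for illustration. It assigns whole groups to folds, so no group is ever split between training and test rows.

```r
# Hypothetical helper (not part of BORG): assign whole groups to folds so
# that no group appears in both the training and test set of any fold.
group_folds_sketch <- function(groups, v = 5) {
  ids <- unique(groups)
  stopifnot(length(ids) >= v)
  # Shuffle the group ids, then deal them into v folds round-robin.
  shuffled <- ids[sample.int(length(ids))]
  fold_of_id <- rep_len(seq_len(v), length(ids))
  names(fold_of_id) <- as.character(shuffled)
  # Look up each row's fold through its group id.
  fold_of_row <- unname(fold_of_id[as.character(groups)])
  lapply(seq_len(v), function(k) {
    list(train = which(fold_of_row != k), test = which(fold_of_row == k))
  })
}

set.seed(1)
site <- rep(1:20, each = 10)
folds <- group_folds_sketch(site, v = 5)

# Count the sites shared between train and test in each fold.
leaks <- vapply(folds, function(f) {
  length(intersect(site[f$train], site[f$test]))
}, integer(1))
leaks
```

A random row-level split of the same data would almost always place rows from one site on both sides, which is the kind of dependency that can inflate metrics.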

Validation Mode

When called with `train_idx` and `test_idx`, BORG validates the existing split:

Checks for index overlap
Validates group isolation (if groups specified)
Validates temporal ordering (if time specified)
Checks spatial separation (if coords specified)
Detects other forms of leakage, such as preprocessing leakage and target leakage

Use this mode to verify splits you've created yourself.
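
The first three checks listed above can be sketched in base R as follows. This is an illustration only, not BORG's implementation: `check_split_sketch()` and the names of its result fields are hypothetical, and treating tied timestamps as an ordering violation is an assumption.

```r
# Hypothetical helper (not BORG's implementation): index overlap, group
# isolation and temporal ordering written as plain base R checks.
check_split_sketch <- function(data, train_idx, test_idx,
                               groups = NULL, time = NULL) {
  out <- list(index_overlap = length(intersect(train_idx, test_idx)) > 0)
  if (!is.null(groups)) {
    g <- data[[groups]]
    # Group isolation fails if any group has rows on both sides.
    out$shared_groups <- intersect(g[train_idx], g[test_idx])
  }
  if (!is.null(time)) {
    tm <- data[[time]]
    # Temporal ordering fails if any training row is as late as, or later
    # than, the earliest test row (ties are flagged; this is an assumption).
    out$train_after_test <- max(tm[train_idx]) >= min(tm[test_idx])
  }
  out
}

df <- data.frame(
  patient = rep(1:10, each = 10),
  day = seq(as.Date("2020-01-01"), by = "day", length.out = 100)
)

# A clean split, and one where rows 51 to 55 sit on both sides.
ok  <- check_split_sketch(df, 1:50, 51:100, groups = "patient", time = "day")
bad <- check_split_sketch(df, 1:55, 51:100, groups = "patient", time = "day")
```

In the second call the overlapping rows trip all three checks at once: the indices overlap, patient 6 appears on both sides, and the latest training day falls after the earliest test day.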

Value

Depends on usage mode:

Diagnosis mode (no train_idx/test_idx): A list with class "borg_result" containing:

diagnosis: A BorgDiagnosis object with dependency analysis
cv: A borg_cv object with valid cross-validation folds
folds: Shortcut to cv$folds for convenience

Validation mode (with train_idx/test_idx): A BorgRisk object containing the risk assessment of the provided split.
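
As a rough sketch of how a list of folds might be used afterwards, the loop below fits and scores a model once per fold. The folds here are built by hand as a stand-in so the code runs without BORG, and the `train`/`test` field names are an assumption not confirmed by this page; inspect `result$folds[[1]]` to see the actual structure.

```r
# Hand-built stand-in for a list of folds. The train/test field names are
# an assumption; the interleaved split is only there to make the code run.
set.seed(1)
df <- data.frame(x = rnorm(60))
df$y <- 2 * df$x + rnorm(60, sd = 0.1)
folds <- lapply(1:3, function(k) {
  test <- seq(k, 60, by = 3)
  list(train = setdiff(seq_len(60), test), test = test)
})

# Fit on the training rows of each fold and score on the held-out rows.
rmse <- vapply(folds, function(f) {
  fit <- lm(y ~ x, data = df[f$train, ])
  pred <- predict(fit, newdata = df[f$test, ])
  sqrt(mean((df$y[f$test] - pred)^2))
}, numeric(1))

mean(rmse)  # cross-validated RMSE averaged over folds
```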

Examples

# ===== DIAGNOSIS MODE (recommended) =====

# Spatial data: let BORG create valid folds
set.seed(42)
spatial_data <- data.frame(
  x = runif(200, 0, 100),
  y = runif(200, 0, 100),
  response = rnorm(200)
)

result <- borg(spatial_data, coords = c("x", "y"), target = "response")
result$diagnosis
result$folds[[1]]  # First fold's train/test indices

# Clustered data
clustered_data <- data.frame(
  site = rep(1:20, each = 10),
  value = rep(rnorm(20), each = 10) + rnorm(200, sd = 0.5)
)

result <- borg(clustered_data, groups = "site", target = "value")
result$diagnosis@recommended_cv  # "group_fold"

# Temporal data
temporal_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  value = cumsum(rnorm(200))
)

result <- borg(temporal_data, time = "date", target = "value")


# Get rsample-compatible output for tidymodels (requires the rsample package)
if (requireNamespace("rsample", quietly = TRUE)) {
  result <- borg(spatial_data, coords = c("x", "y"), output = "rsample")
}


# ===== VALIDATION MODE =====

# Validate an existing split
data <- data.frame(x = 1:100, y = rnorm(100))
borg(data, train_idx = 1:70, test_idx = 71:100)

# Validate with group constraint
data$patient <- rep(1:10, each = 10)
borg(data, train_idx = 1:50, test_idx = 51:100, groups = "patient")

BORG documentation built on March 20, 2026, 5:09 p.m.