borg: BORG: Guard Your Model Evaluation


borg    R Documentation

BORG: Guard Your Model Evaluation

Description

The main entry point for BORG. Diagnoses data dependencies, generates valid cross-validation schemes, and validates evaluation workflows.

Usage

borg(
  data,
  coords = NULL,
  time = NULL,
  groups = NULL,
  target = NULL,
  v = 5,
  train_idx = NULL,
  test_idx = NULL,
  output = c("list", "rsample", "caret", "mlr3"),
  ...
)

Arguments

data

A data frame to diagnose and create CV folds for.

coords

Character vector of length 2 specifying coordinate column names (e.g., c("lon", "lat")). Triggers spatial autocorrelation detection.

time

Character string specifying the time column name. Triggers temporal autocorrelation detection.

groups

Character string specifying the grouping column name (e.g., "site_id", "patient_id"). Triggers clustered structure detection.

target

Character string specifying the response variable column name. Used for more accurate autocorrelation diagnostics.

v

Integer. Number of CV folds. Default: 5.

train_idx

Integer vector of training indices. If provided along with test_idx, validates an existing split instead of generating one.

test_idx

Integer vector of test indices. Required if train_idx is provided.

output

Character. Output format for the generated CV folds: one of "list" (default), "rsample", "caret", or "mlr3". Ignored when validating an existing split.

...

Additional arguments passed to underlying functions.
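
As a rough illustration of what a temporal autocorrelation diagnostic measures, the sketch below compares the lag-1 autocorrelation of a random walk with that of white noise, using stats::acf() from base R. It is an assumption-laden illustration only: this page does not say which statistic BORG computes, and the variable names are hypothetical.

```r
# Illustrative only: base R, not BORG's internal diagnostic.
set.seed(42)

random_walk <- cumsum(rnorm(200))  # strongly autocorrelated series
white_noise <- rnorm(200)          # independent observations

# acf() returns lag 0 in position 1, so lag 1 is position 2.
lag1_walk  <- acf(random_walk, plot = FALSE)$acf[2]
lag1_noise <- acf(white_noise, plot = FALSE)$acf[2]

lag1_walk   # close to 1: neighbouring time points carry shared information
lag1_noise  # close to 0: rows can be treated as exchangeable
```

A high lag-1 value is the kind of signal that makes a random train/test split unsafe for time-ordered data.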

Details

borg() operates in two modes:

Diagnosis Mode (Recommended)

When called with structure hints (coords, time, groups) but without train_idx/test_idx, BORG:

  1. Diagnoses data dependencies (spatial, temporal, clustered)

  2. Estimates how much random CV would inflate metrics

  3. Generates appropriate CV folds that respect the dependency structure

  4. Returns everything needed to proceed with valid evaluation

This is the recommended workflow. Let BORG tell you how to split your data.
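
Steps 2 and 3 can be illustrated with a minimal base R sketch on simulated clustered data. It assumes a simple predictor (the per-site training mean) and hypothetical names such as cv_rmse; it is not BORG's algorithm, only a demonstration of why folds that keep whole groups together give a more honest error estimate than random folds.

```r
# Illustrative only: base R, not BORG's internal algorithm.
set.seed(1)
n_sites <- 20
dat <- data.frame(
  site  = rep(seq_len(n_sites), each = 10),
  value = rep(rnorm(n_sites), each = 10) + rnorm(n_sites * 10, sd = 0.5)
)
v <- 5

# Random CV: rows are assigned to folds without regard to site.
random_fold <- sample(rep(seq_len(v), length.out = nrow(dat)))

# Group-aware CV: whole sites are assigned to folds, then mapped to rows.
site_fold  <- sample(rep(seq_len(v), length.out = n_sites))
group_fold <- site_fold[dat$site]

# Cross-validated RMSE of a predictor that uses each site's training mean
# and falls back to the overall training mean for sites it has not seen.
cv_rmse <- function(fold_id) {
  errs <- lapply(seq_len(v), function(k) {
    train <- dat[fold_id != k, ]
    test  <- dat[fold_id == k, ]
    site_means <- vapply(split(train$value, train$site), mean, numeric(1))
    pred <- unname(site_means[as.character(test$site)])
    pred[is.na(pred)] <- mean(train$value)
    test$value - pred
  })
  sqrt(mean(unlist(errs)^2))
}

rmse_random <- cv_rmse(random_fold)  # test sites were seen during training
rmse_group  <- cv_rmse(group_fold)   # test sites are new to the model
```

On data like this, the random-fold error tends to be lower because every test row shares a site with training rows; the gap between the two values is the kind of inflation that step 2 refers to.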

Validation Mode

When called with train_idx and test_idx, BORG validates the existing split:

  • Checks for index overlap

  • Validates group isolation (if groups specified)

  • Validates temporal ordering (if time specified)

  • Checks spatial separation (if coords specified)

  • Detects other leakage risks, such as preprocessing leakage and target leakage

Use this mode to verify splits you've created yourself.
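
The first four checks above can be approximated in a few lines of base R. The helper below, check_split(), is hypothetical and not part of BORG; its return fields are assumptions made for illustration, and it does not attempt the preprocessing or target leakage checks.

```r
# Hypothetical helper for illustration; not part of BORG.
check_split <- function(data, train_idx, test_idx,
                        groups = NULL, time = NULL, coords = NULL) {
  out <- list()

  # Index overlap: number of rows that appear in both train and test.
  out$index_overlap <- length(intersect(train_idx, test_idx))

  # Group isolation: groups present on both sides of the split.
  if (!is.null(groups)) {
    out$shared_groups <- intersect(data[[groups]][train_idx],
                                   data[[groups]][test_idx])
  }

  # Temporal ordering: every training row precedes every test row.
  if (!is.null(time)) {
    out$train_precedes_test <-
      max(data[[time]][train_idx]) < min(data[[time]][test_idx])
  }

  # Spatial separation: smallest Euclidean train-to-test distance.
  if (!is.null(coords)) {
    tr <- as.matrix(data[train_idx, coords])
    te <- as.matrix(data[test_idx, coords])
    d <- sqrt(outer(te[, 1], tr[, 1], "-")^2 +
              outer(te[, 2], tr[, 2], "-")^2)
    out$min_train_test_distance <- min(d)
  }

  out
}

df <- data.frame(x = 1:100, y = rnorm(100),
                 patient = rep(1:10, each = 10))

ok  <- check_split(df, 1:50, 51:100, groups = "patient")  # clean split
bad <- check_split(df, 1:55, 51:100, groups = "patient")  # overlapping split
```

In the second call, rows 51 to 55 appear on both sides, so the overlap count is non-zero and one patient is shared between train and test.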

Value

Depends on usage mode:

Diagnosis mode (no train_idx/test_idx): A list with class "borg_result" containing:

diagnosis

A BorgDiagnosis object with dependency analysis

cv

A borg_cv object with valid cross-validation folds

folds

Shortcut to cv$folds for convenience

Validation mode (with train_idx/test_idx): A BorgRisk object containing the risk assessment of the provided split.

See Also

borg_diagnose for diagnosis only, borg_cv for CV generation only, borg_inspect for detailed object inspection.

Examples

# ===== DIAGNOSIS MODE (recommended) =====

# Spatial data: let BORG create valid folds
set.seed(42)
spatial_data <- data.frame(
  x = runif(200, 0, 100),
  y = runif(200, 0, 100),
  response = rnorm(200)
)

result <- borg(spatial_data, coords = c("x", "y"), target = "response")
result$diagnosis
result$folds[[1]]  # First fold's train/test indices

# Clustered data
clustered_data <- data.frame(
  site = rep(1:20, each = 10),
  value = rep(rnorm(20), each = 10) + rnorm(200, sd = 0.5)
)

result <- borg(clustered_data, groups = "site", target = "value")
result$diagnosis@recommended_cv  # "group_fold"

# Temporal data
temporal_data <- data.frame(
  date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
  value = cumsum(rnorm(200))
)

result <- borg(temporal_data, time = "date", target = "value")


# Get rsample-compatible output for tidymodels (requires the rsample package)
if (requireNamespace("rsample", quietly = TRUE)) {
  result <- borg(spatial_data, coords = c("x", "y"), output = "rsample")
}


# ===== VALIDATION MODE =====

# Validate an existing split
data <- data.frame(x = 1:100, y = rnorm(100))
borg(data, train_idx = 1:70, test_idx = 71:100)

# Validate with group constraint
data$patient <- rep(1:10, each = 10)
borg(data, train_idx = 1:50, test_idx = 51:100, groups = "patient")


BORG documentation built on March 20, 2026, 5:09 p.m.