| borg | R Documentation |
The main entry point for BORG. Diagnoses data dependencies, generates valid cross-validation schemes, and validates evaluation workflows.
borg(
data,
coords = NULL,
time = NULL,
groups = NULL,
target = NULL,
v = 5,
train_idx = NULL,
test_idx = NULL,
output = c("list", "rsample", "caret", "mlr3"),
...
)
data |
A data frame to diagnose and create CV folds for. |
coords |
Character vector of length 2 specifying coordinate column names
(e.g., |
time |
Character string specifying the time column name. Triggers temporal autocorrelation detection. |
groups |
Character string specifying the grouping column name (e.g., "site_id", "patient_id"). Triggers clustered structure detection. |
target |
Character string specifying the response variable column name. Used for more accurate autocorrelation diagnostics. |
v |
Integer. Number of CV folds. Default: 5. |
train_idx |
Integer vector of training indices. If provided along with
|
test_idx |
Integer vector of test indices. Required if |
output |
Character. CV output format: "list" (default), "rsample", "caret", "mlr3". Ignored when validating an existing split. |
... |
Additional arguments passed to underlying functions. |
borg() operates in two modes:
Diagnosis mode: when called with structure hints (coords, time, groups) but without train_idx/test_idx, BORG:
Diagnoses data dependencies (spatial, temporal, clustered)
Estimates how much random CV would inflate metrics
Generates appropriate CV folds that respect the dependency structure
Returns everything needed to proceed with valid evaluation
This is the recommended workflow. Let BORG tell you how to split your data.
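To make "folds that respect the dependency structure" concrete for the clustered case, here is a minimal base R sketch. It is an illustration under stated assumptions, not BORG's internal algorithm, and the helper name `make_group_folds` is hypothetical. It assigns whole groups, rather than individual rows, to folds, so no group ends up on both sides of a split.

```r
# Illustrative only: base R, not BORG's implementation.
# make_group_folds() is a hypothetical helper introduced for this sketch.
make_group_folds <- function(groups, v = 5) {
  ids <- unique(groups)
  # Assign each group (not each row) to one of v folds, in random order
  fold_of_group <- sample(rep_len(seq_len(v), length(ids)))
  names(fold_of_group) <- as.character(ids)
  # Look up the fold of every row through its group
  row_fold <- unname(fold_of_group[as.character(groups)])
  lapply(seq_len(v), function(k) {
    list(train = which(row_fold != k), test = which(row_fold == k))
  })
}

set.seed(1)
site <- rep(1:20, each = 10)
folds <- make_group_folds(site, v = 5)

# Count sites that appear in both train and test, per fold
shared <- vapply(
  folds,
  function(f) length(intersect(site[f$train], site[f$test])),
  integer(1)
)
shared
```

With random row-level folds, rows from the same site would typically land in both train and test, which is the kind of dependency leak the diagnosis step is meant to flag.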
Validation mode: when called with train_idx and test_idx, BORG validates the existing split:
Checks for index overlap
Validates group isolation (if groups specified)
Validates temporal ordering (if time specified)
Checks spatial separation (if coords specified)
Detects preprocessing leakage, target leakage, etc.
Use this mode to verify splits you've created yourself.
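The first three checks in the list above can be sketched in base R as follows. This is a simplified illustration, not BORG's implementation: the helper name `check_split` and the issue labels are hypothetical, and the temporal rule used here (every training observation must come strictly before every test observation) is an assumption made for the sketch.

```r
# Illustrative only: base R, not BORG's implementation.
# check_split() and its issue labels are hypothetical.
check_split <- function(data, train_idx, test_idx, groups = NULL, time = NULL) {
  issues <- character(0)
  # 1. The same row must not be in both train and test
  if (length(intersect(train_idx, test_idx)) > 0) {
    issues <- c(issues, "index overlap")
  }
  # 2. No group may contribute rows to both sides
  if (!is.null(groups)) {
    shared <- intersect(data[[groups]][train_idx], data[[groups]][test_idx])
    if (length(shared) > 0) issues <- c(issues, "group overlap")
  }
  # 3. Assumed rule: all training times precede all test times
  if (!is.null(time)) {
    if (max(data[[time]][train_idx]) >= min(data[[time]][test_idx])) {
      issues <- c(issues, "temporal ordering")
    }
  }
  issues
}

d <- data.frame(x = 1:100, patient = rep(1:10, each = 10), day = 1:100)
check_split(d, 1:50, 51:100, groups = "patient")  # clean split
check_split(d, 1:55, 56:100, groups = "patient")  # patient 6 is on both sides
```

An empty character vector means none of the sketched checks fired. The real function returns a BorgRisk object and covers further checks, such as spatial separation and leakage detection, that this sketch does not attempt.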
The return value depends on the usage mode:
Diagnosis mode (no train_idx/test_idx): A list with class "borg_result" containing:
diagnosis: A BorgDiagnosis object with dependency analysis
cv: A borg_cv object with valid cross-validation folds
folds: Shortcut to cv$folds for convenience
Validation mode (with train_idx/test_idx): A BorgRisk
object containing the risk assessment of the provided split.
See also: borg_diagnose for diagnosis only,
borg_cv for CV generation only,
borg_inspect for detailed object inspection.
# ===== DIAGNOSIS MODE (recommended) =====
# Spatial data: let BORG create valid folds
set.seed(42)
spatial_data <- data.frame(
x = runif(200, 0, 100),
y = runif(200, 0, 100),
response = rnorm(200)
)
result <- borg(spatial_data, coords = c("x", "y"), target = "response")
result$diagnosis
result$folds[[1]] # First fold's train/test indices
# Clustered data
clustered_data <- data.frame(
site = rep(1:20, each = 10),
value = rep(rnorm(20), each = 10) + rnorm(200, sd = 0.5)
)
result <- borg(clustered_data, groups = "site", target = "value")
result$diagnosis@recommended_cv # "group_fold"
# Temporal data
temporal_data <- data.frame(
date = seq(as.Date("2020-01-01"), by = "day", length.out = 200),
value = cumsum(rnorm(200))
)
result <- borg(temporal_data, time = "date", target = "value")
# Get rsample-compatible output for tidymodels (requires rsample package)
result <- borg(spatial_data, coords = c("x", "y"), output = "rsample")
# ===== VALIDATION MODE =====
# Validate an existing split
data <- data.frame(x = 1:100, y = rnorm(100))
borg(data, train_idx = 1:70, test_idx = 71:100)
# Validate with group constraint
data$patient <- rep(1:10, each = 10)
borg(data, train_idx = 1:50, test_idx = 51:100, groups = "patient")