make_spatial_fold: Create spatially independent training and testing folds
In spatialRF: Easy Spatial Modeling with Random Forest

Description

Generates two spatially independent data folds by growing a rectangular buffer from a focal point until a specified fraction of records falls inside. Used internally by make_spatial_folds() and rf_evaluate() for spatial cross-validation.

Usage

make_spatial_fold(
  data = NULL,
  dependent.variable.name = NULL,
  xy.i = NULL,
  xy = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.8
)

Arguments

`data`	Data frame containing response variable and predictors. Required only for binary response variables.
`dependent.variable.name`	Character string with the name of the response variable. Must be a column name in `data`. Required only for binary response variables.
`xy.i`	Single-row data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Defines the focal point from which the buffer grows.
`xy`	Data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Contains all spatial coordinates for the dataset.
`distance.step.x`	Numeric value specifying the buffer growth increment along the x-axis. Default: `NULL` (automatically set to 1/1000th of the x-coordinate range).
`distance.step.y`	Numeric value specifying the buffer growth increment along the y-axis. Default: `NULL` (automatically set to 1/1000th of the y-coordinate range).
`training.fraction`	Numeric value between 0.1 and 0.9 specifying the fraction of records to include in the training fold. Default: `0.8`.
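
As a rough illustration of the default increments described for distance.step.x and distance.step.y, the snippet below computes 1/1000th of each coordinate range on a small made-up coordinate table. The data frame and the names default.step.x and default.step.y are assumptions for this sketch, not objects from the package.

```r
# Toy coordinates (made up for illustration; not package data).
xy <- data.frame(
  id = 1:5,
  x = c(-10, -5, 0, 5, 10),
  y = c(40, 41, 42, 43, 44)
)

# Documented default: 1/1000th of the coordinate range along each axis.
default.step.x <- diff(range(xy$x)) / 1000  # x range is 20
default.step.y <- diff(range(xy$y)) / 1000  # y range is 4
```

The steps are expressed in the units of the coordinates (degrees for longitude and latitude). Larger values should mean fewer growth iterations, at the cost of a coarser match to training.fraction.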

Details

This function creates spatially independent training and testing folds for spatial cross-validation. The algorithm works as follows:

1. Starts with a small rectangular buffer centered on the focal point (xy.i).
2. Grows the buffer incrementally by distance.step.x and distance.step.y.
3. Continues growing until the buffer contains the desired number of records (training.fraction * total records).
4. Assigns records inside the buffer to training and records outside to testing.
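
The steps above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the spatialRF source: the function name grow_buffer_fold is hypothetical, the focal point is assumed to be one of the rows of xy, the target count is rounded down with floor(), and the initial buffer is assumed to extend one step on each side of the focal point.

```r
# Hypothetical helper: illustrates the documented steps, NOT the package code.
grow_buffer_fold <- function(xy.i, xy, training.fraction = 0.8) {
  # Growth increments: 1/1000th of each coordinate range (documented default).
  step.x <- diff(range(xy$x)) / 1000
  step.y <- diff(range(xy$y)) / 1000

  # Number of records the buffer must hold (floor() is an assumption).
  target <- floor(training.fraction * nrow(xy))

  # Half-widths of the rectangle centered on the focal point.
  half.x <- step.x
  half.y <- step.y

  repeat {
    # Records currently inside the rectangular buffer.
    inside <- abs(xy$x - xy.i$x) <= half.x & abs(xy$y - xy.i$y) <= half.y
    if (sum(inside) >= target) break
    # Not enough records yet: grow the buffer along both axes.
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }

  # Inside the buffer -> training; outside -> testing.
  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Usage on synthetic coordinates (made up for illustration).
set.seed(1)
xy <- data.frame(id = 1:100, x = runif(100), y = runif(100))
fold <- grow_buffer_fold(xy.i = xy[1, ], xy = xy, training.fraction = 0.6)
length(fold$training)
```

Because several records can enter the buffer in the same growth step, the training fold in this sketch may end up slightly larger than the requested count.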

Special handling for binary response variables:

When data and dependent.variable.name are provided and the response is binary (0/1), the function ensures that training.fraction applies to the number of presences (1s), not total records. This prevents imbalanced sampling in presence-absence models.
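
One way to picture the binary case is to change only the stopping rule of the buffer growth, so that presences are counted instead of all records. The sketch below is an assumption-laden illustration rather than the package implementation: grow_buffer_fold_binary is a hypothetical name, response is assumed to be a 0/1 vector aligned row by row with xy, the focal point is assumed to be a row of xy, and the target is rounded down with floor().

```r
# Hypothetical helper: NOT the spatialRF source, only a sketch of the idea.
grow_buffer_fold_binary <- function(xy.i, xy, response, training.fraction = 0.8) {
  stopifnot(length(response) == nrow(xy), all(response %in% c(0, 1)))

  # Growth increments: 1/1000th of each coordinate range (documented default).
  step.x <- diff(range(xy$x)) / 1000
  step.y <- diff(range(xy$y)) / 1000

  # Target is a fraction of the presences (1s), not of all records.
  target <- floor(training.fraction * sum(response == 1))

  half.x <- step.x
  half.y <- step.y
  repeat {
    inside <- abs(xy$x - xy.i$x) <= half.x & abs(xy$y - xy.i$y) <= half.y
    # Stop once enough presences fall inside the buffer.
    if (sum(response[inside] == 1) >= target) break
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }

  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Usage on synthetic presence-absence data (made up for illustration).
set.seed(2)
xy <- data.frame(id = 1:200, x = runif(200), y = runif(200))
response <- rbinom(200, size = 1, prob = 0.3)
fold <- grow_buffer_fold_binary(xy[1, ], xy, response, training.fraction = 0.6)
sum(response[xy$id %in% fold$training] == 1)
```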

Value

List with two elements:

training: Integer vector of record IDs (from xy$id) in the training fold.
testing: Integer vector of record IDs (from xy$id) in the testing fold.
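
Since records inside the buffer go to training and records outside go to testing, the two vectors should share no IDs and together cover every value of xy$id. A small check along those lines might look like the sketch below; check_fold is a hypothetical helper, and the xy and fold objects are toy stand-ins rather than real function output.

```r
# Hypothetical helper (not part of spatialRF): sanity-check a fold.
check_fold <- function(fold, xy) {
  c(
    # No record ID appears in both folds.
    disjoint = length(intersect(fold$training, fold$testing)) == 0,
    # Every record ID appears in one of the folds.
    complete = setequal(c(fold$training, fold$testing), xy$id)
  )
}

# Toy objects standing in for real inputs and outputs.
xy <- data.frame(id = 1:10, x = 1:10, y = 1:10)
fold <- list(training = 1:6, testing = 7:10)

check_fold(fold, xy)

# Share of records that ended up in the training fold.
realized.fraction <- length(fold$training) / nrow(xy)
```

Comparing realized.fraction with the requested training.fraction gives a quick sense of how closely the buffer growth matched the target.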

Examples

library(spatialRF)

data(plants_df, plants_xy)

# Create spatial fold centered on first coordinate
fold <- make_spatial_fold(
  xy.i = plants_xy[1, ],
  xy = plants_xy,
  training.fraction = 0.6
)

# View training and testing record IDs
fold$training
fold$testing

# Visualize the spatial split (training = red, testing = blue, center = black)
if (interactive()) {
  plot(plants_xy[c("x", "y")], type = "n", xlab = "", ylab = "")
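  # Match on id rather than row position, in case ids are not 1..n
  points(plants_xy[plants_xy$id %in% fold$training, c("x", "y")], col = "red4", pch = 15)
  points(plants_xy[plants_xy$id %in% fold$testing, c("x", "y")], col = "blue4", pch = 15)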
  points(plants_xy[1, c("x", "y")], col = "black", pch = 15, cex = 2)
}

spatialRF documentation built on Dec. 20, 2025, 1:07 a.m.