make_spatial_fold: Create spatially independent training and testing folds

View source: R/make_spatial_fold.R

make_spatial_fold    R Documentation

Create spatially independent training and testing folds

Description

Generates two spatially independent data folds by growing a rectangular buffer from a focal point until a specified fraction of records falls inside. Used internally by make_spatial_folds() and rf_evaluate() for spatial cross-validation.

Usage

make_spatial_fold(
  data = NULL,
  dependent.variable.name = NULL,
  xy.i = NULL,
  xy = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.8
)

Arguments

data

Data frame containing the response variable and the predictors. Required only for binary response variables.

dependent.variable.name

Character string with the name of the response variable. Must be a column name in data. Required only for binary response variables.

xy.i

Single-row data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Defines the focal point from which the buffer grows.

xy

Data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Contains all spatial coordinates for the dataset.

distance.step.x

Numeric value specifying the buffer growth increment along the x-axis. Default: NULL (automatically set to 1/1000th of the x-coordinate range).

distance.step.y

Numeric value specifying the buffer growth increment along the y-axis. Default: NULL (automatically set to 1/1000th of the y-coordinate range).

training.fraction

Numeric value between 0.1 and 0.9 specifying the fraction of records to include in the training fold. Default: 0.8.

Details

This function creates spatially independent training and testing folds for spatial cross-validation. The algorithm works as follows:

  1. Starts with a small rectangular buffer centered on the focal point (xy.i)

  2. Grows the buffer incrementally by distance.step.x and distance.step.y

  3. Continues growing until the buffer contains the desired number of records (training.fraction * total records)

  4. Assigns records inside the buffer to training and records outside to testing
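
The four steps above can be sketched in base R. This is a minimal illustration on synthetic coordinates, not the package's implementation: the helper name grow_buffer_fold(), the use of floor() to compute the target count, and the starting buffer size of one step are all assumptions made for the example.

```r
# Hypothetical helper illustrating the buffer-growing idea.
# Not the spatialRF source code.
grow_buffer_fold <- function(xy.i, xy, training.fraction = 0.8,
                             distance.step.x = NULL, distance.step.y = NULL) {
  # Default steps: 1/1000th of each coordinate range (as documented above)
  if (is.null(distance.step.x)) distance.step.x <- diff(range(xy$x)) / 1000
  if (is.null(distance.step.y)) distance.step.y <- diff(range(xy$y)) / 1000

  # Desired number of training records (floor() is an assumption)
  n.target <- floor(training.fraction * nrow(xy))

  # Step 1: small rectangle centered on the focal point
  half.x <- distance.step.x
  half.y <- distance.step.y

  # Steps 2 and 3: grow the rectangle until it holds enough records
  repeat {
    inside <- abs(xy$x - xy.i$x) <= half.x & abs(xy$y - xy.i$y) <= half.y
    if (sum(inside) >= n.target) break
    half.x <- half.x + distance.step.x
    half.y <- half.y + distance.step.y
  }

  # Step 4: inside the buffer is training, outside is testing
  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Synthetic coordinates, for illustration only
set.seed(1)
xy <- data.frame(id = 1:200, x = runif(200), y = runif(200))
fold <- grow_buffer_fold(xy.i = xy[1, ], xy = xy, training.fraction = 0.6)

length(fold$training)
length(fold$testing)
```

Because the buffer grows in discrete steps, one step can admit several records at once, so the training fold in this sketch may end up slightly larger than the target count.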

Special handling for binary response variables:

When data and dependent.variable.name are provided and the response is binary (0/1), training.fraction applies to the number of presences (1s) rather than to the total number of records: the buffer grows until it contains that fraction of the presences. This prevents imbalanced sampling of presences in presence-absence models.
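
The difference from the default stopping rule can be illustrated with a short base R sketch. It runs on synthetic data and makes several assumptions that the text above does not confirm: rows of data and xy are aligned, the focal point is itself a presence, and the target count is computed with floor(). It shows the idea only and is not the package's code.

```r
# Synthetic presence-absence data, for illustration only.
# Assumes row i of `data` corresponds to row i of `xy`.
set.seed(2)
n <- 300
xy <- data.frame(id = seq_len(n), x = runif(n), y = runif(n))
data <- data.frame(presence = rbinom(n, size = 1, prob = 0.3))

training.fraction <- 0.6
is.presence <- data$presence == 1

# The target counts presences only, not all records
n.target <- floor(training.fraction * sum(is.presence))

# Focal point: first presence record (an assumption of this sketch)
focal <- xy[which(is.presence)[1], ]

step.x <- diff(range(xy$x)) / 1000
step.y <- diff(range(xy$y)) / 1000
half.x <- step.x
half.y <- step.y

repeat {
  inside <- abs(xy$x - focal$x) <= half.x & abs(xy$y - focal$y) <= half.y
  # Stop once enough presences (not total records) fall inside the buffer
  if (sum(inside & is.presence) >= n.target) break
  half.x <- half.x + step.x
  half.y <- half.y + step.y
}

fold <- list(training = xy$id[inside], testing = xy$id[!inside])

# Share of all presences that ended up in the training fold
sum(inside & is.presence) / sum(is.presence)
```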

Value

List with two elements:

  • training: Integer vector of record IDs (from xy$id) in the training fold.

  • testing: Integer vector of record IDs (from xy$id) in the testing fold.
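
Since both elements hold record IDs rather than row positions, it is safest to subset by matching on the id column. The sketch below uses a hand-written list with the same shape as the return value and toy coordinates whose IDs deliberately differ from row numbers; none of these objects come from the package.

```r
# Toy coordinates whose IDs do not match row positions
xy <- data.frame(
  id = c(101L, 102L, 103L, 104L, 105L),
  x = c(0, 1, 2, 3, 4),
  y = c(0, 1, 2, 3, 4)
)

# Hand-written stand-in for the list described above
fold <- list(training = c(101L, 102L, 103L), testing = c(104L, 105L))

# Match on the id column instead of indexing rows by ID
xy.training <- xy[xy$id %in% fold$training, ]
xy.testing <- xy[xy$id %in% fold$testing, ]

# Sanity checks for this toy example: folds are disjoint and cover all records
stopifnot(length(intersect(fold$training, fold$testing)) == 0)
stopifnot(setequal(c(fold$training, fold$testing), xy$id))
```

Indexing rows directly with the IDs, as in xy[fold$training, ], gives the same result only when each ID equals its row position.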

See Also

make_spatial_folds(), rf_evaluate(), is_binary()

Other preprocessing: auto_cor(), auto_vif(), case_weights(), default_distance_thresholds(), double_center_distance_matrix(), is_binary(), make_spatial_folds(), the_feature_engineer(), weights_from_distance_matrix()

Examples

data(plants_df, plants_xy)

# Create spatial fold centered on first coordinate
fold <- make_spatial_fold(
  xy.i = plants_xy[1, ],
  xy = plants_xy,
  training.fraction = 0.6
)

# View training and testing record IDs
fold$training
fold$testing

# Visualize the spatial split (training = red, testing = blue, center = black)
# Rows are selected by record ID, which assumes IDs equal row positions in plants_xy
if (interactive()) {
  plot(plants_xy[c("x", "y")], type = "n", xlab = "", ylab = "")
  points(plants_xy[fold$training, c("x", "y")], col = "red4", pch = 15)
  points(plants_xy[fold$testing, c("x", "y")], col = "blue4", pch = 15)
  points(plants_xy[1, c("x", "y")], col = "black", pch = 15, cex = 2)
}


spatialRF documentation built on Dec. 20, 2025, 1:07 a.m.