View source: R/make_spatial_fold.R
| make_spatial_fold | R Documentation |
Generates two spatially independent data folds by growing a rectangular buffer from a focal point until a specified fraction of records falls inside. Used internally by make_spatial_folds() and rf_evaluate() for spatial cross-validation.
make_spatial_fold(
data = NULL,
dependent.variable.name = NULL,
xy.i = NULL,
xy = NULL,
distance.step.x = NULL,
distance.step.y = NULL,
training.fraction = 0.8
)
data |
Data frame containing response variable and predictors. Required only for binary response variables. |
dependent.variable.name |
Character string with the name of the response variable. Must be a column name in data. |
xy.i |
Single-row data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Defines the focal point from which the buffer grows. |
xy |
Data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Contains all spatial coordinates for the dataset. |
distance.step.x |
Numeric value specifying the buffer growth increment along the x-axis. Default: NULL |
distance.step.y |
Numeric value specifying the buffer growth increment along the y-axis. Default: NULL |
training.fraction |
Numeric value between 0.1 and 0.9 specifying the fraction of records to include in the training fold. Default: 0.8 |
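The argument shapes described above can be prepared as in the following minimal sketch. The coords data frame and its lon/lat column names are made up for illustration, and the step size of 1/100 of the coordinate range is only an assumption, not a documented default.

```r
# Hypothetical input coordinates (not from the package)
coords <- data.frame(
  lon = c(-3.7, 2.2, 13.4, 12.5),
  lat = c(40.4, 41.4, 52.5, 41.9)
)

# xy needs the columns "x", "y", and "id"
xy <- data.frame(
  x = coords$lon,
  y = coords$lat,
  id = seq_len(nrow(coords))
)

# xy.i is a single row of xy: the focal point of the fold
xy.i <- xy[1, ]

# Illustrative assumption: grow the buffer by 1/100 of each axis range
distance.step.x <- diff(range(xy$x)) / 100
distance.step.y <- diff(range(xy$y)) / 100
```

These objects can then be passed as the xy.i, xy, distance.step.x, and distance.step.y arguments.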
This function creates spatially independent training and testing folds for spatial cross-validation. The algorithm works as follows:
1. Starts with a small rectangular buffer centered on the focal point (xy.i).
2. Grows the buffer incrementally by distance.step.x and distance.step.y.
3. Continues growing until the buffer contains the desired number of records (training.fraction * total records).
4. Assigns records inside the buffer to training and records outside to testing.
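The steps above can be sketched in a few lines of base R. This is an illustration of the idea, not the package's implementation: grow_buffer_fold() and its step.x and step.y arguments are hypothetical names, and both the initial buffer size (one step on each side) and the stopping rule (at least the target count) are assumptions.

```r
# Hypothetical helper illustrating the buffer-growing idea
grow_buffer_fold <- function(xy.i, xy, step.x, step.y, training.fraction = 0.8) {
  # Number of records the buffer must contain before growth stops
  target <- floor(training.fraction * nrow(xy))

  # Assumption: the initial buffer extends one step on each side of the focal point
  half.x <- step.x
  half.y <- step.y

  repeat {
    # Records falling inside the current rectangle
    inside <- xy$x >= xy.i$x - half.x & xy$x <= xy.i$x + half.x &
      xy$y >= xy.i$y - half.y & xy$y <= xy.i$y + half.y
    if (sum(inside) >= target) break
    # Grow the rectangle along both axes
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }

  # Inside the buffer goes to training, outside goes to testing
  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Synthetic coordinates for demonstration only
set.seed(1)
xy <- data.frame(id = 1:100, x = runif(100), y = runif(100))
fold <- grow_buffer_fold(xy[1, ], xy, step.x = 0.01, step.y = 0.01,
                         training.fraction = 0.6)
length(fold$training)
length(fold$testing)
```

Because the rectangle grows in discrete steps, the training fold in this sketch can end up slightly larger than the requested fraction when several records enter the buffer in the same step.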
Special handling for binary response variables:
When data and dependent.variable.name are provided and the response is binary (0/1), the function ensures that training.fraction applies to the number of presences (1s), not total records. This prevents imbalanced sampling in presence-absence models.
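One way to picture the binary case is to change the stopping rule so that it counts presences inside the buffer rather than all records. The sketch below is hypothetical: grow_buffer_fold_binary() is not a package function, and it assumes the response vector is aligned row by row with xy.

```r
# Hypothetical helper: stop growing once enough presences (1s) are inside
grow_buffer_fold_binary <- function(xy.i, xy, response, step.x, step.y,
                                    training.fraction = 0.8) {
  # Assumption: response is a 0/1 vector aligned with the rows of xy
  stopifnot(length(response) == nrow(xy), all(response %in% c(0, 1)))

  # Target is a fraction of the presences, not of all records
  target <- floor(training.fraction * sum(response == 1))

  half.x <- step.x
  half.y <- step.y

  repeat {
    inside <- xy$x >= xy.i$x - half.x & xy$x <= xy.i$x + half.x &
      xy$y >= xy.i$y - half.y & xy$y <= xy.i$y + half.y
    if (sum(response[inside] == 1) >= target) break
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }

  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Synthetic presence-absence data for demonstration only
set.seed(2)
xy <- data.frame(id = 1:100, x = runif(100), y = runif(100))
response <- rbinom(100, 1, 0.3)
fold <- grow_buffer_fold_binary(xy[1, ], xy, response,
                                step.x = 0.01, step.y = 0.01,
                                training.fraction = 0.6)

# Share of all presences that ended up in the training fold
presences.in.training <- sum(response[xy$id %in% fold$training])
presences.in.training / sum(response)
```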
List with two elements:
training: Integer vector of record IDs (from xy$id) in the training fold.
testing: Integer vector of record IDs (from xy$id) in the testing fold.
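Since records inside the buffer go to training and records outside go to testing, the two vectors should not overlap and together should cover every ID in xy$id. A small check along these lines can catch mismatched inputs; check_fold() and the example fold are hypothetical and only mimic the structure of the returned list.

```r
# Hypothetical helper: TRUE when the fold is a clean partition of the IDs
check_fold <- function(fold, ids) {
  stopifnot(is.list(fold), all(c("training", "testing") %in% names(fold)))
  no.overlap <- length(intersect(fold$training, fold$testing)) == 0
  complete <- setequal(union(fold$training, fold$testing), ids)
  no.overlap && complete
}

# Made-up fold with the same structure as the returned list
fold <- list(training = c(1L, 2L, 3L), testing = c(4L, 5L))
fold.ok <- check_fold(fold, ids = 1:5)
fold.ok
```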
make_spatial_folds(), rf_evaluate(), is_binary()
Other preprocessing:
auto_cor(),
auto_vif(),
case_weights(),
default_distance_thresholds(),
double_center_distance_matrix(),
is_binary(),
make_spatial_folds(),
the_feature_engineer(),
weights_from_distance_matrix()
data(plants_df, plants_xy)
# Create spatial fold centered on first coordinate
fold <- make_spatial_fold(
xy.i = plants_xy[1, ],
xy = plants_xy,
training.fraction = 0.6
)
# View training and testing record IDs
fold$training
fold$testing
# Visualize the spatial split (training = red, testing = blue, center = black)
if (interactive()) {
plot(plants_xy[c("x", "y")], type = "n", xlab = "", ylab = "")
points(plants_xy[fold$training, c("x", "y")], col = "red4", pch = 15)
points(plants_xy[fold$testing, c("x", "y")], col = "blue4", pch = 15)
points(plants_xy[1, c("x", "y")], col = "black", pch = 15, cex = 2)
}