View source: R/make_spatial_folds.R
| make_spatial_folds | R Documentation |
Applies make_spatial_fold() to every row in xy.selected, generating one spatially independent fold centered on each focal point. Used for spatial cross-validation in rf_evaluate().
make_spatial_folds(
data = NULL,
dependent.variable.name = NULL,
xy.selected = NULL,
xy = NULL,
distance.step.x = NULL,
distance.step.y = NULL,
training.fraction = 0.75,
n.cores = parallel::detectCores() - 1,
cluster = NULL
)
data |
Data frame containing response variable and predictors. Required only for binary response variables. |
dependent.variable.name |
Character string with the name of the response variable. Must be a column name in data. |
xy.selected |
Data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Defines the focal points for fold creation. Typically a spatially thinned subset of xy. |
xy |
Data frame with columns "x" (longitude), "y" (latitude), and "id" (record identifier). Contains all spatial coordinates for the dataset. |
distance.step.x |
Numeric value specifying the buffer growth increment along the x-axis. Default: NULL |
distance.step.y |
Numeric value specifying the buffer growth increment along the y-axis. Default: NULL |
training.fraction |
Numeric value between 0.1 and 0.9 specifying the fraction of records to include in the training fold. Default: 0.75 |
n.cores |
Integer specifying the number of CPU cores for parallel execution. Default: parallel::detectCores() - 1 |
cluster |
Optional cluster object created with parallel::makeCluster(). Default: NULL |
This function creates multiple spatially independent folds for spatial cross-validation by calling make_spatial_fold() once for each row in xy.selected. Each fold is created by growing a rectangular buffer from the corresponding focal point until the desired training.fraction is achieved.
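The buffer-growing idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name grow_rectangular_fold() is hypothetical, the rectangle is assumed to grow by one step on each side per iteration, and records outside the final rectangle are assumed to form the testing set.

```r
# Hypothetical helper (not part of spatialRF): grow a rectangle around one
# focal point until it contains at least training.fraction of all records.
grow_rectangular_fold <- function(xy, focal, step.x, step.y,
                                  training.fraction = 0.75) {
  stopifnot(step.x > 0, step.y > 0)
  target <- floor(nrow(xy) * training.fraction)
  half.x <- step.x
  half.y <- step.y
  repeat {
    # Records falling inside the current rectangle
    inside <- xy$x >= focal$x - half.x & xy$x <= focal$x + half.x &
      xy$y >= focal$y - half.y & xy$y <= focal$y + half.y
    if (sum(inside) >= target) break
    # Not enough records yet: enlarge the rectangle by one step per axis
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }
  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Simulated coordinates, for illustration only
set.seed(1)
xy <- data.frame(id = 1:200, x = runif(200), y = runif(200))
fold <- grow_rectangular_fold(
  xy = xy,
  focal = xy[1, ],
  step.x = 0.01,
  step.y = 0.01,
  training.fraction = 0.6
)
```

Because the rectangle grows in discrete steps, the training set may end up slightly larger than the requested fraction; smaller steps give a closer match at the cost of more iterations.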
Parallel execution:
The function uses parallel processing to speed up fold creation. You can control parallelization with n.cores or provide a pre-configured cluster object.
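The life cycle of a pre-configured cluster might look like the sketch below, which uses only the base parallel package. The worker function nearest_ids() is a hypothetical stand-in for the per-focal-point work, and the coordinates are simulated; the point is only that the caller creates the cluster, reuses it, and stops it.

```r
library(parallel)

# Hypothetical stand-in for the work done per focal point (not spatialRF code):
# return the ids of the n records closest to focal point i.
nearest_ids <- function(i, xy, xy.selected, n) {
  d <- sqrt((xy$x - xy.selected$x[i])^2 + (xy$y - xy.selected$y[i])^2)
  xy$id[order(d)[seq_len(n)]]
}

# Simulated coordinates, for illustration only
set.seed(1)
xy <- data.frame(id = 1:100, x = runif(100), y = runif(100))
xy.selected <- xy[c(1, 25, 50), ]

# Create the cluster once, run one task per focal point, then release it
cl <- makeCluster(2)
result <- parLapply(
  cl,
  seq_len(nrow(xy.selected)),
  nearest_ids,
  xy = xy,
  xy.selected = xy.selected,
  n = 10
)
stopCluster(cl)
```

Assuming the cluster argument accepts an object like cl, the same object would be passed as cluster = cl, and stopping it afterwards would remain the caller's job.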
Typical workflow:
1. Thin spatial points with thinning() or thinning_til_n() to create xy.selected
2. Create spatial folds with this function
3. Use the folds for spatial cross-validation in rf_evaluate()
List where each element corresponds to a row in xy.selected and contains:
training: Integer vector of record IDs (from xy$id) in the training fold.
testing: Integer vector of record IDs (from xy$id) in the testing fold.
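A quick sanity check on a list with this structure is sketched below. The folds object here is mock data with made-up IDs, built by hand to mirror the documented layout; it is not real output of the function.

```r
# Mock output with the documented structure (made-up IDs, illustration only)
folds <- list(
  list(training = c(1L, 2L, 3L, 5L, 8L, 9L), testing = c(4L, 6L, 7L, 10L)),
  list(training = c(2L, 3L, 4L, 6L, 7L, 9L, 10L), testing = c(1L, 5L, 8L))
)

# Realized training fraction of each fold
realized <- vapply(
  folds,
  function(f) length(f$training) / (length(f$training) + length(f$testing)),
  numeric(1)
)

# TRUE for folds where no record appears in both training and testing
disjoint <- vapply(
  folds,
  function(f) length(intersect(f$training, f$testing)) == 0,
  logical(1)
)
```

Comparing realized against the requested training.fraction shows how closely each fold matches the target.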
make_spatial_fold(), rf_evaluate(), thinning(), thinning_til_n()
Other preprocessing:
auto_cor(),
auto_vif(),
case_weights(),
default_distance_thresholds(),
double_center_distance_matrix(),
is_binary(),
make_spatial_fold(),
the_feature_engineer(),
weights_from_distance_matrix()
data(plants_df, plants_xy)
# Thin to 10 focal points to speed up example
xy.thin <- thinning_til_n(
xy = plants_xy,
n = 10
)
# Create spatial folds centered on the 10 thinned points
folds <- make_spatial_folds(
xy.selected = xy.thin,
xy = plants_xy,
distance.step.x = 0.05,
training.fraction = 0.6,
n.cores = 1
)
# Each element is a fold with training and testing indices
length(folds) # 10 folds
names(folds[[1]]) # "training" and "testing"
# Visualize first fold (training = red, testing = blue, center = black)
if (interactive()) {
plot(plants_xy[c("x", "y")], type = "n", xlab = "", ylab = "")
points(plants_xy[folds[[1]]$training, c("x", "y")], col = "red4", pch = 15)
points(plants_xy[folds[[1]]$testing, c("x", "y")], col = "blue4", pch = 15)
points(
plants_xy[folds[[1]]$training[1], c("x", "y")],
col = "black",
pch = 15,
cex = 2
)
}