ResamplingRepeatedSpCVEnv: (blockCV) Repeated "environmental blocking" resampling

Description

Environmental blocking for cross-validation. This function uses clustering methods to identify sets of similar environmental conditions based on the input covariates. Species data falling into the same group or cluster are assigned to the same fold. Clustering can be done either in raster space or on the species data, and both approaches use k-means. The function works on single or multiple raster files; multiple rasters need to be in a raster brick or stack format.

Details

As k-means algorithms use Euclidean distance to estimate clusters, the input covariates should be quantitative variables. Since variables with wider ranges of values might dominate the clusters and bias the environmental clustering (Hastie et al., 2009), all the input rasters are first standardized within the function. This is done either by normalizing based on subtracting the mean and dividing by the standard deviation of each raster (the default) or optionally by standardizing using linear scaling to constrain all raster values between 0 and 1.
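
The two standardization options described above can be illustrated with a minimal base R sketch. It does not reproduce the internals of blockCV::envBlock; the covariate names and values are made up for illustration, and `rescale01` is a hypothetical helper.

```r
# Hypothetical covariates on very different scales (values are made up).
set.seed(1)
covariates = data.frame(
  elevation = runif(50, min = 0, max = 3000),
  slope     = runif(50, min = 0, max = 1)
)

# Normalization: subtract the mean and divide by the standard deviation.
z_scored = scale(covariates, center = TRUE, scale = TRUE)

# Alternative: linear scaling that constrains each covariate to [0, 1].
# rescale01() is a hypothetical helper, not part of blockCV.
rescale01 = function(x) (x - min(x)) / (max(x) - min(x))
min_max = as.data.frame(lapply(covariates, rescale01))

round(colMeans(z_scored), 10)  # column means after normalization
apply(z_scored, 2, sd)         # column standard deviations after normalization
sapply(min_max, range)         # column ranges after linear scaling
```

After either transformation, elevation no longer dominates the Euclidean distances purely because of its wider range.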

By default, the clustering is done in raster space. With this approach the clusters are consistent throughout the region and across species (in the same region). However, it may produce one or more clusters that cover none of the species records (the spatial locations of the response samples), especially when the species data are not dispersed throughout the region or the number of clusters (k, i.e. the number of folds) is high. In that case, the number of folds is less than the specified k. If rasterBlock = FALSE, the clustering is done on the species points and the number of folds equals k.
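
As a rough illustration of the point-based case (rasterBlock = FALSE), the sketch below clusters hypothetical covariate values at the record locations with base R's `kmeans()` and treats each cluster as one fold. The data, the variable names and the choice of `nstart` are assumptions made for the example; this is not the blockCV implementation.

```r
set.seed(42)
k = 4

# Hypothetical standardized covariates at 100 species record locations.
points_env = scale(cbind(
  temperature = rnorm(100, mean = 15, sd = 5),
  rainfall    = rnorm(100, mean = 900, sd = 250)
))

# Cluster the records themselves; each cluster becomes one fold.
km = kmeans(points_env, centers = k, nstart = 10)
fold = km$cluster

table(fold)                # number of records per fold
length(unique(fold)) == k  # every cluster holds at least one record

# Row indices of the test set for fold 1 and the matching training set.
test_idx  = which(fold == 1)
train_idx = which(fold != 1)
```

Because the clusters are fitted to the records directly, none of them can be empty, which is why this mode yields exactly k folds. When clustering in raster space, a cluster can fall entirely in areas without records.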

Note that the input raster layer should cover all the species points; otherwise an error is raised. Records with no raster value should be deleted prior to the analysis, or another raster layer should be provided.

mlr3spatiotempcv notes

The 'Description' and 'Details' fields are inherited from the respective upstream function.

For a list of available arguments, please see blockCV::envBlock.

Super class

mlr3::Resampling -> ResamplingRepeatedSpCVEnv

Active bindings

iters

integer(1)
Returns the number of resampling iterations, depending on the values stored in the param_set.

Methods

Public methods

Method new()

Create an "Environmental Block" repeated resampling instance.

For a list of available arguments, please see blockCV::envBlock.

Usage
ResamplingRepeatedSpCVEnv$new(id = "repeated_spcv_env")
Arguments
id

character(1)
Identifier for the resampling strategy.


Method folds()

Translates iteration numbers to fold numbers.

Usage
ResamplingRepeatedSpCVEnv$folds(iters)
Arguments
iters

integer()
Iteration number.


Method repeats()

Translates iteration numbers to repetition numbers.

Usage
ResamplingRepeatedSpCVEnv$repeats(iters)
Arguments
iters

integer()
Iteration number.
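
The relationship between iteration, fold and repetition numbers can be sketched in base R. The text above does not state how iterations are ordered, so the sketch assumes that folds cycle fastest within each repetition. `fold_of` and `repeat_of` are hypothetical stand-ins for the folds() and repeats() methods, not the package's own code.

```r
folds   = 4L
repeats = 2L
iters   = seq_len(folds * repeats)  # 8 resampling iterations in total

# Assumed layout: iterations 1..4 belong to repetition 1, 5..8 to repetition 2.
fold_of   = function(iters) ((iters - 1L) %% folds) + 1L
repeat_of = function(iters) ((iters - 1L) %/% folds) + 1L

data.frame(iter = iters, rep = repeat_of(iters), fold = fold_of(iters))
```

Under this assumption, iteration 6 would correspond to fold 2 of repetition 2.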


Method instantiate()

Materializes fixed training and test splits for a given task.

Usage
ResamplingRepeatedSpCVEnv$instantiate(task)
Arguments
task

Task
A task to instantiate.


Method clone()

The objects of this class are cloneable with this method.

Usage
ResamplingRepeatedSpCVEnv$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

References

Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G (2018). “blockCV: an R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models.” bioRxiv. doi: 10.1101/357798.

Examples

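if (mlr3misc::require_namespaces(c("sf", "blockCV"), quietly = TRUE)) {
  library(mlr3)
  # Provides the "ecuador" task and the "repeated_spcv_env" resampling
  library(mlr3spatiotempcv)
  task = tsk("ecuador")

  # Instantiate Resampling: 4 folds, repeated 2 times
  rrcv = rsmp("repeated_spcv_env", folds = 4, repeats = 2)
  rrcv$instantiate(task)

  # Individual sets (row ids) for the first iteration:
  rrcv$train_set(1)
  rrcv$test_set(1)
  # Train and test sets of the same iteration should not overlap
  intersect(rrcv$train_set(1), rrcv$test_set(1))

  # Internal storage:
  rrcv$instance
}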

mlr-org/mlr3spatiotempcv documentation built on May 4, 2021, 9:44 a.m.