set.seed(42)
library("mlr3")
library("mlr3spatiotempcv")
This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on "Spatiotemporal Analysis".
After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.
To make use of spatial resampling methods, a {mlr3} task that is aware of its spatial characteristics needs to be created.
Two Task child classes exist in {mlr3spatiotempcv} for this purpose:
TaskClassifST
TaskRegrST
To create one of these, you have multiple options:
- Construct the Task directly via $new(). This only works for data.table backends (!).
- Use the as_task_* converters (e.g. if your data is stored in an sf object).
We recommend the latter, as the as_task_* converters aim to make task construction easier, e.g., by creating the DataBackend (which is required to create a Task in {mlr3}) automatically and setting the crs and coordinate_names fields.
Let's assume your (point) data is stored in an sf object, which is a common scenario for spatial analysis in R.
# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)

# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task",
  target = "slides", positive = "TRUE")
You can also use a plain data.frame. In this case, crs and coordinate_names need to be passed explicitly, since there is no sf object from which they could be inferred:
task = as_task_classif_st(ecuador, id = "ecuador_task",
  target = "slides", positive = "TRUE",
  coordinate_names = c("x", "y"), crs = 32717)
The *ST task family prints a subset of the coordinates by default:
print(task)
All *ST tasks can be treated as their superclass equivalents TaskClassif or TaskRegr in subsequent {mlr3} modeling steps.
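As a rough illustration of this point, the sketch below trains a standard {mlr3} learner on a spatial task without any special handling. It assumes the bundled tsk("ecuador") task and the featureless classifier from {mlr3}; both were picked only to keep the example short and are not part of the original text.

```r
library("mlr3")
library("mlr3spatiotempcv")

# example task shipped with mlr3spatiotempcv (assumed here for brevity)
task = tsk("ecuador")

# a spatial classification task is also a regular classification task
is_classif = inherits(task, "TaskClassif")

# train, predict and score exactly as with any other mlr3 task
learner = lrn("classif.featureless")
learner$train(task)
prediction = learner$predict(task)
ce = prediction$score(msr("classif.ce"))
```

Any other classification learner or measure could be swapped in here; the spatial information only starts to matter once a spatial resampling method is used.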
In {mlr3}, dictionaries provide an overview of the available methods. The following sections show which dictionaries and reflections gain new entries when {mlr3spatiotempcv} is loaded.
Task types (see mlr_reflections$task_types):
TaskClassifST
TaskRegrST
Task column roles (see mlr_reflections$task_col_roles):
coordinate
space
time
Resampling methods:
mlr_resamplings_spcv_block
mlr_resamplings_spcv_buffer
mlr_resamplings_spcv_coords
mlr_resamplings_spcv_knndm
mlr_resamplings_spcv_disc
mlr_resamplings_spcv_tiles
mlr_resamplings_spcv_env
mlr_resamplings_sptcv_cstf
and their respective repeated versions, where available.
See as.data.table(mlr_resamplings) for the full dictionary.
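The entries listed above can be looked up and instantiated like any other {mlr3} resampling. The following is a minimal sketch, assuming the dictionary keys drop the mlr_resamplings_ prefix (for example "spcv_coords") and using the bundled tsk("ecuador") task; the choice of five folds is arbitrary.

```r
library("mlr3")
library("mlr3spatiotempcv")

# keys of all registered resampling methods
keys = as.data.table(mlr_resamplings)$key

# keep only the spatial and spatiotemporal ones, including repeated variants
st_keys = grep("^(repeated_)?spt?cv_", keys, value = TRUE)

# coordinate-based clustering of the observations into five folds
task = tsk("ecuador")
resampling = rsmp("spcv_coords", folds = 5)
resampling$instantiate(task)

# row ids of the first train/test split
train_ids = resampling$train_set(1)
test_ids = resampling$test_set(1)
```

The instantiated object can then be passed to resample() or benchmark() in the usual way.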
Example tasks:
tsk("ecuador") (spatial, classif)
tsk("cookfarm_mlr3") (spatiotemp, regr)
The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R packages and scientific references.
All methods besides "spcv_buffer" also have a corresponding "repeated" method.
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Category                       | (Package) Method Name                         | Reference         | mlr3 Notation                                                                |
+================================+===============================================+===================+==============================================================================+
| Buffering, spatial             | (blockCV) Spatial Buffering                   | @valavi2018       | mlr_resamplings_spcv_buffer                                                  |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Buffering, spatial             | (sperrorest) Spatial Disc                     | @brenning2012     | mlr_resamplings_spcv_disc                                                    |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Blocking, spatial              | (blockCV) Spatial Blocking                    | @valavi2018       | mlr_resamplings_spcv_block                                                   |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Blocking, spatial              | (sperrorest) Spatial Tiles                    | @valavi2018       | mlr_resamplings_spcv_tiles                                                   |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Clustering, spatial            | (sperrorest) Spatial CV                       | @brenning2012     | mlr_resamplings_spcv_coords                                                  |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Clustering, spatial            | (CAST) KNNDM                                  | @linnenbrink2023  | mlr_resamplings_spcv_knndm                                                   |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Clustering, feature-space      | (blockCV) Environmental Blocking              | @valavi2018       | mlr_resamplings_spcv_env                                                     |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Grouping, predefined inds      | (mlr3) Predefined partitions                  |                   | mlr_resamplings_custom_cv                                                    |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Grouping, spatiotemporal       | (mlr3) via col_roles "group"                  |                   | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group")                  |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
| Grouping, spatiotemporal       | (CAST) Leave-Location-and-Time-Out            | @meyer2018        | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space|time")     |
+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+
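The "group" column role from the table can be sketched with plain {mlr3}. The example below is only an illustration under stated assumptions: it uses the penguins task bundled with {mlr3} as a stand-in and treats its island column as a coarse location identifier, which is not a dataset or variable used in the original text.

```r
library("mlr3")

# stand-in data: "island" plays the role of a location identifier
task = tsk("penguins")

# observations from the same island must end up in the same fold
task$set_col_roles("island", roles = "group")

# ordinary cross-validation now partitions whole groups instead of single rows
resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# collect the islands that occur in each test set
groups = task$groups
test_islands = lapply(seq_len(resampling$iters), function(i) {
  in_test = groups$row_id %in% resampling$test_set(i)
  unique(as.character(groups$group[in_test]))
})
```

Because every island appears in exactly one test set, no location contributes to both training and testing within the same iteration.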