set.seed(42)
library("mlr3")
library("mlr3spatiotempcv")

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

library(mlr3spatiotempcv)

This package adds resampling methods for the {mlr3} package framework suited for spatial, temporal and spatiotemporal data. These methods can help to reduce the influence of autocorrelation on performance estimates when performing cross-validation. While this article gives a rather technical introduction to the package, a more applied approach can be found in the mlr3book section on "Spatiotemporal Analysis".

After loading the package via library("mlr3spatiotempcv"), the spatiotemporal resampling methods and example tasks provided by {mlr3spatiotempcv} are available to the user alongside the default {mlr3} resampling methods and tasks.

Creating a spatial Task

To make use of spatial resampling methods, a {mlr3} task that is aware of its spatial characteristic needs to be created. Two Task child classes exist in {mlr3spatiotempcv} for this purpose:

To create one of these, you have multiple options:

  1. Use the constructor of the Task directly via $new() - this only works for data.table backends (!)
  2. Use the as_task_* converters (e.g. if your data is stored in an sf object)

We recommend the latter, as the as_task_* converters aim to make task construction easier, e.g., by creating the DataBackend (which is required to create a Task in {mlr3}) automatically and setting the crs and coordinate_names fields. Let's assume your (point) data is stored in with an sf object, which is a common scenario for spatial analysis in R.

# create 'sf' object
data_sf = sf::st_as_sf(ecuador, coords = c("x", "y"), crs = 32717)

# create `TaskClassifST` from `sf` object
task = as_task_classif_st(data_sf, id = "ecuador_task", target = "slides", positive = "TRUE")

You can also use a plain data.frame. In this case, crs and coordinate_names need to be passed along explicitly as they cannot be inferred directly from the sf object:

task = as_task_classif_st(ecuador, id = "ecuador_task", target = "slides",
  positive = "TRUE", coordinate_names = c("x", "y"), crs = 32717)

The *ST task family prints a subset of the coordinates by default:

print(task)

All *ST tasks can be treated as their super class equivalents TaskClassif or TaskRegr in subsequent {mlr3} modeling steps.

Contributed reflections by {mlr3spatiotempcv}

In {mlr3}, dictionaries are used for overview purposes of available methods. The following sections show which dictionaries get appended with new entries when loading {mlr3spatiotempcv}.

Task Type

mlr_reflections$task_types

Task Column Roles

mlr_reflections$task_col_roles

Resampling Methods

and their respective repeated versions. See as.data.table(mlr_resamplings) for the full dictionary.

Examples Tasks

Upstream Packages and Scientific References

The following table lists all spatiotemporal methods implemented in {mlr3spatiotempcv} (or {mlr3}), their upstream R package and scientific references. All methods besides "spcv_buffer" also have a corresponding "repeated" method.

+--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Category | (Package) Method Name | Reference | mlr3 Notation | +================================+===============================================+===================+==============================================================================+ | Buffering, spatial | (blockCV) Spatial Buffering | @valavi2018 | mlr_resamplings_spcv_buffer | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Buffering, spatial | (sperrorest) Spatial Disc | @brenning2012 | mlr_resamplings_spcv_disc | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Blocking, spatial | (blockCV) Spatial Blocking | @valavi2018 | mlr_resamplings_spcv_block | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Blocking, spatial | (sperrorest) Spatial Tiles | @valavi2018 | mlr_resamplings_spcv_tiles | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Clustering, spatial | (sperrorest) Spatial CV | @brenning2012 | mlr_resamplings_spcv_coords | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Clustering, spatial | (CAST) KNNDM | @linnenbrink2023 | mlr_resamplings_spcv_knndm | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Clustering, feature-space | (blockCV) Environmental Blocking | @valavi2018 | mlr_resamplings_spcv_env | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | ------------------------------ | --------------------------------------------- | ----------------- | ---------------------------------------------------------------------------- | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Grouping, predefined inds | (mlr3) Predefined partitions | | mlr_resamplings_custom_cv | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Grouping, spatiotemporal | (mlr3) via col_roles "group" | | mlr_resamplings_cv, Task$set_col_roles(<variable>, "group") | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+ | Grouping, spatiotemporal | (CAST) Leave-Location-and-Time-Out | @meyer2018 | mlr_resamplings_sptcv_cstf, Task$set_col_roles(<variable>, "space|time") | +--------------------------------+-----------------------------------------------+-------------------+------------------------------------------------------------------------------+

References



mlr-org/mlr3spatiotempcv documentation built on April 23, 2024, 6:50 a.m.