The blockCV package offers a range of functions for generating train and test folds for k-fold and leave-one-out (LOO) cross-validation (CV). It allows for separation of data spatially and environmentally, with various options for block construction. Additionally, it includes a function for assessing the level of spatial autocorrelation in the response or raster covariates, to aid in selecting an appropriate distance band for data separation. The blockCV package is suitable for the evaluation of a variety of spatial modelling applications, including classification of remote sensing imagery, soil mapping, and species distribution modelling (SDM). It also provides support for different SDM scenarios, including presence-absence and presence-background species data, rare and common species, and raster data for predictor variables.
The latest version of blockCV (v3.0) features significant updates and changes. All functions have been renamed to more general names beginning with cv_. The previous functions (version 2.x) will continue to work for an extended period, but they will be removed in a future update. It is highly recommended to update your code to use the new functions provided below.
Some new updates:
- Function names now all start with cv_
- The fold-generating functions are cv_spatial, cv_cluster, cv_buffer, and cv_nndm
- The cv_cluster function generates blocks based on kmeans clustering. It now works on both environmental rasters and the spatial coordinates of sample points
- The cv_spatial_autocor function now calculates the spatial autocorrelation range for both the response (i.e. binary or continuous data) and a set of continuous raster covariates
- The cv_plot function allows for visualization of folds from all blocking strategies using ggplot facets
- The terra package is now used for all raster processing and supports both stars and raster objects, as well as files on disk
- cv_similarity provides measures on possible extrapolation to testing folds

To install the latest update of the package from GitHub use:
remotes::install_github("rvalavi/blockCV", dependencies = TRUE)
Or install it from CRAN:
install.packages("blockCV", dependencies = TRUE)
For practical examples of the package, see the package vignettes; examples with caret and tidymodels are coming soon.
This code snippet showcases some of the package's functionalities. For more comprehensive tutorials, please refer to the vignettes included with the package.
# loading the package
library(blockCV)
library(sf) # working with spatial vector data
library(terra) # working with spatial raster data
# load raster data; the pipe operator |> is available for R v4.1 or higher
myrasters <- system.file("extdata/au/", package = "blockCV") |>
  list.files(full.names = TRUE) |>
  terra::rast()
# load species presence-absence data and convert to sf
pa_data <- read.csv(system.file("extdata/", "species.csv", package = "blockCV")) |>
  sf::st_as_sf(coords = c("x", "y"), crs = 7845)
# spatial blocking by specified range and random assignment
sb <- cv_spatial(x = pa_data, # sf or SpatialPoints of sample data (e.g. species data)
                 column = "occ", # the response column (binary or multi-class)
                 r = myrasters, # a raster for background (optional)
                 size = 450000, # size of the blocks in metres
                 k = 5, # number of folds
                 hexagon = TRUE, # use hexagonal blocks (the default)
                 selection = "random", # random blocks-to-fold assignment
                 iteration = 100, # number of attempts to find evenly dispersed folds
                 biomod2 = TRUE) # also create folds for biomod2
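To make the idea behind the call above concrete, here is a minimal base R sketch of spatial blocking with random block-to-fold assignment. It is an illustration only and not blockCV's implementation: it uses square rather than hexagonal blocks, synthetic coordinates, and a hypothetical helper named assign_block_folds. It also makes a single random assignment instead of iterating to find evenly dispersed folds.

```r
# Illustration only: square blocks on synthetic points (not blockCV internals)
set.seed(42)
n <- 200
pts <- data.frame(x = runif(n, 0, 2e6), y = runif(n, 0, 2e6))

# hypothetical helper: map each point to a square block, then blocks to folds
assign_block_folds <- function(x, y, size, k) {
  # integer column/row index of the block containing each point
  col_id <- floor((x - min(x)) / size)
  row_id <- floor((y - min(y)) / size)
  block_id <- paste(col_id, row_id, sep = "_")
  blocks <- unique(block_id)
  # balanced fold labels, shuffled so blocks are assigned to folds at random
  fold_of_block <- rep_len(seq_len(k), length(blocks))[sample(length(blocks))]
  names(fold_of_block) <- blocks
  # every point inherits the fold of its block
  unname(fold_of_block[block_id])
}

fold_id <- assign_block_folds(pts$x, pts$y, size = 450000, k = 5)
table(fold_id) # number of points per fold
```

Because whole blocks, not individual points, are assigned to folds, nearby points tend to end up in the same fold, which is what separates training and testing data in space.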
Or create spatial clusters for k-fold cross-validation:
# create spatial clusters
set.seed(6)
sc <- cv_cluster(x = pa_data,
                 column = "occ", # optionally count data in folds (binary or multi-class)
                 k = 5)
# now plot the created folds
cv_plot(cv = sc, # a blockCV object
        x = pa_data, # sample points
        r = myrasters[[1]], # optionally add a raster background
        points_alpha = 0.5,
        nrow = 2)
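Once folds are created, they are typically used to fit and evaluate a model fold by fold. The sketch below assumes the fold structure documented for blockCV objects, namely a folds_list element in which each fold holds the training indices first and the testing indices second. To stay self-contained it builds a small mock list of that shape from synthetic data instead of calling the package. The helper name evaluate_folds, the glm model, the temp predictor, and the accuracy metric are all illustrative choices.

```r
# Illustration only: synthetic data and a mock folds list shaped like
# blockCV's folds_list (each fold: training indices, then testing indices)
set.seed(1)
n <- 300
dat <- data.frame(temp = rnorm(n))
dat$occ <- rbinom(n, 1, plogis(2 * dat$temp))

fold_id <- sample(rep_len(1:5, n))
mock_folds <- lapply(1:5, function(i) {
  list(which(fold_id != i), which(fold_id == i))
})

# hypothetical helper: fit on the training rows, score on the testing rows
evaluate_folds <- function(data, folds_list) {
  vapply(folds_list, function(fold) {
    train <- data[fold[[1]], ]
    test <- data[fold[[2]], ]
    fit <- glm(occ ~ temp, data = train, family = binomial())
    pred <- predict(fit, newdata = test, type = "response")
    mean((pred > 0.5) == (test$occ == 1)) # accuracy at a 0.5 threshold
  }, numeric(1))
}

acc <- evaluate_folds(dat, mock_folds)
round(acc, 2) # one accuracy value per fold
```

With a real object, the same loop would presumably run over sb$folds_list or sc$folds_list, using a data frame whose rows match the sample points passed to the blocking function.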
Investigate spatial autocorrelation in the landscape to choose a suitable size for spatial blocks:
# exploring the effective range of spatial autocorrelation in raster covariates or sample data
cv_spatial_autocor(r = myrasters, # a SpatRaster object or path to files
                   num_sample = 5000, # number of cells to be used
                   plot = TRUE)
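The notion of an autocorrelation range can be illustrated with an empirical semivariogram: the semivariance between pairs of points grows with distance and levels off around the range. The base R sketch below uses synthetic data and a hypothetical helper named empirical_semivariogram. It is only meant to show the concept and should not be read as how cv_spatial_autocor computes its result.

```r
# Illustration only: empirical semivariogram on synthetic data
set.seed(7)
n <- 400
x <- runif(n, 0, 100)
y <- runif(n, 0, 100)
# a smooth spatial signal plus a little noise gives autocorrelated values
z <- sin(x / 15) + cos(y / 15) + rnorm(n, sd = 0.1)

# hypothetical helper: mean semivariance of point pairs per distance bin
empirical_semivariogram <- function(x, y, z, breaks) {
  d <- as.vector(dist(cbind(x, y))) # pairwise distances
  g <- as.vector(dist(z))^2 / 2     # semivariance of each pair
  bin <- cut(d, breaks = breaks)    # pairs beyond the last break are dropped
  data.frame(
    distance = as.vector(tapply(d, bin, mean)),
    gamma = as.vector(tapply(g, bin, mean)),
    pairs = as.vector(table(bin))
  )
}

sv <- empirical_semivariogram(x, y, z, breaks = seq(0, 50, by = 5))
sv
```

Blocks that are at least as large as the distance at which gamma stops increasing keep most of the spatially dependent points together in the same fold.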
Alternatively, you can manually choose the size of spatial blocks in an interactive session using a Shiny app.
# shiny app to aid selecting a size for spatial blocks
cv_block_size(r = myrasters[[1]],
              x = pa_data, # optionally add sample points
              column = "occ",
              min_size = 2e5,
              max_size = 9e5)
Please report issues at: https://github.com/rvalavi/blockCV/issues
To cite package blockCV in publications, please use:
Valavi R, Elith J, Lahoz-Monfort JJ, Guillera-Arroita G. blockCV: An R package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol Evol. 2019; 10:225–232. https://doi.org/10.1111/2041-210X.13107