spatialBlock: Use spatial blocks to separate train and test folds


Description

This function creates spatially separated folds based on a pre-specified distance. It assigns blocks to the training and testing folds randomly, systematically or in a checkerboard pattern. The distance (theRange) should be in metres, regardless of the unit of the reference system of the input data (for more information see the details section). By default, the function creates blocks according to the extent and shape of the study area, assuming that the user has considered the landscape for the given species and case study. Alternatively, blocks can be created based on species spatial data. This is especially useful when the species data are not evenly dispersed across the whole region. Blocks can also be offset so the origin is not at the outer corner of the rasters. Instead of providing a distance, blocks can also be created by specifying a number of rows and columns, which divides the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012). Finally, the blocks can be specified by a user-defined spatial polygon layer.
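
The three assignment patterns named above can be pictured on a small grid. The following is a minimal base-R sketch, not the blockCV implementation: the grid size, the value of k and the exact numbering rules (row-by-row cycling for the systematic pattern, two alternating folds for the checkerboard) are assumptions made for illustration.

```r
# Illustrative only: three ways to assign grid blocks to folds.
# Grid dimensions and k are made-up values, not package defaults.
set.seed(42)
n_rows <- 4
n_cols <- 6
k <- 5
n_blocks <- n_rows * n_cols

# random: shuffle a near-balanced vector of fold labels across the blocks
random_folds <- matrix(sample(rep_len(seq_len(k), n_blocks)),
                       nrow = n_rows, ncol = n_cols)

# systematic: number the blocks row by row and cycle through 1..k
systematic_folds <- matrix(rep_len(seq_len(k), n_blocks),
                           nrow = n_rows, ncol = n_cols, byrow = TRUE)

# checkerboard: two folds alternating like a chessboard
checker_folds <- outer(seq_len(n_rows), seq_len(n_cols),
                       function(r, c) (r + c) %% 2 + 1)

print(random_folds)
print(systematic_folds)
print(checker_folds)
```

In this sketch the checkerboard pattern always yields two folds, whatever the value of k.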

Usage

spatialBlock(speciesData, species = NULL, blocks = NULL,
  rasterLayer = NULL, theRange = NULL, rows = NULL, cols = NULL,
  k = 5, selection = "random", iteration = 250, numLimit = NULL,
  maskBySpecies = TRUE, degMetre = 111325, border = NULL,
  showBlocks = TRUE, biomod2Format = TRUE, xOffset = 0,
  yOffset = 0, progress = TRUE)

Arguments

speciesData

A SpatialPointsDataFrame, SpatialPoints or sf object containing species data.

species

Character. The name of the field in which species presence/absence data (0s and 1s) are stored. If species = NULL, presence and absence data will be treated the same and only training and testing records will be counted.

blocks

A SpatialPolygons* or sf object to be used as the blocks. This can be a user defined polygon and it must cover all the species points.

rasterLayer

RasterLayer for visualisation. If provided, this will be used to specify the blocks covering the area.

theRange

Numeric value of the specified range by which blocks are created and training/testing data are separated. This distance should be in metres. The range could be explored by spatialAutoRange() and rangeExplorer() functions.

rows

Integer value by which the area is divided into latitudinal bins.

cols

Integer value by which the area is divided into longitudinal bins.

k

Integer value. The number of desired folds for cross-validation. The default is k = 5.

selection

Type of assignment of blocks into folds. Can be random (default), systematic or checkerboard pattern (not working with user-defined blocks).

iteration

Integer value. The number of attempts to create folds that fulfil the set requirement for minimum number of points in each category (training-presence, training-absence, testing-presence and testing-absence), as specified by numLimit value.

numLimit

Integer value. The minimum number of points in each category of data (see iteration above). If numLimit = NULL, the most evenly dispersed number of records is chosen (given the number of iterations).

maskBySpecies

Logical. If a raster layer is provided and maskBySpecies = TRUE, the blocks will be created based on the raster extent, but only those blocks covering species data are kept. The default is TRUE.

degMetre

Integer. The number of metres in one degree, used to convert theRange from metres to degrees. See the details section for more information.

border

SpatialPolygons* or sf object to clip the block based on a border. This might increase the computation time.

showBlocks

Logical. If TRUE, the final blocks with fold numbers will be plotted. A raster layer can be supplied via the rasterLayer argument to serve as the background.

biomod2Format

Logical. Creates a matrix of folds that can be directly used in the biomod2 package as a DataSplitTable for cross-validation.

xOffset

Numeric value between 0 and 1 for shifting the blocks horizontally. The value is the proportion of block size.

yOffset

Numeric value between 0 and 1 for shifting the blocks vertically. The value is the proportion of block size.

progress

Logical. If TRUE shows a progress bar when numLimit = NULL in random fold selection.
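
The four record categories that iteration and numLimit refer to (training-presence, training-absence, testing-presence and testing-absence) can be tallied per fold as in the minimal sketch below. The data are simulated, and the column names, the helper count_categories() and the threshold num_limit are hypothetical; this is not how the package computes its own summary.

```r
# Illustrative only: count records in the four categories for each fold.
# 'species' and 'fold' are hypothetical column names on simulated data.
set.seed(1)
records <- data.frame(
  species = rbinom(200, 1, 0.4),            # 1 = presence, 0 = absence
  fold = sample(1:5, 200, replace = TRUE)   # fold id of the block holding each record
)

count_categories <- function(records, k) {
  out <- t(sapply(seq_len(k), function(i) {
    test <- records$fold == i               # records held out in fold i
    c(train_1 = sum(records$species[!test] == 1),
      train_0 = sum(records$species[!test] == 0),
      test_1  = sum(records$species[test] == 1),
      test_0  = sum(records$species[test] == 0))
  }))
  rownames(out) <- paste0("fold", seq_len(k))
  out
}

counts <- count_categories(records, k = 5)
print(counts)

# A numLimit-style check: every category in every fold must reach the minimum
num_limit <- 5                              # hypothetical threshold
passes <- all(counts >= num_limit)
print(passes)
```

An arrangement that fails such a check would be discarded and another random assignment tried, up to the number of attempts allowed.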

Details

For consistency, all the functions use metres as their unit. In this function, when the input map has a geographic coordinate system (decimal degrees), the block size is calculated by dividing theRange by 111325 (the standard length of a degree in metres at the Equator) to convert the unit to degrees. This value can be changed by the user via the degMetre argument.
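
The conversion described above is a single division. Here is a small sketch of the arithmetic, using the theRange value from the Examples section; the variable names are illustrative, and the latitude adjustment at the end is only one possible reason a user might choose a different degMetre, not a recommendation from the package.

```r
# Convert a block size in metres to decimal degrees.
the_range <- 68000      # metres (value used in the Examples section)
deg_metre <- 111325     # metres per degree at the Equator (default degMetre)

block_size_deg <- the_range / deg_metre
print(round(block_size_deg, 4))   # roughly 0.61 degrees

# A degree of longitude spans fewer metres away from the Equator,
# shrinking approximately with the cosine of the latitude.
latitude <- 40                                   # hypothetical study latitude
deg_metre_lon_40 <- deg_metre * cos(latitude * pi / 180)
print(round(the_range / deg_metre_lon_40, 4))    # a larger size in degrees
```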

The xOffset and yOffset arguments can be used to change the spatial position of the blocks. They can also be used to assess how sensitive the analysis results are to shifts in the blocking arrangement. These options are available when theRange is defined. By default the region is located in the middle of the blocks; setting the offsets shifts the blocks.
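
To see why an offset can change the folds, consider which block a point falls in before and after the grid is shifted. The sketch below is a one-dimensional base-R illustration with made-up coordinates; the helper block_index(), the grid origin and the direction of the shift are assumptions, not the package's internal logic.

```r
# Illustrative only: shifting a grid by a proportion of the block size
# changes which block some points fall in.
block_size <- 68000                      # metres
x <- c(10000, 50000, 90000, 130000)      # hypothetical easting coordinates
x_min <- 0                               # hypothetical grid origin

block_index <- function(x, origin, size, offset = 0) {
  # move the origin back by offset * size, then count whole blocks from it
  floor((x - (origin - offset * size)) / size) + 1
}

idx_default <- block_index(x, x_min, block_size, offset = 0)
idx_shifted <- block_index(x, x_min, block_size, offset = 0.3)

print(idx_default)   # 1 1 2 2
print(idx_shifted)   # 1 2 2 3
```

Points near a block boundary move to a neighbouring block, so repeating an analysis over several offsets gives a rough picture of how much the results depend on block placement.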

Roberts et al. (2017) suggest that blocks should be substantially bigger than the range of spatial autocorrelation (in model residuals) to obtain realistic error estimates, while a buffer the size of the spatial autocorrelation range would give a good estimate of error. This is because of the so-called edge effect (O'Sullivan & Unwin, 2010), whereby points located on the edges of blocks belonging to opposite sets are not separated spatially. Blocking with a buffering strategy overcomes this issue (see buffering).
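
The edge effect can be made concrete with a few made-up points. In the base-R sketch below, two neighbouring blocks are 68 km wide, yet the closest pair of points on opposite sides of the shared boundary is only 1 km apart; all coordinates are hypothetical.

```r
# Illustrative only: points in different blocks can still be very close.
block_size <- 68000                                  # metres
pts <- data.frame(x = c(67500, 68500, 30000, 110000),
                  y = c(0, 0, 0, 0))                 # hypothetical coordinates

# block membership along the x axis (block 1: 0-68 km, block 2: 68-136 km)
block_id <- floor(pts$x / block_size) + 1

# pairwise Euclidean distances between all points
d <- as.matrix(dist(pts))

# smallest distance between a point in block 1 and a point in block 2
min_gap <- min(d[block_id == 1, block_id == 2])
print(min_gap)                                       # 1000 metres
```

If blocks 1 and 2 end up in opposite sets, those two points are separated by far less than the block size, which is the situation a buffering strategy is meant to avoid.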

Value

An object of class S3. A list of objects including:

References

Bahn, V., McGill, B.J., 2012. Testing the predictive performance of distribution models. Oikos 122, 321–331.

O’Sullivan, D., Unwin, D.J., 2010. Geographic Information Analysis, 2nd ed. John Wiley & Sons.

Roberts et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929.

Wenger, S.J., Olden, J.D., 2012. Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods Ecol. Evol. 3, 260–267.

See Also

spatialAutoRange and rangeExplorer for selecting block size; buffering and envBlock for alternative blocking strategies; foldExplorer for visualisation of the generated folds.

For DataSplitTable, see BIOMOD_cv in the biomod2 package.

Examples

## Not run: 

# load package data
awt <- raster::brick(system.file("extdata", "awt.grd", package = "blockCV"))
# import presence-absence species data
PA <- read.csv(system.file("extdata", "PA.csv", package = "blockCV"))
# make a SpatialPointsDataFrame object from data.frame
pa_data <- sp::SpatialPointsDataFrame(PA[,c("x", "y")], PA, proj4string=raster::crs(awt))

# spatial blocking by specified range and random assignment
sb1 <- spatialBlock(speciesData = pa_data,
                    species = "Species",
                    theRange = 68000,
                    k = 5,
                    selection = 'random',
                    iteration = 250,
                    numLimit = NULL,
                    biomod2Format = TRUE,
                    xOffset = 0.3, # shift the blocks horizontally
                    yOffset = 0)

# spatial blocking by row/column and systematic fold assignment
sb2 <- spatialBlock(speciesData = pa_data,
                    species = "Species",
                    rasterLayer = awt,
                    rows = 5,
                    cols = 8,
                    k = 5,
                    selection = 'systematic',
                    maskBySpecies = TRUE,
                    biomod2Format = TRUE)


## End(Not run)

adamlilith/blockCV documentation built on May 25, 2019, 12:41 a.m.