spatialBlock: Use spatial blocks to separate train and test folds

In adamlilith/blockCV: Spatial and environmental blocking for k-fold cross-validation

Description Usage Arguments Details Value References See Also Examples

This function creates spatially separated folds based on a pre-specified distance. It assigns blocks to the training and testing folds randomly, systematically or in a checkerboard pattern. The distance (theRange) should be in metres, regardless of the unit of the reference system of the input data (see the Details section for more information). By default, the function creates blocks according to the extent and shape of the study area, assuming that the user has considered the landscape for the given species and case study. Alternatively, blocks can be created based on species spatial data, which is especially useful when the species data are not evenly dispersed across the region. Blocks can also be offset so that their origin is not at the outer corner of the rasters. Instead of providing a distance, blocks can be created by specifying a number of rows and columns, dividing the study area into vertical or horizontal bins, as presented in Wenger & Olden (2012) and Bahn & McGill (2012). Finally, the blocks can be specified by a user-defined spatial polygon layer.

spatialBlock(speciesData, species = NULL, blocks = NULL,
  rasterLayer = NULL, theRange = NULL, rows = NULL, cols = NULL,
  k = 5, selection = "random", iteration = 250, numLimit = NULL,
  maskBySpecies = TRUE, degMetre = 111325, border = NULL,
  showBlocks = TRUE, biomod2Format = TRUE, xOffset = 0,
  yOffset = 0, progress = TRUE)

`speciesData`	A SpatialPointsDataFrame, SpatialPoints or sf object containing species data.
`species`	Character. The name of the field in which species presence/absence data (0s and 1s) are stored. If `species = NULL`, presence and absence data are treated the same and only training and testing records are counted.
`blocks`	A SpatialPolygons* or sf object to be used as the blocks. This can be a user defined polygon and it must cover all the species points.
`rasterLayer`	RasterLayer for visualisation. If provided, this will be used to specify the blocks covering the area.
`theRange`	Numeric value of the range by which blocks are created and training/testing data are separated. This distance should be in metres. A suitable range can be explored with the `spatialAutoRange()` and `rangeExplorer()` functions.
`rows`	Integer value by which the area is divided into latitudinal bins.
`cols`	Integer value by which the area is divided into longitudinal bins.
`k`	Integer value. The number of desired folds for cross-validation. The default is `k = 5`.
`selection`	Type of assignment of blocks into folds. Can be random (default), systematic or a checkerboard pattern. The checkerboard pattern does not work with user-defined blocks.
`iteration`	Integer value. The number of attempts to create folds that fulfil the set requirement for minimum number of points in each category (training-presence, training-absence, testing-presence and testing-absence), as specified by `numLimit` value.
`numLimit`	Integer value. The minimum number of points in each category of data (see `iteration` above). If `numLimit = NULL`, the most evenly dispersed number of records is chosen (given the number of iterations).
`maskBySpecies`	Logical. If a raster layer is provided and `maskBySpecies = TRUE`, the blocks will be created based on the raster extent, but only those blocks covering species data are kept. The default is `TRUE`.
`degMetre`	Integer. The number of metres in one degree, used to convert `theRange` from metres to degrees. The default is 111325. See the Details section for more information.
`border`	SpatialPolygons* or sf object to clip the block based on a border. This might increase the computation time.
`showBlocks`	Logical. If TRUE, the final blocks with fold numbers will be plotted. A raster layer can be supplied via the `rasterLayer` argument to serve as the background.
`biomod2Format`	Logical. If TRUE, creates a matrix of folds that can be used directly in the biomod2 package as a DataSplitTable for cross-validation.
`xOffset`	Numeric value between 0 and 1 for shifting the blocks horizontally. The value is the proportion of block size.
`yOffset`	Numeric value between 0 and 1 for shifting the blocks vertically. The value is the proportion of block size.
`progress`	Logical. If TRUE shows a progress bar when `numLimit = NULL` in random fold selection.

For consistency, all the functions use metres as their unit. In this function, when the input map has a geographic coordinate system (decimal degrees), the block size is calculated by dividing theRange by 111325 (the standard length of a degree in metres at the Equator) to convert it to degrees. This conversion value can be changed by the user via the degMetre argument.
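
The conversion described above can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not the package's internal code; the variable name `blockSizeDeg` is hypothetical, and the 68 km range is borrowed from the Examples section.

```r
# Assumed inputs: a 68 km range and the default conversion factor
theRange <- 68000   # separation distance in metres
degMetre <- 111325  # metres per degree at the Equator (the function default)

# Block size in decimal degrees (hypothetical variable name)
blockSizeDeg <- theRange / degMetre
print(round(blockSizeDeg, 3))
```

Because a degree of longitude spans fewer metres away from the Equator, this is one situation in which a user might choose to pass a different `degMetre` value.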

The xOffset and yOffset arguments can be used to change the spatial position of the blocks. They can also be used to assess how sensitive the analysis results are to shifts in the blocking arrangement. These options are available when theRange is defined. By default the region is located in the middle of the blocks; setting the offsets shifts the blocks.

Roberts et al. (2017) suggest that blocks should be substantially bigger than the range of spatial autocorrelation (in the model residuals) to obtain realistic error estimates, while a buffer with the size of the spatial autocorrelation range would result in a good estimation of error. This is because of the so-called edge effect (O'Sullivan & Unwin, 2010), whereby points located on the edges of the blocks of opposite sets are not separated spatially. Blocking with a buffering strategy overcomes this issue (see buffering).

An S3 object: a list containing the following elements:

folds - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices
foldID - a vector of values indicating the number of the fold for each observation (each number corresponds to the same point in species data)
biomodTable - a matrix with the folds to be used in biomod2 package
k - number of the folds
blocks - SpatialPolygon of the blocks
range - the distance band separating training and testing folds, if provided
species - the name of the species (column), if provided
plots - ggplot object
records - a table with the number of points in each category of training and testing
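
As a rough sketch of how the `folds` element might be consumed, the base R snippet below loops over a mock object that imitates the structure listed above. The object `sb`, the data frame `dat` and the vector `testSizes` are all hypothetical stand-ins, not real `spatialBlock()` output, and the mock assumes that `foldID` records the fold in which each point is used for testing.

```r
# Mock object imitating the documented structure (not real spatialBlock output)
sb <- list(
  folds = list(
    list(c(3L, 4L, 5L, 6L), c(1L, 2L)),  # fold 1: training, then testing
    list(c(1L, 2L, 5L, 6L), c(3L, 4L)),  # fold 2
    list(c(1L, 2L, 3L, 4L), c(5L, 6L))   # fold 3
  ),
  foldID = c(1L, 1L, 2L, 2L, 3L, 3L),    # assumed: testing fold of each point
  k = 3
)

# Hypothetical data with one row per observation
dat <- data.frame(y = c(0, 1, 0, 1, 1, 0),
                  x = c(0.2, 0.9, 0.1, 0.8, 0.7, 0.3))

# Split the data fold by fold using the stored indices
testSizes <- integer(sb$k)
for (i in seq_len(sb$k)) {
  trainSet <- dat[sb$folds[[i]][[1]], ]  # first vector: training indices
  testSet  <- dat[sb$folds[[i]][[2]], ]  # second vector: testing indices
  testSizes[i] <- nrow(testSet)          # a model would be fitted and scored here
}
print(testSizes)
```

In a real analysis, the body of the loop would fit a model on `trainSet` and evaluate it on `testSet`.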

Bahn, V., McGill, B.J., 2012. Testing the predictive performance of distribution models. Oikos 122, 321–331.

O’Sullivan, D., Unwin, D.J., 2010. Geographic Information Analysis, 2nd ed. John Wiley & Sons.

Roberts et al., 2017. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 40, 913–929.

Wenger, S.J., Olden, J.D., 2012. Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods Ecol. Evol. 3, 260–267.

spatialAutoRange and rangeExplorer for selecting block size; buffering and envBlock for alternative blocking strategies; foldExplorer for visualisation of the generated folds.

For DataSplitTable, see BIOMOD_cv in the biomod2 package.

## Not run: 

# load the package and its example raster data
library(blockCV)
awt <- raster::brick(system.file("extdata", "awt.grd", package = "blockCV"))
# import presence-absence species data
PA <- read.csv(system.file("extdata", "PA.csv", package = "blockCV"))
# make a SpatialPointsDataFrame object from data.frame
pa_data <- sp::SpatialPointsDataFrame(PA[,c("x", "y")], PA, proj4string=raster::crs(awt))

# spatial blocking by specified range and random assignment
sb1 <- spatialBlock(speciesData = pa_data,
                    species = "Species",
                    theRange = 68000,
                    k = 5,
                    selection = 'random',
                    iteration = 250,
                    numLimit = NULL,
                    biomod2Format = TRUE,
                    xOffset = 0.3, # shift the blocks horizontally
                    yOffset = 0)

# spatial blocking by row/column and systematic fold assignment
sb2 <- spatialBlock(speciesData = pa_data,
                    species = "Species",
                    rasterLayer = awt,
                    rows = 5,
                    cols = 8,
                    k = 5,
                    selection = 'systematic',
                    maskBySpecies = TRUE,
                    biomod2Format = TRUE)


## End(Not run)

adamlilith/blockCV documentation built on May 25, 2019, 12:41 a.m.
