envBlock: Use environmental clustering to separate train and test folds
In adamlilith/blockCV: Spatial and environmental blocking for k-fold cross-validation

Description Usage Arguments Details Value References See Also Examples

Environmental blocking for cross-validation. This function uses clustering to identify sets of similar environmental conditions based on the input covariates. Species records falling in the same cluster are assigned to the same fold. The clustering can be done either in raster space or on the covariate values at the species records. Both approaches use kmeans (for rasters via RStoolbox, which uses the same function internally). The function works on single or multiple raster layers; multiple rasters need to be in a brick or stack format.

envBlock(rasterLayer, speciesData, species = NULL, k = 5,
  standardization = "normal", rasterBlock = TRUE,
  biomod2Format = TRUE, numLimit = 0)

`rasterLayer`	RasterLayer, RasterBrick or RasterStack of covariates to identify environmental groups.
`speciesData`	A SpatialPointsDataFrame, SpatialPoints or sf object containing species data.
`species`	Character. The name of the field in which species presence/absence data (0s and 1s) are stored. If `species = NULL`, presence and absence records are treated the same and only training and testing records are counted.
`k`	Integer value. The number of desired folds for cross-validation. The default is `k = 5`.
`standardization`	Character. How to standardize the input raster layers before clustering. One of "normal" (the default), "standard" or "none". See details for more information.
`rasterBlock`	Logical. If TRUE, the clustering is done in the raster layer rather than species data. See details for more information.
`biomod2Format`	Logical. Creates a matrix of folds that can be directly used in the biomod2 package as a DataSplitTable for cross-validation.
`numLimit`	Integer value. The minimum number of points in each category of data (training-presence, training-absence, testing-presence and testing-absence). Shows a message if the number of points in any of the folds happens to be less than this number.

As k-means algorithms use Euclidean distance to estimate clusters, the input covariates should be quantitative variables. Since variables with wider ranges of values might dominate the clusters and bias the environmental clustering (Hastie et al., 2009), the input rasters are standardized within the function by default. This is done either by normalizing, i.e. subtracting the mean and dividing by the standard deviation of each raster (`standardization = "normal"`, the default), or by linear scaling that constrains all raster values between 0 and 1 (`standardization = "standard"`).
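The two scaling options can be sketched in base R on a plain numeric matrix. This is an illustration only, not the package's internal code; the matrix `env` and the helper names `z_score` and `min_max` are hypothetical.

```r
# Hypothetical covariates: two variables on very different scales
env <- cbind(temp = c(12, 15, 18, 21, 24),
             rain = c(200, 800, 1400, 2000, 2600))

# "normal": subtract the mean and divide by the standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# "standard": linear scaling so that values fall between 0 and 1
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

env_normal   <- apply(env, 2, z_score)
env_standard <- apply(env, 2, min_max)

# After either transformation both columns share a comparable range,
# so neither dominates the Euclidean distances used by k-means
apply(env_normal, 2, sd)       # unit standard deviation per column
apply(env_standard, 2, range)  # minimum 0 and maximum 1 per column
```

Without such scaling, the `rain` column in this toy matrix would contribute almost all of the distance between any two rows.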

By default, the clustering is done in raster space. In this approach the clusters are consistent throughout the region and across species (in the same region). However, this may result in one or more clusters that contain no species records, especially when species data are not dispersed throughout the region or the number of clusters (k, i.e. the number of folds) is high. In this case, the number of folds is less than the specified k. If rasterBlock = FALSE, the clustering is done on the species points and the number of folds equals k.
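The difference between the two approaches can be illustrated with base R `kmeans` on simulated values. This is a rough sketch under stated assumptions, not what envBlock does internally: `cells` stands in for standardized raster cell values and `rec_idx` for the cells that hold species records, and both are hypothetical.

```r
set.seed(42)

# Hypothetical standardized covariates for 500 "raster cells"
cells <- cbind(v1 = runif(500), v2 = runif(500))

# Hypothetical species records confined to one corner of environmental space
# (here: the cells with low values of both covariates)
rec_idx <- which(cells[, "v1"] < 0.4 & cells[, "v2"] < 0.4)

k <- 5

# Raster-space approach: cluster all cells, then look up each record's cluster
km_cells <- kmeans(cells, centers = k, nstart = 10)
fold_raster <- km_cells$cluster[rec_idx]

# Species-data approach: cluster only the covariate values at the records
km_points <- kmeans(cells[rec_idx, ], centers = k, nstart = 10)
fold_points <- km_points$cluster

length(unique(fold_raster))  # can be smaller than k
length(unique(fold_points))  # equal to k
```

Because the records occupy only part of the environmental space, some raster-space clusters may hold no records at all, which is why fewer than k folds can be returned in that mode.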

Note that the input raster layer should cover all the species points, otherwise an error is raised. Records with no raster value should be deleted prior to the analysis, or another raster layer should be provided.
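One way to find records with no raster value before calling the function is to extract the covariates at the points and keep only complete rows. The sketch below uses a small hypothetical matrix `vals` (with made-up column names) in place of real extracted values so that it runs without any spatial packages.

```r
# Hypothetical result of extracting covariate values at five species records;
# with real data this matrix would come from something like
# raster::extract(awt, pa_data)
vals <- cbind(bio1 = c(21.3, NA, 19.8, 22.1, 20.4),
              bio2 = c(1450, 1320, NA, 1510, 1480))

# Records falling outside the raster, or on NA cells, have missing values
keep <- complete.cases(vals)

which(!keep)   # records 2 and 3 would need to be removed

# With a Spatial* or sf object the same logical vector can subset the rows,
# e.g. pa_data <- pa_data[keep, ]
vals_clean <- vals[keep, , drop = FALSE]
nrow(vals_clean)
```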

An S3 object (a list) containing:

folds - a list containing the folds. Each fold has two vectors with the training (first) and testing (second) indices
foldID - a vector of values indicating the number of the fold for each observation (each number corresponds to the same point in species data)
biomodTable - a matrix with the folds, to be used in the biomod2 package
k - number of the folds
species - the name of the species (column), if provided
records - a table with the number of points in each category of training and testing
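To make the layout of these elements concrete, the sketch below rebuilds a `folds`-like list and a logical split matrix from a made-up `foldID` vector. Nothing here is produced by envBlock: the values are hypothetical, and the convention that TRUE marks training records in the matrix is an assumption about how a biomod2 DataSplitTable is read.

```r
# Hypothetical fold assignment for 12 records, mimicking `foldID`
foldID <- c(1, 2, 3, 1, 2, 3, 3, 1, 2, 1, 3, 2)
k <- length(unique(foldID))

# Rebuild a `folds`-like list: training indices first, testing indices second
folds <- lapply(seq_len(k), function(i) {
  list(which(foldID != i), which(foldID == i))
})

# Logical matrix in the spirit of a biomod2 DataSplitTable
# (assumption: TRUE marks training records, FALSE marks testing records)
split_table <- sapply(seq_len(k), function(i) foldID != i)
colnames(split_table) <- paste0("RUN", seq_len(k))

# Typical use: loop over folds, fit on training rows, evaluate on testing rows
for (i in seq_len(k)) {
  train_idx <- folds[[i]][[1]]
  test_idx  <- folds[[i]][[2]]
  cat("fold", i, ":", length(train_idx), "training,",
      length(test_idx), "testing records\n")
}
```

With an object returned by envBlock, the same loop would presumably run over `eb$folds` directly, using the indices to subset the species data.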

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed., Vol. 1). Springer Series in Statistics, New York.

Roberts, D. R., et al. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40, 913-929.

spatialBlock and buffering for alternative blocking strategies; foldExplorer for visualisation of the generated folds.

For DataSplitTable see BIOMOD_cv in the biomod2 package; unsuperClass (RStoolbox) for clustering.

## Not run: 

# load package data
awt <- raster::brick(system.file("extdata", "awt.grd", package = "blockCV"))
# import presence-absence species data
PA <- read.csv(system.file("extdata", "PA.csv", package = "blockCV"))
# make a SpatialPointsDataFrame object from data.frame
pa_data <- sp::SpatialPointsDataFrame(PA[,c("x", "y")], PA, proj4string=raster::crs(awt))

# environmental clustering
eb <- envBlock(rasterLayer = awt,
               speciesData = pa_data,
               species = "Species", # name of the column with species data
               k = 5,
               standardization = "standard",
               rasterBlock = TRUE,
               numLimit = 50)

## End(Not run)

adamlilith/blockCV documentation built on May 25, 2019, 12:41 a.m.
