create_sp | R Documentation |
Takes a raster stack of environmental variables and applies a user-defined function to a subset of them to give a base probability surface. This surface is then subsampled, and the subsample is used to condition a new Gaussian simulation that gives the probability of occurrence surface, which is converted to presence-absence at a defined prevalence. The idea is that the virtual species is related to only a subset of the modelling variables, and that some of the other modelling variables are somewhat related to those actually important to the species.
create_sp( envStack, spFun = "x[1] * x[6] * x[8]", spModel = "Sph", spPsill = 1, spRange = 100, propSamp = 0.5, prev = 0.1 )
envStack |
stack of environmental variables to generate base
probability surface (i.e. the output from |
spFun |
a function used in |
spModel |
model for variogram; default = "Sph" (see
|
spPsill |
(partial) sill of the variogram model; default = 1 (see
|
spRange |
range parameter of the variogram; default = 100 (see
|
propSamp |
proportion of cells to sample in each submodel (default = 0.5) |
prev |
proportion of cells to be defined as a presence |
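The spFun string refers to layers by position, so "x[1] * x[5] * x[9]" means "multiply layers 1, 5 and 9 in each cell". The following is a minimal base R sketch of that idea, assuming the string is parsed into a per-cell function; this may not be how create_sp handles it internally. The matrix envVals and the helper cellFun are hypothetical stand-ins, not part of the package.

```r
# Hypothetical stand-in for a raster stack: 5 cells (rows) x 9 layers (columns)
set.seed(1)
envVals <- matrix(runif(5 * 9), nrow = 5, ncol = 9)

# The relationship is supplied as a string, as in the spFun argument
spFun <- "x[1] * x[5] * x[9]"

# Assumption: the string is parsed and evaluated with x bound to the
# vector of layer values for one cell (cellFun is a hypothetical helper)
cellFun <- function(x) eval(parse(text = spFun))

# Apply to every cell (row) to get one base-surface value per cell
baseSurface <- apply(envVals, 1, cellFun)
baseSurface
```

With a real raster stack the same per-cell function can be passed to calc(), as in the examples at the end of this page.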
The function first generates a base probability surface based upon a user-specified relationship with the variables. It then randomly subsamples this surface and generates a new conditional Gaussian simulation based on this subsample. Finally, the cells with the highest values are defined as presences, based on the desired overall prevalence.
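The subsampling and thresholding steps described above can be sketched in base R. This is an illustration under stated assumptions, not the package code: prob is a hypothetical vector standing in for the cell values of the simulated probability surface, and the conditional Gaussian simulation itself is not reproduced.

```r
set.seed(42)
nCells   <- 100 * 100
propSamp <- 0.5
prev     <- 0.1

# Hypothetical probability surface, one value per cell
prob <- runif(nCells)

# Subsampling step: a random propSamp fraction of cells is kept
# as conditioning data for the new simulation
sampIdx <- sample(nCells, size = round(propSamp * nCells))

# (The conditional Gaussian simulation on these cells is omitted here.)

# Thresholding step: the round(prev * nCells) cells with the highest
# values become presences (1), all others absences (0)
nPres <- round(prev * nCells)
pa <- integer(nCells)
pa[order(prob, decreasing = TRUE)[seq_len(nPres)]] <- 1L

mean(pa)  # realised prevalence
```

Ranking the cells rather than applying a fixed probability cut-off means the realised prevalence matches prev regardless of how the probability values are distributed.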
If the variables are generated using create_env_nsets, the idea is that the species is related to a subset of those variables (although not precisely, as we subsample the base probability surface). However, when we run the SDM we will be using a number of variables unrelated to the generation of the species probability surface, although some of those will be somewhat related to the relevant variables.
The aim is to reproduce several pitfalls of fitting an SDM to a real species that are generally excluded when generating virtual species, and whose absence could lead to unwarranted confidence in model performance in the real world:
There are unknown processes, unrelated to the environmental variables, that are not included in the model (hence the extra conditional Gaussian simulation step), so we can never perfectly recover the distribution
We will likely include environmental variables unrelated to the species distribution in the model
Some of these unnecessary variables will be somewhat correlated with the variable that is important (and thus a model may select the 'wrong' one)
There may be other spatial surfaces that may mislead the model, such as spatial variation in sampling effort
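The pitfall of correlated but irrelevant variables can be illustrated with a small base R simulation. This is only a sketch with made-up variables (important, nuisance, unrelated and spProb are hypothetical names and do not come from the package): the species responds to one variable only, yet a second variable that merely correlates with it is also associated with the species.

```r
set.seed(123)
n <- 5000

important <- runif(n)                          # variable that drives the species
nuisance  <- 0.7 * important + 0.3 * runif(n)  # correlated, but plays no causal role
unrelated <- runif(n)                          # independent of the species

# Hypothetical species response that depends on 'important' only, plus noise
spProb <- important + rnorm(n, sd = 0.2)

corImportant <- cor(spProb, important)
corNuisance  <- cor(spProb, nuisance)
corUnrelated <- cor(spProb, unrelated)

round(c(important = corImportant,
        nuisance  = corNuisance,
        unrelated = corUnrelated), 2)
```

The nuisance variable shows a strong correlation with the species response even though it was never used to generate it, which is how a model can end up selecting the 'wrong' variable.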
A raster stack of the virtual species. The first layer 'prob' is the probability of occurrence (range = 0-1), and 'pa' is the presence-absence distribution based on the desired prevalence.
A raster stack of environmental variables. For each variable all values are standardised between 0 and 1.
Charlie Marsh (charlie.marsh@mailbox.org) & Yoni Gavish, based on the original code from http://santiago.begueria.es/2010/10/generating-spatially-correlated-random-fields-with-r/
Variations on this method have been used to generate virtual species in:
Gavish, Y., Marsh, C.J., Kuemmerlen, M., Stoll, S., Haase, P., Kunin, W.E., 2017. Accounting for biotic interactions through alpha-diversity constraints in stacked species distribution models. Methods in Ecology and Evolution 8, 1092–1102. https://doi.org/10.1111/2041-210X.12731
Marsh, C.J., Gavish, Y., Kunin, W.E., Brummitt, N.A., 2019. Mind the gap: Can downscaling Area of Occupancy overcome sampling gaps when assessing IUCN Red List status? Diversity and Distributions 25, 1832–1845. https://doi.org/10.1111/ddi.12983
### first generate sets of related environmental variables
library(raster)  # calc() used below comes from the raster package
set.seed(9999)
envSet <- create_env_nsets(cellDims = c(100, 100),
                           sets = c(4, 4, 3, 1),
                           model = "Sph",
                           psill = 1.5,
                           dep1 = 1,
                           rangeFun = function() exp(runif(1, 1, 6)),
                           propSamp = 0.25)
plot(envSet)

# the species relationship to the environmental variables is defined through
# spFun (although that surface is then subsampled for a new variogram).
# In spFun the bracketed numbers refer to the layer number of the variable.
# In this example we combine the 1st variable of each of the first three
# environmental sets; the last set is not used (perhaps it is a nuisance
# sampling effort surface)

# if you want you can define the function as its own variable, or you can
# specify it within the function argument (as in the later examples)
envFun <- "x[1] * x[5] + x[9]"
sp <- create_sp(envStack = envSet,
                spFun = envFun,
                spModel = "Sph",
                spPsill = 1,
                spRange = 50,
                propSamp = 0.5,
                prev = 0.1)

# the output is a raster stack which can be plotted
plot(sp)

# spatial autocorrelation can be increased with a higher spRange value
sp <- create_sp(envStack = envSet,
                spFun = "x[1] * x[5] * x[9]",
                spModel = "Sph",
                spPsill = 1,
                spRange = 500,
                propSamp = 0.5,
                prev = 0.1)
plot(sp)

# the higher the propSamp the less the species will deviate from the surface
# defined in spFun. E.g.
sp <- create_sp(envStack = envSet,
                spFun = "x[1] * x[5] * x[9]",
                spModel = "Sph",
                spPsill = 1,
                spRange = 50,
                propSamp = 0.95,
                prev = 0.1)

# add the spFun defined surface
sp$env <- calc(envSet, function(x) x[1] * x[5] * x[9])
plot(sp)

# prevalence in the final presence-absence map can be controlled using prev
sp <- create_sp(envStack = envSet,
                spFun = "x[1] * x[5] * x[9]",
                spModel = "Sph",
                spPsill = 1,
                spRange = 500,
                propSamp = 0.5,
                prev = 0.5)
plot(sp)