create_sp: Generate a virtual species based on a set of environmental...
In charliem2003/sdmProfiling: A Spatially-explicit Species Distribution Model Evaluation Tool

create_sp

R Documentation

Generate a virtual species based on a set of environmental variables

Description

Takes a raster stack of environmental variables and applies a user-defined function to a subset of them to produce a base probability surface. That surface is then randomly subsampled, and the subsample conditions a new Gaussian simulation that yields the final probability of occurrence surface, which is converted to presence-absence at a defined prevalence. The idea is that the virtual species is related to only a subset of the modelling variables, while some of the remaining modelling variables are somewhat related to those that actually matter to the species.

Usage

create_sp(
  envStack,
  spFun = "x[1] * x[6] * x[8]",
  spModel = "Sph",
  spPsill = 1,
  spRange = 100,
  propSamp = 0.5,
  prev = 0.1
)

Arguments

`envStack`	stack of environmental variables to generate base probability surface (i.e. the output from `create_env_nsets` or `create_env_set`, although any raster stack will work)
`spFun`	an expression in `x` (in quotes) used as the function in `raster::calc` for generating the base probability surface from the raster stack of environmental variables, which is then subsampled for the variogram that generates the final surface. Refer to each layer by its number in the raster stack as `x[i]` and state how the layers should be combined. E.g. `"x[1] * x[6] * x[8]"` multiplies layers 1, 6 and 8
`spModel`	model for variogram; default = "Sph" (see `vgm` for options)
`spPsill`	(partial) sill of the variogram model; default = 1 (see `vgm` for details)
`spRange`	range parameter of the variogram; default = 100 (see `vgm` for details)
`propSamp`	proportion of cells to sample in each submodel (default = 0.5)
`prev`	proportion of cells to be defined as a presence (default = 0.1)
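
As a rough illustration of how a quoted expression such as `spFun` can be evaluated cell by cell, the sketch below wraps the string in a function of `x` and applies it to each row of a matrix of cell values. It uses base R only and is an assumption about the mechanism, not the package's own code; `make_fun` and `cellVals` are hypothetical names introduced for this example.

```r
# Hypothetical helper: turn a quoted expression in `x` into a function.
make_fun <- function(expr) {
  eval(parse(text = paste0("function(x) ", expr)))
}

# Toy data: 5 cells (rows) by 8 layers (columns), values between 0 and 1.
set.seed(1)
cellVals <- matrix(runif(5 * 8), nrow = 5, ncol = 8)

spFun <- "x[1] * x[6] * x[8]"
f     <- make_fun(spFun)

# Apply the function to each cell; x is that cell's vector of layer values.
base_surface <- apply(cellVals, 1, f)

# The same surface computed directly from columns 1, 6 and 8.
direct <- cellVals[, 1] * cellVals[, 6] * cellVals[, 8]
```

Here `x[i]` picks out the value of layer `i` for a single cell, which is why the layer order in `envStack` matters when writing `spFun`.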

Details

The function first generates a base probability surface based upon a user-specified relationship with the variables. It then randomly subsamples this surface and generates a new conditional Gaussian simulation based on this subsample. Finally, the cells with the highest values are defined as presences, based on the desired overall prevalence.
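
The subsample-then-simulate step can be sketched in base R on a small grid. This is a minimal illustration under stated assumptions, not the package's implementation: it builds a dense spherical covariance matrix with no nugget, treats the sample mean as a known mean (simple kriging), and draws one realisation with a Cholesky factor. The names `sph_cov`, `grid`, `base`, `obs` and `unob` are hypothetical, and the toy `base` surface stands in for the output of `spFun`.

```r
set.seed(42)

# Small grid so the dense covariance matrix stays cheap (15 x 15 = 225 cells).
n    <- 15
grid <- expand.grid(x = seq_len(n), y = seq_len(n))

# Step 1: toy base surface standing in for the spFun output (assumption).
base <- (grid$x / n) * (grid$y / n)

# Spherical covariance implied by a spherical variogram with no nugget.
sph_cov <- function(h, psill = 1, range = 5) {
  ifelse(h < range, psill * (1 - 1.5 * h / range + 0.5 * (h / range)^3), 0)
}
covMat <- sph_cov(as.matrix(dist(grid)))

# Step 2a: randomly subsample a proportion of the cells.
propSamp <- 0.5
nCells   <- nrow(grid)
obs      <- sort(sample(nCells, size = round(propSamp * nCells)))
unob     <- setdiff(seq_len(nCells), obs)

# Step 2b: Gaussian distribution of the unsampled cells given the sampled ones.
mu       <- mean(base[obs])
Soo      <- covMat[obs, obs]
Suo      <- covMat[unob, obs]
condMean <- mu + drop(Suo %*% solve(Soo, base[obs] - mu))
condCov  <- covMat[unob, unob] - Suo %*% solve(Soo, t(Suo))

# Symmetrise and add a tiny jitter so the Cholesky factorisation is stable.
condCov <- (condCov + t(condCov)) / 2 + diag(1e-8, length(unob))
U       <- chol(condCov)  # upper triangular, condCov = t(U) %*% U

# Step 2c: one conditional realisation; sampled cells keep their values.
sim       <- numeric(nCells)
sim[obs]  <- base[obs]
sim[unob] <- condMean + drop(t(U) %*% rnorm(length(unob)))
```

The simulated surface matches the base surface exactly at the sampled cells and deviates from it elsewhere, so a larger `propSamp` leaves less room for deviation and a larger range spreads the influence of each sampled cell further.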

If the variables are generated using `create_env_nsets`, the idea is that the species is related to a subset of those variables (although not precisely, as we subsample the base probability surface). However, when we run the SDM we will also be using a number of variables unrelated to the generation of the species probability surface, although some of those will be somewhat related to the relevant variables.

So we are attempting to include several pitfalls of fitting an SDM to a real species that are generally excluded when generating virtual species, and whose omission could lead to unwarranted confidence in model performance in the real world:

There are unknown processes, unrelated to environmental variables, that are not included in the model (hence the extra conditional Gaussian simulation step), and so we can never perfectly recover the distribution.
We will likely include environmental variables unrelated to the species distribution in the model.
Some of these unnecessary variables will be somewhat correlated with the variable that is important (and thus a model may select the 'wrong' one).
There may be other spatial surfaces that mislead the model, such as spatial variation in sampling effort.
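
The correlated-variable pitfall can be seen in a toy, non-spatial sketch. Everything here is hypothetical and independent of the package: `x_true` drives occurrence, `x_decoy` merely shares signal with it, and `x_unrel` is pure noise. The coefficients are arbitrary choices for illustration.

```r
set.seed(123)
n <- 1000

# Hypothetical variables: one true driver, one correlated decoy, one unrelated.
x_true  <- rnorm(n)
x_decoy <- 0.8 * x_true + 0.6 * rnorm(n)  # shares signal with the driver
x_unrel <- rnorm(n)

# Occurrence depends on the true driver only.
p   <- plogis(2 * x_true)
occ <- rbinom(n, size = 1, prob = p)

# Correlation of occurrence with each candidate predictor.
cors <- c(true  = cor(occ, x_true),
          decoy = cor(occ, x_decoy),
          unrel = cor(occ, x_unrel))
round(cors, 2)
```

Although the decoy plays no part in generating `occ`, it still correlates with occurrence, which is how a model can end up selecting the 'wrong' variable.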

Value

A raster stack of the virtual species. The first layer 'prob' is the probability of occurrence (range = 0-1), and 'pa' is the presence-absence distribution based on the desired prevalence.

A raster stack of environmental variables. For each variable all values are standardised between 0 and 1.
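
How a 'prob' surface can be turned into a 'pa' surface at a fixed prevalence is sketched below in base R. It assumes a min-max rescaling to 0-1 and a rank-based cut-off; the package may do either step differently. `sim` is a hypothetical vector of simulated cell values.

```r
set.seed(7)

# Hypothetical simulated values for 1000 cells.
sim <- rnorm(1000)

# Rescale to 0-1 so the values can be read as a probability of occurrence.
prob <- (sim - min(sim)) / (max(sim) - min(sim))

# Flag the top `prev` proportion of cells as presences.
prev  <- 0.1
nPres <- round(prev * length(prob))
pa    <- integer(length(prob))
pa[order(prob, decreasing = TRUE)[seq_len(nPres)]] <- 1L

# Realised prevalence: the proportion of cells flagged as presences.
mean(pa)
```

Ranking the cells rather than applying a fixed probability threshold means the realised prevalence matches `prev` whatever the distribution of `prob` looks like.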

Author(s)

Charlie Marsh (charlie.marsh@mailbox.org) & Yoni Gavish, based on the original code from http://santiago.begueria.es/2010/10/generating-spatially-correlated-random-fields-with-r/

References

Variations on this method have been used to generate virtual species in:

Gavish, Y., Marsh, C.J., Kuemmerlen, M., Stoll, S., Haase, P., Kunin, W.E., 2017. Accounting for biotic interactions through alpha-diversity constraints in stacked species distribution models. Methods in Ecology and Evolution 8, 1092–1102. https://doi.org/10.1111/2041-210X.12731

Marsh, C.J., Gavish, Y., Kunin, W.E., Brummitt, N.A., 2019. Mind the gap: Can downscaling Area of Occupancy overcome sampling gaps when assessing IUCN Red List status? Diversity and Distributions 25, 1832–1845. https://doi.org/10.1111/ddi.12983

Examples


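### load the packages used below (assumes sdmProfiling and raster are installed)
library(sdmProfiling)
library(raster)

### first generate sets of related environmental variables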

set.seed(9999)
envSet <- create_env_nsets(cellDims = c(100, 100),
                           sets     = c(4, 4, 3, 1),
                           model    = "Sph",
                           psill    = 1.5,
                           dep1     = 1,
                           rangeFun = function() exp(runif(1, 1, 6)),
                           propSamp = 0.25)

plot(envSet)

# the species relationship to the environmental variables is defined through
# spFun (although that surface is then subsampled for a new variogram).

# In spFun the bracketed numbers refer to the layer numbers of the variables.
# In this example we multiply the 1st variable of each environmental set
# (layers 1, 5 and 9), but the last set is not used (perhaps it is a nuisance
# sampling effort surface)

# if you want you can define the function as its own variable, or you can
# specify it within the function argument (as in the later examples)
envFun <- "x[1] * x[5] * x[9]"
sp <- create_sp(envStack = envSet,
                spFun    = envFun,
                spModel  = "Sph",
                spPsill  = 1,
                spRange  = 50,
                propSamp = 0.5,
                prev     = 0.1)

# the output is a raster stack which can be plotted
plot(sp)

# spatial autocorrelation can be increased with a higher spRange value
sp <- create_sp(envStack = envSet,
                spFun    = "x[1] * x[5] * x[9]",
                spModel  = "Sph",
                spPsill  = 1,
                spRange  = 500,
                propSamp = 0.5,
                prev     = 0.1)
plot(sp)

# the higher the propSamp the less the species will deviate from the surface
# defined in spFun. E.g.
sp <- create_sp(envStack = envSet,
                spFun    = "x[1] * x[5] * x[9]",
                spModel  = "Sph",
                spPsill  = 1,
                spRange  = 50,
                propSamp = 0.95,
                prev     = 0.1)
# add the surface defined by spFun as an extra layer for comparison
sp$env <- raster::calc(envSet, function(x) x[1] * x[5] * x[9])
plot(sp)

# prevalence in the final presence-absence map can be controlled using prev
sp <- create_sp(envStack = envSet,
                spFun    = "x[1] * x[5] * x[9]",
                spModel  = "Sph",
                spPsill  = 1,
                spRange  = 500,
                propSamp = 0.5,
                prev     = 0.5)
plot(sp)

charliem2003/sdmProfiling documentation built on June 13, 2022, 4:43 a.m.