sjSDM: Fitting scalable joint Species Distribution Models (sjSDM)
In sjSDM: Scalable Joint Species Distribution Modeling

sjSDM

R Documentation

Fitting scalable joint Species Distribution Models (sjSDM)

Description

sjSDM is used to fit joint Species Distribution models (jSDMs) using the central processing unit (CPU) or the graphical processing unit (GPU). The default is a multivariate probit model based on a Monte-Carlo approximation of the joint likelihood. sjSDM can be used to fit linear but also deep neural networks and supports the well known formula syntax.

Usage

sjSDM(
  Y = NULL,
  env = NULL,
  biotic = bioticStruct(),
  spatial = NULL,
  family = stats::binomial("probit"),
  iter = 100L,
  step_size = NULL,
  learning_rate = 0.01,
  se = FALSE,
  sampling = 100L,
  parallel = 0L,
  control = sjSDMControl(),
  device = "cpu",
  dtype = "float32",
  seed = 758341678,
  verbose = TRUE
)

sjSDM.tune(object)

Arguments

`Y`	matrix of species occurrences/responses in range
`env`	matrix of environmental predictors, object of type `linear` or `DNN`
`biotic`	defines biotic (species-species associations) structure, object of type `bioticStruct`
`spatial`	defines spatial structure, object of type `linear` or `DNN`
`family`	error distribution with link function, see details for supported distributions
`iter`	number of fitting iterations
`step_size`	batch size for stochastic gradient descent, if `NULL` then step_size is set to: `step_size = 0.1*nrow(X)`
`learning_rate`	learning rate for Adamax optimizer
`se`	calculate standard errors for environmental coefficients
`sampling`	number of sampling steps for Monte Carlo integration
`parallel`	number of cpu cores for the data loader, only necessary for large datasets
`control`	control parameters for optimizer, see `sjSDMControl`
`device`	which device to be used, "cpu" or "gpu"
`dtype`	which data type, most GPUs support only 32 bit floats.
`seed`	seed for random operations
`verbose`	`TRUE` or `FALSE`, indicating whether progress should be printed or not
`object`	object of type `sjSDM_cv`

Details

\loadmathjax

The function fits per default a multivariate probit model via Monte-Carlo integration (see Chen et al., 2018) of the joint likelihood for all species.

Model description

The most common jSDM structure describes the site (\mjseqni = 1, ..., I) by species (\mjseqnj = 1, ..., J) matrix \mjseqnY_ij as a function of environmental covariates \mjseqnX_in(\mjseqnn=1,...,N covariates), and the species-species covariance matrix \mjseqn\Sigma accounts for correlations in \mjseqne_ij:

\mjsdeqn

g(Z_ij) = \beta_j0 + \Sigma^N_n=1X_in\beta_nj + e_ij

with \mjseqng(.) as link function. For the multivariate probit model, the link function is:

\mjsdeqn

Y_ij=1(Z_ij > 0)

The probability to observe the occurrence vector \mjseqn\bfY_i is:

\mjsdeqn

Pr(\bfY_i|\bfX_i\beta, \Sigma) = \int_A_iJ...\int_A_i1 \phi_J(\bfY_i^\ast;\bfX_i\beta, \Sigma) dY_i1^\ast... dY_iJ^\ast

in the interval \mjseqnA_ij with \mjseqn(-\inf, 0] if \mjseqnY_ij=0 and \mjseqn [0, +\inf) if \mjseqnY_ij=1.

and \mjseqn\phi being the density function of the multivariate normal distribution.

The probability of \mjseqn\bfY_i requires to integrate over \mjseqn\bfY_i^\ast which has no closed analytical expression for more than two species which makes the evaluation of the likelihood computationally costly and needs a numerical approximation. The previous equation can be expressed more generally as:

\mjsdeqn \mathcal

L(\beta, \Sigma; \bfY_i, \bfX_i) = \int_\Omega \prod_j=1^J Pr(Y_ij|\bfX_i\beta+\zeta) Pr(\zeta|\Sigma) d\zeta

sjSDM approximates this integral by \mjseqnM Monte-Carlo samples from the multivariate normal species-species covariance. After integrating out the covariance term, the remaining part of the likelihood can be calculated as in an univariate case and the average of the \mjseqnM samples are used to get an approximation of the integral:

\mjsdeqn \mathcal

L(\beta, \Sigma; \bfY_i, \bfX_i) \approx \frac1M \Sigma_m=1^M \prod_j=1^J Pr(Y_ij|\bfX_i\beta+\zeta_m)

with \mjseqn \zeta_m \sim MVN(0, \Sigma).

sjSDM uses 'PyTorch' to run optionally the model on the graphical processing unit (GPU). Python dependencies needs to be installed before being able to use the sjSDM function. We provide a function which installs automatically python and the python dependencies. See install_sjSDM, vignette("Dependencies", package = "sjSDM")

See Pichler and Hartig, 2020 for benchmark results.

Supported distributions

Currently supported distributions and link functions, which are :

binomial: "probit" or "logit"
poisson: "log"
"nbinom": "log"
gaussian: "identity"

Space

We can extend the model to account for spatial auto-correlation between the sites by:

\mjsdeqn

g(Z_ij) = \beta_j0 + \Sigma^N_n=1X_in\beta_nj + \Sigma^M_m=1S_im\alpha_mj + e_ij

There are two ways to generate spatial predictors \mjseqnS:

trend surface model - using spatial coordinates in a polynomial:

linear(data=Coords, ~0+poly(X, Y, degree = 2))
eigenvector spatial filtering - using spatial eigenvectors. Spatial eigenvectors can be generated by the generateSpatialEV function:

SPV = generateSpatialEV(Coords)

Then we use, for example, the first 20 spatial eigenvectors:

linear(data=SPV[ ,1:20], ~0+.)

It is important to set the intercept to 0 in the spatial term (e.g. via ~0+.) because the intercept is already set in the environmental object.

Installation

install_sjSDM should be theoretically able to install conda and 'PyTorch' automatically. If sjSDM still does not work after reloading RStudio, you can try to solve this on your following our trouble shooting guide installation_help. If the problem remains, please create an issue on issue tracker with a copy of the install_diagnostic output as a quote.

Value

An S3 class of type 'sjSDM' including the following components:

`cl`	Model call
`formula`	Formula object for environmental covariates.
`names`	Names of environmental covariates.
`species`	Names of species (can be `NULL` if columns of Y are not named).
`get_model`	Method which builds and returns the underlying 'python' model.
`logLik`	negative log-Likelihood of the model and the regularization loss.
`model`	The actual model.
`settings`	List of model settings, see arguments of `sjSDM`.
`family`	Response family.
`time`	Runtime.
`data`	List of Y, X (and spatial) model matrices.
`sessionInfo`	Output of `sessionInfo`.
`weights`	List of model coefficients (environmental (and spatial)).
`sigma`	Lower triangular weight matrix for the covariance matrix.
`history`	History of iteration losses.
`se`	Matrix of standard errors, if `se = FALSE` the field 'se' is `NULL`.

Implemented S3 methods include summary.sjSDM, plot.sjSDM, print.sjSDM, predict.sjSDM, and coef.sjSDM. For other methods, see section 'See Also'.

sjSDM.tune returns an S3 object of class 'sjSDM', see above for information about values.

Author(s)

Maximilian Pichler

References

Chen, D., Xue, Y., & Gomes, C. P. (2018). End-to-end learning for the deep multivariate probit model. arXiv preprint arXiv:1803.08591.

Pichler, M., & Hartig, F. (2021). A new joint species distribution model for faster and more accurate inference of species associations from big community data. Methods in Ecology and Evolution, 12(11), 2159-2173.

Examples

## Not run: 
  
# Basic workflow:
## simulate community:
com = simulate_SDM(env = 3L, species = 7L, sites = 100L)

## fit model:
model = sjSDM(Y = com$response,env = com$env_weights, iter = 50L,
              verbose = FALSE) 
# increase iter for your own data 

# Default distribution is binomial("probit"). Alternatively, you can use
# binomial(logit), poisson("log"), "nbinom" (with log, still somewhat 
# experimental) and gaussian("identity")

coef(model)
summary(model)
getCov(model)

## plot results
species=c("sp1","sp2","sp3","sp4","sp5","sp6","sp7")
group=c("mammal","bird","fish","fish","mammal","amphibian","amphibian")
group = data.frame(species=species,group=group)
plot(model,group=group)

## calculate post-hoc p-values:
p = getSe(model)
summary(p)

## or turn on the option in the sjSDM function:
model = sjSDM(Y = com$response, env = com$env_weights, se = TRUE, 
              family = binomial("probit"), 
              iter = 2L,
              verbose = FALSE)
summary(model)

## fit model with interactions:
model = sjSDM(Y = com$response,
              env = linear(data = com$env_weights, formula = ~X1:X2 + X3), 
              se = TRUE,
              iter = 2L,
              verbose = FALSE) # increase iter for your own data 
summary(model)

## without intercept:
model = update(model, env_formula = ~0+X1:X2 + X3,
               verbose = FALSE)

summary(model)

## predict with model:
preds = predict(model, newdata = com$env_weights)

## calculate R-squared:
R2 = Rsquared(model)
print(R2)

# With spatial terms:
## linear spatial model
XY = matrix(rnorm(200), 100, 2)
model = sjSDM(Y = com$response, env = linear(com$env_weights), 
              spatial = linear(XY, ~0+X1:X2),
              iter = 50L,
              verbose = FALSE) # increase iter for your own data 
summary(model)
predict(model, newdata = com$env_weights, SP = XY)
R2 = Rsquared(model)
print(R2)

## Using spatial eigenvectors as predictors to account 
## for spatial autocorrelation is a common approach:
SPV = generateSpatialEV(XY)
model = sjSDM(Y = com$response, env = linear(com$env_weights), 
              spatial = linear(SPV, ~0+., lambda = 0.1),
              iter = 50L,
              verbose = FALSE) # increase iter for your own data 
summary(model)
predict(model, newdata = com$env_weights, SP = SPV)

## Visualize internal meta-community structure
an = anova(model,
           verbose = FALSE)

internal = internalStructure(an)
plot(internal)

## Visualize community assemlby effects 

plotAssemblyEffects(internal)

### see ?anova.sjSDM for mroe details


## non-linear(deep neural network) model
model = sjSDM(Y = com$response, env = linear(com$env_weights), 
              spatial = DNN(SPV,hidden = c(5L, 5L), ~0+.),
              iter = 2L,# increase iter for your own data 
              verbose = FALSE) 
summary(model)
predict(model, newdata = com$env_weights, SP = SPV)


# Regularization
## lambda is the regularization strength
## alpha weights the lasso or ridge penalty:
## - alpha = 0 --> pure lasso
## - alpha = 1.0 --> pure ridge
model = sjSDM(Y = com$response, 
              # mix of lasso and ridge
              env = linear(com$env_weights, lambda = 0.01, alpha = 0.5), 
              # we can do the same for the species-species associations
              biotic = bioticStruct(lambda = 0.01, alpha = 0.5),
              iter = 2L,# increase iter for your own data 
              verbose = FALSE) 
summary(model)
coef(model)
getCov(model)



# Anova 
com = simulate_SDM(env = 3L, species = 15L, sites = 200L, correlation = TRUE)

XY = matrix(rnorm(400), 200, 2)
SPV = generateSpatialEV(XY)
model = sjSDM(Y = com$response, env = linear(com$env_weights), 
              spatial = linear(SPV, ~0+.), 
              verbose = FALSE,
              iter = 50L) # increase iter for your own data 
result = anova(model, verbose = FALSE)
print(result)
plot(result)

## visualize internal meta-community structure
internal = internalStructure(an)
plot(internal)



# Deep neural networks
## we can fit also a deep neural network instead of a linear model:
model = sjSDM(Y = com$response,
              env = DNN(com$env_weights, hidden = c(10L, 10L, 10L)),
              verbose = FALSE,
              iter = 2L) # increase iter for your own data 
summary(model)
getCov(model)
pred = predict(model, newdata = com$env_weights)

## extract weights
weights = getWeights(model)

## we can also assign weights:
setWeights(model, weights)

## with regularization:
model = sjSDM(Y = com$response, 
              # mix of lasso and ridge
              env = DNN(com$env_weights, lambda = 0.01, alpha = 0.5), 
              # we can do the same for the species-species associations
              biotic = bioticStruct(lambda = 0.01, alpha = 0.5),
              verbose = FALSE,
              iter = 2L) # increase iter for your own data 
getCov(model)
getWeights(model)

## End(Not run)

sjSDM documentation built on Sept. 11, 2024, 7:18 p.m.

sjSDM index

README.md sjSDM: Getting started with sjSDM - a scalable joint Species Distribution Model

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sjSDM
Scalable Joint Species Distribution Modeling

sjSDM: Fitting scalable joint Species Distribution Models (sjSDM)
In sjSDM: Scalable Joint Species Distribution Modeling

Fitting scalable joint Species Distribution Models (sjSDM)

Description

Usage

Arguments

Details

Model description

Supported distributions

Space

Installation

Value

Author(s)

References

See Also

Examples

Related to sjSDM in sjSDM...

R Package Documentation

Browse R Packages

We want your feedback!

sjSDM Scalable Joint Species Distribution Modeling

sjSDM: Fitting scalable joint Species Distribution Models (sjSDM) In sjSDM: Scalable Joint Species Distribution Modeling

Fitting scalable joint Species Distribution Models (sjSDM)

Description

Usage

Arguments

Details

Model description

Supported distributions

Space

Installation

Value

Author(s)

References

See Also

Examples

Related to sjSDM in sjSDM...

R Package Documentation

Browse R Packages

We want your feedback!

sjSDM
Scalable Joint Species Distribution Modeling

sjSDM: Fitting scalable joint Species Distribution Models (sjSDM)
In sjSDM: Scalable Joint Species Distribution Modeling