sjSDM | R Documentation |
sjSDM
is used to fit joint Species Distribution models (jSDMs) using the central processing unit (CPU) or the graphical processing unit (GPU).
The default is a multivariate probit model based on a Monte-Carlo approximation of the joint likelihood.
sjSDM
can be used to fit linear but also deep neural networks and supports the well known formula syntax.
sjSDM(
Y = NULL,
env = NULL,
biotic = bioticStruct(),
spatial = NULL,
family = stats::binomial("probit"),
iter = 100L,
step_size = NULL,
learning_rate = 0.01,
se = FALSE,
sampling = 100L,
parallel = 0L,
control = sjSDMControl(),
device = "cpu",
dtype = "float32",
seed = 758341678,
verbose = TRUE
)
sjSDM.tune(object)
Y |
matrix of species occurrences/responses in range |
env |
matrix of environmental predictors, object of type |
biotic |
defines biotic (species-species associations) structure, object of type |
spatial |
defines spatial structure, object of type |
family |
error distribution with link function, see details for supported distributions |
iter |
number of fitting iterations |
step_size |
batch size for stochastic gradient descent, if |
learning_rate |
learning rate for Adamax optimizer |
se |
calculate standard errors for environmental coefficients |
sampling |
number of sampling steps for Monte Carlo integration |
parallel |
number of cpu cores for the data loader, only necessary for large datasets |
control |
control parameters for optimizer, see |
device |
which device to be used, "cpu" or "gpu" |
dtype |
which data type, most GPUs support only 32 bit floats. |
seed |
seed for random operations |
verbose |
|
object |
object of type |
The function fits per default a multivariate probit model via Monte-Carlo integration (see Chen et al., 2018) of the joint likelihood for all species.
The most common jSDM structure describes the site (\mjseqni = 1, ..., I) by species (\mjseqnj = 1, ..., J) matrix \mjseqnY_ij as a function of environmental covariates \mjseqnX_in(\mjseqnn=1,...,N covariates), and the species-species covariance matrix \mjseqn\Sigma accounts for correlations in \mjseqne_ij:
\mjsdeqng(Z_ij) = \beta_j0 + \Sigma^N_n=1X_in\beta_nj + e_ij
with \mjseqng(.) as link function. For the multivariate probit model, the link function is:
\mjsdeqnY_ij=1(Z_ij > 0)
The probability to observe the occurrence vector \mjseqn\bfY_i is:
\mjsdeqnPr(\bfY_i|\bfX_i\beta, \Sigma) = \int_A_iJ...\int_A_i1 \phi_J(\bfY_i^\ast;\bfX_i\beta, \Sigma) dY_i1^\ast... dY_iJ^\ast
in the interval \mjseqnA_ij with \mjseqn(-\inf, 0] if \mjseqnY_ij=0 and \mjseqn [0, +\inf) if \mjseqnY_ij=1.
and \mjseqn\phi being the density function of the multivariate normal distribution.
The probability of \mjseqn\bfY_i requires to integrate over \mjseqn\bfY_i^\ast which has no closed analytical expression for more than two species which makes the evaluation of the likelihood computationally costly and needs a numerical approximation. The previous equation can be expressed more generally as:
\mjsdeqn \mathcalL(\beta, \Sigma; \bfY_i, \bfX_i) = \int_\Omega \prod_j=1^J Pr(Y_ij|\bfX_i\beta+\zeta) Pr(\zeta|\Sigma) d\zeta
sjSDM
approximates this integral by \mjseqnM Monte-Carlo samples from the multivariate normal species-species covariance.
After integrating out the covariance term, the remaining part of the likelihood can be calculated as in an univariate case and the average
of the \mjseqnM samples are used to get an approximation of the integral:
L(\beta, \Sigma; \bfY_i, \bfX_i) \approx \frac1M \Sigma_m=1^M \prod_j=1^J Pr(Y_ij|\bfX_i\beta+\zeta_m)
with \mjseqn \zeta_m \sim MVN(0, \Sigma).
sjSDM
uses 'PyTorch' to run optionally the model on the graphical processing unit (GPU). Python dependencies needs to be
installed before being able to use the sjSDM
function. We provide a function which installs automatically python and the python dependencies.
See install_sjSDM
, vignette("Dependencies", package = "sjSDM")
See Pichler and Hartig, 2020 for benchmark results.
Currently supported distributions and link functions, which are :
binomial
: "probit"
or "logit"
poisson
: "log"
"nbinom"
: "log"
gaussian
: "identity"
We can extend the model to account for spatial auto-correlation between the sites by:
\mjsdeqng(Z_ij) = \beta_j0 + \Sigma^N_n=1X_in\beta_nj + \Sigma^M_m=1S_im\alpha_mj + e_ij
There are two ways to generate spatial predictors \mjseqnS:
trend surface model - using spatial coordinates in a polynomial:
linear(data=Coords, ~0+poly(X, Y, degree = 2))
eigenvector spatial filtering - using spatial eigenvectors.
Spatial eigenvectors can be generated by the generateSpatialEV
function:
SPV = generateSpatialEV(Coords)
Then we use, for example, the first 20 spatial eigenvectors:
linear(data=SPV[ ,1:20], ~0+.)
It is important to set the intercept to 0 in the spatial term (e.g. via ~0+.
) because the intercept is already set in the environmental object.
install_sjSDM
should be theoretically able to install conda and 'PyTorch' automatically. If sjSDM
still does not work after reloading RStudio, you can try to solve this on your following our trouble shooting guide installation_help
.
If the problem remains, please create an issue on issue tracker with a copy of the install_diagnostic
output as a quote.
An S3 class of type 'sjSDM' including the following components:
cl |
Model call |
formula |
Formula object for environmental covariates. |
names |
Names of environmental covariates. |
species |
Names of species (can be |
get_model |
Method which builds and returns the underlying 'python' model. |
logLik |
negative log-Likelihood of the model and the regularization loss. |
model |
The actual model. |
settings |
List of model settings, see arguments of |
family |
Response family. |
time |
Runtime. |
data |
List of Y, X (and spatial) model matrices. |
sessionInfo |
Output of |
weights |
List of model coefficients (environmental (and spatial)). |
sigma |
Lower triangular weight matrix for the covariance matrix. |
history |
History of iteration losses. |
se |
Matrix of standard errors, if |
Implemented S3 methods include summary.sjSDM
, plot.sjSDM
, print.sjSDM
, predict.sjSDM
, and coef.sjSDM
. For other methods, see section 'See Also'.
sjSDM.tune
returns an S3 object of class 'sjSDM', see above for information about values.
Maximilian Pichler
Chen, D., Xue, Y., & Gomes, C. P. (2018). End-to-end learning for the deep multivariate probit model. arXiv preprint arXiv:1803.08591.
Pichler, M., & Hartig, F. (2021). A new joint species distribution model for faster and more accurate inference of species associations from big community data. Methods in Ecology and Evolution, 12(11), 2159-2173.
getCor
, getCov
, update.sjSDM
, sjSDM_cv
, DNN
, plot.sjSDM
, print.sjSDM
, predict.sjSDM
, coef.sjSDM
, summary.sjSDM
, simulate.sjSDM
, getSe
, anova.sjSDM
, importance
## Not run:
# Basic workflow:
## simulate community:
com = simulate_SDM(env = 3L, species = 7L, sites = 100L)
## fit model:
model = sjSDM(Y = com$response,env = com$env_weights, iter = 50L,
verbose = FALSE)
# increase iter for your own data
# Default distribution is binomial("probit"). Alternatively, you can use
# binomial(logit), poisson("log"), "nbinom" (with log, still somewhat
# experimental) and gaussian("identity")
coef(model)
summary(model)
getCov(model)
## plot results
species=c("sp1","sp2","sp3","sp4","sp5","sp6","sp7")
group=c("mammal","bird","fish","fish","mammal","amphibian","amphibian")
group = data.frame(species=species,group=group)
plot(model,group=group)
## calculate post-hoc p-values:
p = getSe(model)
summary(p)
## or turn on the option in the sjSDM function:
model = sjSDM(Y = com$response, env = com$env_weights, se = TRUE,
family = binomial("probit"),
iter = 2L,
verbose = FALSE)
summary(model)
## fit model with interactions:
model = sjSDM(Y = com$response,
env = linear(data = com$env_weights, formula = ~X1:X2 + X3),
se = TRUE,
iter = 2L,
verbose = FALSE) # increase iter for your own data
summary(model)
## without intercept:
model = update(model, env_formula = ~0+X1:X2 + X3,
verbose = FALSE)
summary(model)
## predict with model:
preds = predict(model, newdata = com$env_weights)
## calculate R-squared:
R2 = Rsquared(model)
print(R2)
# With spatial terms:
## linear spatial model
XY = matrix(rnorm(200), 100, 2)
model = sjSDM(Y = com$response, env = linear(com$env_weights),
spatial = linear(XY, ~0+X1:X2),
iter = 50L,
verbose = FALSE) # increase iter for your own data
summary(model)
predict(model, newdata = com$env_weights, SP = XY)
R2 = Rsquared(model)
print(R2)
## Using spatial eigenvectors as predictors to account
## for spatial autocorrelation is a common approach:
SPV = generateSpatialEV(XY)
model = sjSDM(Y = com$response, env = linear(com$env_weights),
spatial = linear(SPV, ~0+., lambda = 0.1),
iter = 50L,
verbose = FALSE) # increase iter for your own data
summary(model)
predict(model, newdata = com$env_weights, SP = SPV)
## Visualize internal meta-community structure
an = anova(model,
verbose = FALSE)
internal = internalStructure(an)
plot(internal)
## Visualize community assemlby effects
plotAssemblyEffects(internal)
### see ?anova.sjSDM for mroe details
## non-linear(deep neural network) model
model = sjSDM(Y = com$response, env = linear(com$env_weights),
spatial = DNN(SPV,hidden = c(5L, 5L), ~0+.),
iter = 2L,# increase iter for your own data
verbose = FALSE)
summary(model)
predict(model, newdata = com$env_weights, SP = SPV)
# Regularization
## lambda is the regularization strength
## alpha weights the lasso or ridge penalty:
## - alpha = 0 --> pure lasso
## - alpha = 1.0 --> pure ridge
model = sjSDM(Y = com$response,
# mix of lasso and ridge
env = linear(com$env_weights, lambda = 0.01, alpha = 0.5),
# we can do the same for the species-species associations
biotic = bioticStruct(lambda = 0.01, alpha = 0.5),
iter = 2L,# increase iter for your own data
verbose = FALSE)
summary(model)
coef(model)
getCov(model)
# Anova
com = simulate_SDM(env = 3L, species = 15L, sites = 200L, correlation = TRUE)
XY = matrix(rnorm(400), 200, 2)
SPV = generateSpatialEV(XY)
model = sjSDM(Y = com$response, env = linear(com$env_weights),
spatial = linear(SPV, ~0+.),
verbose = FALSE,
iter = 50L) # increase iter for your own data
result = anova(model, verbose = FALSE)
print(result)
plot(result)
## visualize internal meta-community structure
internal = internalStructure(an)
plot(internal)
# Deep neural networks
## we can fit also a deep neural network instead of a linear model:
model = sjSDM(Y = com$response,
env = DNN(com$env_weights, hidden = c(10L, 10L, 10L)),
verbose = FALSE,
iter = 2L) # increase iter for your own data
summary(model)
getCov(model)
pred = predict(model, newdata = com$env_weights)
## extract weights
weights = getWeights(model)
## we can also assign weights:
setWeights(model, weights)
## with regularization:
model = sjSDM(Y = com$response,
# mix of lasso and ridge
env = DNN(com$env_weights, lambda = 0.01, alpha = 0.5),
# we can do the same for the species-species associations
biotic = bioticStruct(lambda = 0.01, alpha = 0.5),
verbose = FALSE,
iter = 2L) # increase iter for your own data
getCov(model)
getWeights(model)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.