jsdgam | R Documentation |
This function sets up a Joint Species Distribution Model whereby the residual associations among species can be modelled in a reduced-rank format using a set of latent factors. The factor specification is extremely flexible, allowing users to include spatial, temporal or any other type of predictor effects to more efficiently capture unmodelled residual associations, while the observation model can also be highly flexible (including all smooth, GP and other effects that mvgam can handle)
jsdgam(
formula,
factor_formula = ~-1,
knots,
factor_knots,
data,
newdata,
family = poisson(),
unit = time,
species = series,
share_obs_params = FALSE,
priors,
n_lv = 2,
backend = getOption("brms.backend", "cmdstanr"),
algorithm = getOption("brms.algorithm", "sampling"),
control = list(max_treedepth = 10, adapt_delta = 0.8),
chains = 4,
burnin = 500,
samples = 500,
thin = 1,
parallel = TRUE,
threads = 1,
silent = 1,
run_model = TRUE,
return_model_data = FALSE,
residuals = TRUE,
...
)
formula |
A |
factor_formula |
A |
knots |
An optional |
factor_knots |
An optional |
data |
A |
newdata |
Optional |
family |
Default is |
unit |
The unquoted name of the variable that represents the unit of analysis in |
species |
The unquoted name of the |
share_obs_params |
|
priors |
An optional |
n_lv |
|
backend |
Character string naming the package to use as the backend for fitting
the Stan model. Options are "cmdstanr" (the default) or "rstan". Can be set globally
for the current R session via the |
algorithm |
Character string naming the estimation approach to use.
Options are |
control |
A named |
chains |
|
burnin |
|
samples |
|
thin |
Thinning interval for monitors. Ignored
if |
parallel |
|
threads |
|
silent |
Verbosity level between |
run_model |
|
return_model_data |
|
residuals |
Logical indicating whether to compute series-level randomized quantile residuals and include
them as part of the returned object. Defaults to |
... |
Other arguments to pass to mvgam |
Joint Species Distribution Models allow for responses of multiple species to be
learned hierarchically, whereby responses to environmental variables in formula
can be partially
pooled and any latent, unmodelled residual associations can also be learned. In mvgam, both of
these effects can be modelled with the full power of latent factor Hierarchical GAMs, providing unmatched
flexibility to model full communities of species. When calling jsdgam, an initial State-Space model using
trend = 'None'
is set up and then modified to include the latent factors and their linear predictors.
Consequently, you can inspect priors for these models using get_mvgam_priors by supplying the relevant
formula
, factor_formula
, data
and family
arguments and keeping the default trend = 'None'
.
In a JSDGAM, the expectation of response Y_{ij}
is modelled with
g(\mu_{ij}) = X_i\beta + u_i\theta_j,
where g(.)
is a known link function,
X
is a design matrix of linear predictors (with associated \beta
coefficients),
u
are n_{lv}
-variate latent factors
(n_{lv}
<<n_{species}
) and
\theta_j
are species-specific loadings on the latent factors, respectively. The design matrix
X
and \beta
coefficients are constructed and modelled using formula
and can contain
any of mvgam
's predictor effects, including random intercepts and slopes, multidimensional penalized
smooths, GP effects etc... The factor loadings \theta_j
are constrained for identifiability but can
be used to reconstruct an estimate of the species' residual variance-covariance matrix
using \Theta \Theta'
(see the example below and residual_cor()
for details).
The latent factors are further modelled using:
u_i \sim \text{Normal}(Q_i\beta_{factor}, 1) \quad
where the second design matrix Q
and associated \beta_{factor}
coefficients are
constructed and modelled using factor_formula
. Again, the effects that make up this linear
predictor can contain any of mvgam
's allowed predictor effects, providing enormous flexibility for
modelling species' communities.
A list
object of class mvgam
containing model output,
the text representation of the model file,
the mgcv model output (for easily generating simulations at
unsampled covariate values), Dunn-Smyth residuals for each species and key information needed
for other functions in the package. See mvgam-class
for details.
Use methods(class = "mvgam")
for an overview on available methods
Nicholas J Clark
Nicholas J Clark & Konstans Wells (2023). Dynamic generalised additive models (DGAMs) for forecasting discrete ecological time series.
Methods in Ecology and Evolution. 14:3, 771-784.
David I Warton, F Guillaume Blanchet, Robert B O’Hara, Otso Ovaskainen, Sara Taskinen, Steven C
Walker & Francis KC Hui (2015). So many variables: joint modeling in community ecology.
Trends in Ecology & Evolution 30:12, 766-779.
mvgam()
, residual_cor()
# Fit a JSDGAM to the portal_data captures
mod <- jsdgam(
formula = captures ~
# Fixed effects of NDVI and mintemp, row effect as a GP of time
ndvi_ma12:series + mintemp:series + gp(time, k = 15),
factor_formula = ~ -1,
data = portal_data,
unit = time,
species = series,
family = poisson(),
n_lv = 2,
silent = 2,
chains = 2
)
# Plot covariate effects
library(ggplot2); theme_set(theme_bw())
plot_predictions(
mod,
condition = c('ndvi_ma12','series', 'series')
)
plot_predictions(
mod,
condition = c('mintemp','series', 'series')
)
# A residual correlation plot
plot(
residual_cor(mod)
)
# An ordination biplot can also be constructed
# from the factor scores and their loadings
if(requireNamespace('ggrepel', quietly = TRUE)){
ordinate(mod, alpha = 0.7)
}
# A more complicated example showing how to include predictors
# in the factor_formula
# Simulate latent count data for 500 spatial locations and 10 species
set.seed(0)
N_points <- 500
N_species <- 10
# Species-level intercepts (on the log scale)
alphas <- runif(N_species, 2, 2.25)
# Simulate a covariate and species-level responses to it
temperature <- rnorm(N_points)
betas <- runif(N_species, -0.5, 0.5)
# Simulate points uniformly over a space
lon <- runif(N_points, min = 150, max = 155)
lat <- runif(N_points, min = -20, max = -19)
# Set up spatial basis functions as a tensor product of lat and lon
sm <- mgcv::smoothCon(mgcv::te(lon, lat, k = 5),
data = data.frame(lon, lat),
knots = NULL)[[1]]
# The design matrix for this smooth is in the 'X' slot
des_mat <- sm$X
dim(des_mat)
# Function to generate a random covariance matrix where all variables
# have unit variance (i.e. diagonals are all 1)
random_Sigma = function(N){
L_Omega <- matrix(0, N, N);
L_Omega[1, 1] <- 1;
for (i in 2 : N) {
bound <- 1;
for (j in 1 : (i - 1)) {
L_Omega[i, j] <- runif(1, -sqrt(bound), sqrt(bound));
bound <- bound - L_Omega[i, j] ^ 2;
}
L_Omega[i, i] <- sqrt(bound);
}
Sigma <- L_Omega %*% t(L_Omega);
return(Sigma)
}
# Simulate a variance-covariance matrix for the correlations among
# basis coefficients
Sigma <- random_Sigma(N = NCOL(des_mat))
# Now simulate the species-level basis coefficients hierarchically, where
# spatial basis function correlations are a convex sum of a base correlation
# matrix and a species-level correlation matrix
basis_coefs <- matrix(NA, nrow = N_species, ncol = NCOL(Sigma))
base_field <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = Sigma)
for(t in 1:N_species){
corOmega <- (cov2cor(Sigma) * 0.7) +
(0.3 * cov2cor(random_Sigma(N = NCOL(des_mat))))
basis_coefs[t, ] <- mgcv::rmvn(1, mu = rep(0, NCOL(Sigma)), V = corOmega)
}
# Simulate the latent spatial processes
st_process <- do.call(rbind, lapply(seq_len(N_species), function(t){
data.frame(lat = lat,
lon = lon,
species = paste0('species_', t),
temperature = temperature,
process = alphas[t] +
betas[t] * temperature +
des_mat %*% basis_coefs[t,])
}))
# Now take noisy observations at some of the points (60)
obs_points <- sample(1:N_points, size = 60, replace = FALSE)
obs_points <- data.frame(lat = lat[obs_points],
lon = lon[obs_points],
site = 1:60)
# Keep only the process data at these points
st_process %>%
dplyr::inner_join(obs_points, by = c('lat', 'lon')) %>%
# now take noisy Poisson observations of the process
dplyr::mutate(count = rpois(NROW(.), lambda = exp(process))) %>%
dplyr::mutate(species = factor(species,
levels = paste0('species_', 1:N_species))) %>%
dplyr::group_by(lat, lon) -> dat
# View the count distributions for each species
ggplot(dat, aes(x = count)) +
geom_histogram() +
facet_wrap(~ species, scales = 'free')
ggplot(dat, aes(x = lon, y = lat, col = log(count + 1))) +
geom_point(size = 2.25) +
facet_wrap(~ species, scales = 'free') +
scale_color_viridis_c()
# Inspect default priors for a joint species model with three spatial factors
priors <- get_mvgam_priors(formula = count ~
# Environmental model includes random slopes for
# a linear effect of temperature
s(species, bs = 're', by = temperature),
# Each factor estimates a different nonlinear spatial process, using
# 'by = trend' as in other mvgam State-Space models
factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1,
n_lv = 3,
# The data and grouping variables
data = dat,
unit = site,
species = species,
# Poisson observations
family = poisson())
head(priors)
# Fit a JSDM that estimates hierarchical temperature responses
# and that uses three latent spatial factors
mod <- jsdgam(formula = count ~
# Environmental model includes random slopes for a
# linear effect of temperature
s(species, bs = 're', by = temperature),
# Each factor estimates a different nonlinear spatial process, using
# 'by = trend' as in other mvgam State-Space models
factor_formula = ~ gp(lon, lat, k = 6, by = trend) - 1,
n_lv = 3,
# Change default priors for fixed random effect variances and
# factor GP marginal deviations to standard normal
priors = c(prior(std_normal(),
class = sigma_raw),
prior(std_normal(),
class = `alpha_gp_trend(lon, lat):trendtrend1`),
prior(std_normal(),
class = `alpha_gp_trend(lon, lat):trendtrend2`),
prior(std_normal(),
class = `alpha_gp_trend(lon, lat):trendtrend3`)),
# The data and the grouping variables
data = dat,
unit = site,
species = species,
# Poisson observations
family = poisson(),
chains = 2,
silent = 2)
# Plot the implicit species-level intercept estimates
plot_predictions(mod, condition = 'species',
type = 'link')
# Plot species' hierarchical responses to temperature
plot_predictions(mod, condition = c('temperature', 'species', 'species'),
type = 'link')
# Plot posterior median estimates of the latent spatial factors
plot(mod, type = 'smooths', trend_effects = TRUE)
# Or using gratia, if you have it installed
if(requireNamespace('gratia', quietly = TRUE)){
gratia::draw(mod, trend_effects = TRUE, dist = 0)
}
# Plot species' randomized quantile residual distributions
# as a function of latitude
pp_check(mod,
type = 'resid_ribbon_grouped',
group = 'species',
x = 'lat',
ndraws = 200)
# Calculate residual spatial correlations
post_cors <- residual_cor(mod)
names(post_cors)
# Look at lower and upper credible interval estimates for
# some of the estimated correlations
post_cors$cor[1:5, 1:5]
post_cors$cor_upper[1:5, 1:5]
post_cors$cor_lower[1:5, 1:5]
# Plot of the posterior median correlations for those estimated
# to be non-zero
plot(post_cors, cluster = TRUE)
# An ordination biplot can also be constructed
# from the factor scores and their loadings
if(requireNamespace('ggrepel', quietly = TRUE)){
ordinate(mod)
}
# Posterior predictive checks and ELPD-LOO can ascertain model fit
pp_check(mod,
type = "pit_ecdf_grouped",
group = "species",
ndraws = 200)
loo(mod)
# Forecast log(counts) for entire region (site value doesn't matter as long
# as each spatial location has a different and unique site identifier);
# note this calculation takes a few minutes because of the need to calculate
# draws from the stochastic latent factors
newdata <- st_process %>%
dplyr::mutate(species = factor(species,
levels = paste0('species_',
1:N_species))) %>%
dplyr::group_by(lat, lon) %>%
dplyr::mutate(site = dplyr::cur_group_id()) %>%
dplyr::ungroup()
preds <- predict(mod, newdata = newdata)
# Plot the median log(count) predictions on a grid
newdata$log_count <- preds[,1]
ggplot(newdata, aes(x = lon, y = lat, col = log_count)) +
geom_point(size = 1.5) +
facet_wrap(~ species, scales = 'free') +
scale_color_viridis_c() +
theme_classic()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.