Description Details Warnings Author(s) References See Also Examples

This help file provides more information regarding the implementation of the stochastic search variable selection (SSVS, George and McCulloch, 1993) as implemented in the boral package.

Stochastic search variable selection (SSVS, George and McCulloch, 1993) is a approach for model selection, which is applicable specifically to the Bayesian MCMC framework. As of boral version 1.5, SSVS is implemented in two ways.

**SSVS on coefficients in X:**
SSVS is implemented on the column-specific coefficients

*ρ(β) = I_{β = 1}\times\mathcal{N}(0,σ^2) + (1-I_{β = 1})\times \mathcal{N}(0,g*σ^2),*

where *σ^2* is determined by `prior.control$hypparams[3]`

, *g* is determined by `ssvs.g`

, and *I_{β = 1} = P(β = 1)* is an indicator function representing whether coefficient is included in the model. It is given a Bernoulli prior with probability of inclusion 0.5. After fitting, the posterior probability of *β* being included in the model is returned based on posterior mean of the indicator function *I_{β = 1}*. Note this is NOT the same as a *p*-value seen in maximum likelihood estimation: a *p*-value provides an indication of how much evidence there is against the null hypothesis of *β = 0*, while the posterior probability provides a measure of how likely it is for *β \ne 0* given the data.

SSVS can be applied at a grouped or individual coefficient level, and this is governed by

`prior.control$ssvs.index`

:

For elements of

`ssvs.index`

equal to -1, SSVS is not applied on the corresponding covariate of the`X`

.For elements equal to 0, SSVS is applied to each individual coefficients of the corresponding covariate in

`X`

. That is, the fitted model will return posterior probabilities for this covariate, one for each column of`y`

.For elements taking positive integers 1,2,..., SSVS is applied to each group of coefficients of the corresponding covariate in

`X`

. That is, the fitted model will return a single posterior probability for this covariate, indicating whether this covariate should be included for all columns of`y`

; see O'Hara and Sillanpaa (2009) and Tenan et al. (2014) among many others for an discussion of Bayesian variable selection methods.

Note the last application of SSVS allows multiple covariates to be selected *simultaneously*. For example, suppose `X`

consists of five columns: the first two columns are environmental covariates, while the last three correspond to quadratic terms of the two covariates as well as their interaction. If we want to "test" whether any quadratic terms are required, then we can set

`prior.control$ssvs.index = c(-1,-1,1,1,1)`

, so a single posterior probability of inclusion is returned for the last three columns of `X`

.

Finally, note that summaries such as posterior medians and HPD intervals of the coefficients, as well as performing residual analysis, from a fitted model that has implemented SSVS may be problematic because the posterior distribution is by definition multi-modal. It may be advisable instead to separate out their application of SSVS and posterior inference.

**SSVS on trait coefficients:**
If traits are included in boral, thereby leading to a fourth corner model (see `about.traits`

for more details on this type of model), SSVS can also be performed on the associated trait coefficients. That is, in such model we have

*β_{0j} \sim N(κ_{01} + \bm{traits}^\top_j\bm{κ}_1, σ^2_1)*

for the column-specific intercepts, and

*β_{jk} \sim N(κ_{0k} + \bm{traits}^\top_j\bm{κ}_k, σ^2_k)*

for *k = 1,…,d* where `d = ncol(X)`

. Then if the a particular index in the argument

`prior.control$ssvs.traitsindex`

is set to 0, SSVS is performed on the corresponding element in *\bm{κ}_1* or *\bm{κ}_k*. For example, suppose `which.traits[[2]] = c(2,3)`

, meaning that the *β_{j1}*'s are drawn from a normal distribution with mean depending only on the second and third columns of `traits`

. Then

`prior.control$ssvs.traitsindex[[2]] = c(0,1)`

, then a spike-and-slab prior is placed on the first coefficent in *\bm{κ}_2*, while the second coefficient is assigned the “standard" prior governed by the `prior.control$hypparams`

. That is, SSVS is performed on the first but not the second coefficient in *\bm{κ}_2*.

Please keep in mind that because boral allows the user to manually decide which traits drive which covariates in `X`

, then care must be taken when setting up both `which.traits`

and

`prior.control$ssvs.traitsindex`

. That is, when supplied then both objects should be lists of have the same length, and the length of the corresponding vectors comprising each element in the two lists should match as well e.g., `which.traits[[2]]`

and

`prior.control$ssvs.traitsindex[[2]]`

should be of the same length.

Summaries of the coefficients such as posterior medians and HPD intervals may also be problematic when SSVS is being used, since the posterior distribution will be multi-modal.

If

`save.model = TRUE`

, the raw jags model is also returned. This can be quite very memory-consuming, since it indirectly saves all the MCMC samples.

Francis K.C. Hui <[email protected]>, with contributions from Wade Blanchard <[email protected]>

Maintainer: Francis Hui <[email protected]>

George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 85, 398-409.

O'Hara, B., and Sillianpaa, M.J. (2009). A Review of Bayesian Variable Selection Methods: What, How and Which. Bayesian Analysis, 4, 85-118.

Tenan, S., et al. (2014). Bayesian model selection: The steepest mountain to climb. Ecological Modelling, 283, 62-69.

`boral`

for the main boral fitting function which implementing SSVS, and `about.traits`

for how fourth corner models work before applying SSVS to them.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 | ```
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
X <- scale(spider$x)
n <- nrow(y)
p <- ncol(y)
## NOTE: The two examples below and taken directly from the boral help file
example_mcmc_control <- list(n.burnin = 10, n.iteration = 100,
n.thin = 1)
testpath <- file.path(tempdir(), "jagsboralmodel.txt")
## Not run:
## Example 3a - Extend example 2 to demonstrate grouped covariate selection
## on the last three covariates.
example_prior_control <- list(type = c("normal","normal","normal","uniform"),
ssvs.index = c(-1,-1,-1,1,2,3))
spiderfit_nb2 <- boral(y, X = X, family = "negative.binomial",
mcmc.control = example_mcmc_control, prior.control = example_prior_control,
model.name = testpath)
summary(spiderfit_nb2)
## Example 3b - Extend example 2 to demonstrate individual covariate selection
## on the last three covariates.
example_prior_control <- list(type = c("normal","normal","normal","uniform"),
ssvs.index = c(-1,-1,-1,0,0,0))
spiderfit_nb3 <- boral(y, X = X, family = "negative.binomial",
mcmc.control = example_mcmc_control, prior.control = example_prior_control,
model.name = testpath)
summary(spiderfit_nb3)
## Example 5a - model fitted to count data, no site effects, and
## two latent variables, plus traits included to explain environmental responses
data(antTraits)
y <- antTraits$abun
X <- as.matrix(scale(antTraits$env))
## Include only traits 1, 2, and 5
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits))
example_which_traits[[i]] <- 1:ncol(traits)
## Just for fun, the regression coefficients for the second column of X,
## corresponding to the third element in the list example_which_traits,
## will be estimated separately and not regressed against traits.
example_which_traits[[3]] <- 0
fit_traits <- boral(y, X = X, traits = traits,
which.traits = example_which_traits, family = "negative.binomial",
mcmc.control = example_mcmc_control, model.name = testpath,
save.model = TRUE)
summary(fit_traits)
## Example 5b - perform selection on trait coefficients
ssvs_traitsindex <- vector("list",ncol(X)+1)
for(i in 1:length(ssvs_traitsindex))
ssvs_traitsindex[[i]] <- rep(0,ncol(traits))
ssvs_traitsindex[[3]] <- -1
fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits,
family = "negative.binomial", mcmc.control = example_mcmc_control,
save.model = TRUE, prior.control = list(ssvs.traitsindex = ssvs_traitsindex),
model.name = testpath)
summary(fit_traits)
## End(Not run)
``` |

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.