about.ssvs: Stochastic search variable selection (SSVS) in boral

Description Details Warnings Author(s) References See Also Examples

Description

This help file provides more information regarding the implementation of the stochastic search variable selection (SSVS, George and McCulloch, 1993) as implemented in the boral package.

Details

Stochastic search variable selection (SSVS, George and McCulloch, 1993) is a approach for model selection, which is applicable specifically to the Bayesian MCMC framework. As of boral version 1.5, SSVS is implemented in two ways.

SSVS on coefficients in X: SSVS is implemented on the column-specific coefficients \bm{β}_j. Basically, SSVS works by placing a spike-and-slab priors on these coefficients, such that the spike is a narrow normal distribution concentrated around zero and the spike is a normal distribution with a large variance.

ρ(β) = I_{β = 1}\times\mathcal{N}(0,σ^2) + (1-I_{β = 1})\times \mathcal{N}(0,g*σ^2),

where σ^2 is determined by prior.control$hypparams[3], g is determined by ssvs.g, and I_{β = 1} = P(β = 1) is an indicator function representing whether coefficient is included in the model. It is given a Bernoulli prior with probability of inclusion 0.5. After fitting, the posterior probability of β being included in the model is returned based on posterior mean of the indicator function I_{β = 1}. Note this is NOT the same as a p-value seen in maximum likelihood estimation: a p-value provides an indication of how much evidence there is against the null hypothesis of β = 0, while the posterior probability provides a measure of how likely it is for β \ne 0 given the data.

SSVS can be applied at a grouped or individual coefficient level, and this is governed by
prior.control$ssvs.index:

Note the last application of SSVS allows multiple covariates to be selected simultaneously. For example, suppose X consists of five columns: the first two columns are environmental covariates, while the last three correspond to quadratic terms of the two covariates as well as their interaction. If we want to "test" whether any quadratic terms are required, then we can set
prior.control$ssvs.index = c(-1,-1,1,1,1), so a single posterior probability of inclusion is returned for the last three columns of X.

Finally, note that summaries such as posterior medians and HPD intervals of the coefficients, as well as performing residual analysis, from a fitted model that has implemented SSVS may be problematic because the posterior distribution is by definition multi-modal. It may be advisable instead to separate out their application of SSVS and posterior inference.

SSVS on trait coefficients: If traits are included in boral, thereby leading to a fourth corner model (see about.traits for more details on this type of model), SSVS can also be performed on the associated trait coefficients. That is, in such model we have

β_{0j} \sim N(κ_{01} + \bm{traits}^\top_j\bm{κ}_1, σ^2_1)

for the column-specific intercepts, and

β_{jk} \sim N(κ_{0k} + \bm{traits}^\top_j\bm{κ}_k, σ^2_k)

for k = 1,…,d where d = ncol(X). Then if the a particular index in the argument
prior.control$ssvs.traitsindex is set to 0, SSVS is performed on the corresponding element in \bm{κ}_1 or \bm{κ}_k. For example, suppose which.traits[[2]] = c(2,3), meaning that the β_{j1}'s are drawn from a normal distribution with mean depending only on the second and third columns of traits. Then
prior.control$ssvs.traitsindex[[2]] = c(0,1), then a spike-and-slab prior is placed on the first coefficent in \bm{κ}_2, while the second coefficient is assigned the “standard" prior governed by the prior.control$hypparams. That is, SSVS is performed on the first but not the second coefficient in \bm{κ}_2.

Please keep in mind that because boral allows the user to manually decide which traits drive which covariates in X, then care must be taken when setting up both which.traits and
prior.control$ssvs.traitsindex. That is, when supplied then both objects should be lists of have the same length, and the length of the corresponding vectors comprising each element in the two lists should match as well e.g., which.traits[[2]] and
prior.control$ssvs.traitsindex[[2]] should be of the same length.

Warnings

Author(s)

Francis K.C. Hui <[email protected]>, with contributions from Wade Blanchard <[email protected]>

Maintainer: Francis Hui <[email protected]>

References

See Also

boral for the main boral fitting function which implementing SSVS, and about.traits for how fourth corner models work before applying SSVS to them.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
library(mvabund) ## Load a dataset from the mvabund package
data(spider)
y <- spider$abun
X <- scale(spider$x)
n <- nrow(y)
p <- ncol(y)

## NOTE: The two examples below and taken directly from the boral help file

example_mcmc_control <- list(n.burnin = 10, n.iteration = 100, 
     n.thin = 1)

testpath <- file.path(tempdir(), "jagsboralmodel.txt")


## Not run: 
## Example 3a - Extend example 2 to demonstrate grouped covariate selection
## on the last three covariates. 
example_prior_control <- list(type = c("normal","normal","normal","uniform"), 
     ssvs.index = c(-1,-1,-1,1,2,3))
spiderfit_nb2 <- boral(y, X = X, family = "negative.binomial", 
    mcmc.control = example_mcmc_control, prior.control = example_prior_control,
    model.name = testpath)
     
summary(spiderfit_nb2) 


## Example 3b - Extend example 2 to demonstrate individual covariate selection
## on the last three covariates. 
example_prior_control <- list(type = c("normal","normal","normal","uniform"), 
     ssvs.index = c(-1,-1,-1,0,0,0))
spiderfit_nb3 <- boral(y, X = X, family = "negative.binomial", 
    mcmc.control = example_mcmc_control, prior.control = example_prior_control,
    model.name = testpath)
summary(spiderfit_nb3) 


## Example 5a - model fitted to count data, no site effects, and
## two latent variables, plus traits included to explain environmental responses
data(antTraits)
y <- antTraits$abun
X <- as.matrix(scale(antTraits$env))
## Include only traits 1, 2, and 5
traits <- as.matrix(antTraits$traits[,c(1,2,5)])
example_which_traits <- vector("list",ncol(X)+1)
for(i in 1:length(example_which_traits)) 
     example_which_traits[[i]] <- 1:ncol(traits)
## Just for fun, the regression coefficients for the second column of X,
## corresponding to the third element in the list example_which_traits,
## will be estimated separately and not regressed against traits.
example_which_traits[[3]] <- 0

fit_traits <- boral(y, X = X, traits = traits, 
    which.traits = example_which_traits, family = "negative.binomial", 
    mcmc.control = example_mcmc_control, model.name = testpath,
    save.model = TRUE)

summary(fit_traits)


## Example 5b - perform selection on trait coefficients
ssvs_traitsindex <- vector("list",ncol(X)+1)
for(i in 1:length(ssvs_traitsindex)) 
     ssvs_traitsindex[[i]] <- rep(0,ncol(traits))
ssvs_traitsindex[[3]] <- -1
fit_traits <- boral(y, X = X, traits = traits, which.traits = example_which_traits, 
    family = "negative.binomial", mcmc.control = example_mcmc_control, 
    save.model = TRUE, prior.control = list(ssvs.traitsindex = ssvs_traitsindex),
    model.name = testpath)

summary(fit_traits)

## End(Not run)

boral documentation built on Jan. 29, 2020, 1:06 a.m.