# negbinBvs: Bayesian variable selection for the negative binomial model In pogit: Bayesian Variable Selection for a Poisson-Logistic Model

 negbinBvs R Documentation

## Bayesian variable selection for the negative binomial model

### Description

This function performs Bayesian regression modelling of overdispersed count data including variable selection via spike and slab priors. Posterior inference is based on MCMC sampling techniques.

### Usage

```negbinBvs(
y,
offset = NULL,
X,
model = list(),
prior = list(),
mcmc = list(),
start = NULL,
BVS = TRUE
)
```

### Arguments

 `y` an integer vector of count data `offset` an (optional) offset term; should be `NULL` or an integer vector of length equal to the number of counts. `X` a design matrix (including an intercept term) `model` an (optional) list specifying the structure of the model (see details) `prior` an (optional) list of prior settings and hyper-parameters controlling the priors (see details) `mcmc` an (optional) list of MCMC sampling options (see details) `start` an (optional), numeric vector containing starting values for the regression effects (including an intercept term); defaults to `NULL` (i.e. a vector of zeros is used). `BVS` if `TRUE` (default), Bayesian variable selection is performed to identify regressors with a non-zero effect; otherwise, an unrestricted model is estimated (without variable selection).

### Details

The method provides Bayesian variable selection in regression modelling of overdispersed count data. The negative binomial distribution is derived marginally from a Poisson-Gamma (mixture) model, which can be interpreted as an overdispersed Poisson model with observation-specific random intercept log γ, where γ|ρ \sim Γ(ρ,ρ). A hyper-prior for ρ is specified as ρ \sim Γ(c_0,C_0), see details for `prior` below. By default, variable selection is incorporated in the model based on mixture priors with a spike and a slab component for the regression effects β. More specifically, a Dirac spike is used, i.e. a point mass at zero, and (by default), the slab component is specified as a scale mixture of normal distributions, resulting in a Student-t distribution with 2`psi.nu` degrees of freedom.

The MCMC sampling scheme relies on the representation of the conditional Poisson model as a Gaussian regression model in auxiliary variables, as described in `poissonBvs`. Data augmentation is based on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter et al. (2009). For details concerning the algorithm, see Dvorzak and Wagner (2016b), available on request from the authors.

Details for model specification (see arguments):

`model`:
`deltafix`

an indicator vector of length `ncol(X)-1` specifying which regression effects are subject to selection (i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector of zeros.

`prior`:
`slab`

distribution of the slab component, i.e. "`Student`" (default) or "`Normal`".

`psi.nu`

hyper-parameter of the Student-t slab (used for a "`Student`" slab); defaults to 5.

`m0`

prior mean for the intercept parameter; defaults to 0.

`M0`

prior variance for the intercept parameter; defaults to 100.

`aj0`

a vector of prior means for the regression effects (which is encoded in a normal distribution, see notes); defaults to vector of zeros.

`V`

variance of the slab; defaults to 5.

`w`

hyper-parameters of the Beta-prior for the mixture weight ω; defaults to `c(wa0=1, wb0=1)`, i.e. a uniform distribution.

`c0, C0`

scale and rate of the gamma prior for the hyper-parameter ρ; defaults to 2 and 1.

`eps`

tuning parameter in the MH-step to sample ρ; defaults to 0.05.

`mcmc`:
`M`

number of MCMC iterations after the burn-in phase; defaults to 8000.

`burnin`

number of MCMC iterations discarded as burn-in; defaults to 2000.

`thin`

thinning parameter; defaults to 1.

`startsel`

number of MCMC iterations drawn from the unrestricted model (e.g., `burnin/2`); defaults to 1000.

`verbose`

MCMC progress report in each `verbose`-th iteration step; defaults to 500. If `verbose=0`, no output is generated.

`msave`

returns additional output with variable selection details (i.e. posterior samples for ω, δ); defaults to `FALSE`.

### Value

The function returns an object of class "`pogit`" with methods `print.pogit`, `summary.pogit` and `plot.pogit`.

The returned object is a list containing the following elements:

 `samplesNB` a named list containing the samples from the posterior distribution of the parameters in the negative binomial model (see also `msave`): `beta, rho`regression coefficients β and ρ `pdeltaBeta`P(δ_β=1) `psiBeta`scale parameter ψ_β of the slab component `data` a list containing the data `y`, `offset` and `X` `model.nb` a list containing details on the model specification, see details for `model` `mcmc` see details for `mcmc` `prior.nb` see details for `prior` `dur` a list containing the total runtime (`total`) and the runtime after burn-in (`durM`), in seconds `acc.rho` acceptance rate of parameter ρ `BVS` see arguments `start` a list containing starting values, see arguments `family` "negbin" `call` function call

### Note

Alternatively, a Poisson model with observation-specific normal random intercept (i.e., a Poisson-log-normal mixture model) can be used to deal with overdispersion of count data, which is provided in the function `poissonBvs`.

If prior information on the regression parameters is available, this information is encoded in a normal distribution instead of the spike and slab prior (consequently, `BVS` is set to `FALSE`).

### Author(s)

Michaela Dvorzak <m.dvorzak@gmx.at>

### References

Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for overdispersed count data subject to underreporting - An application to norovirus illness in Germany. (Unpublished) working paper.

Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Statistics and Computing, 19, 479 - 492.

`poissonBvs`

### Examples

```## Not run:
## Examples below should take about 1-2 minutes.

## ------ (use simul_pois1) ------
data(simul_pois1)
y <- simul_pois1\$y
X <- as.matrix(simul_pois1[, -1])

# Bayesian variable selection for simulated data set
m1 <- negbinBvs(y = y, X = X)

# print results (check acceptance rate for 'rho')
print(m1)

# re-run with adapted tuning parameter 'eps'
m2 <- negbinBvs(y = y, X = X, prior = list(eps = 0.4))

# print and summarize results
print(m2)
summary(m2)

# alternatively, compare results to overdispersed Poisson model with
# normal random intercept (subject to selection), provided in 'poissonBvs'

# specify observation-specific random intercept
cID <- seq_along(y)
m3  <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID))

# print, summarize and plot results
print(m3)
summary(m3)
# note that thetaB is not selected (!)

plot(m3, burnin = FALSE, thin = FALSE)

## ------ (use data set "azdrg112" from package "COUNT") ------

if (!requireNamespace("COUNT", quietly = TRUE)){
stop("package 'COUNT' is needed for this example to work.
}

library(COUNT)
# (Arizona Medicare data for DRG (Diagnostic Related Group) 112)
data(azdrg112)

y <- as.numeric(azdrg112\$los) # hospital length of stay: 1-53 days
X <- as.matrix(azdrg112[,-1]) # covariates (gender, type1, age75)
m4 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000))

# print results (check acceptance rate for 'rho')
print(m4)
summary(m4)
plot(m4, burnin = FALSE)

# adapte tuning parameter eps (and set BVS to FALSE)
prior <- list(eps = 0.1)
m5 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior,
BVS = FALSE)

# print, summarize and plot results
print(m5)
summary(m5)
plot(m5, burnin = FALSE, thin = FALSE)
plot(m5, type = "acf", lag.max = 50)

## End(Not run)
```

pogit documentation built on May 25, 2022, 5:05 p.m.