negbinBvs: Bayesian variable selection for the negative binomial model In pogit: Bayesian Variable Selection for a Poisson-Logistic Model

Description

This function performs Bayesian regression modelling of overdispersed count data including variable selection via spike and slab priors. Posterior inference is based on MCMC sampling techniques.

Usage

 ```1 2``` ```negbinBvs(y, offset = NULL, X, model = list(), prior = list(), mcmc = list(), start = NULL, BVS = TRUE) ```

Arguments

 `y` an integer vector of count data `offset` an (optional) offset term; should be `NULL` or an integer vector of length equal to the number of counts. `X` a design matrix (including an intercept term) `model` an (optional) list specifying the structure of the model (see details) `prior` an (optional) list of prior settings and hyper-parameters controlling the priors (see details) `mcmc` an (optional) list of MCMC sampling options (see details) `start` an (optional), numeric vector containing starting values for the regression effects (including an intercept term); defaults to `NULL` (i.e. a vector of zeros is used). `BVS` if `TRUE` (default), Bayesian variable selection is performed to identify regressors with a non-zero effect; otherwise, an unrestricted model is estimated (without variable selection).

Details

The method provides Bayesian variable selection in regression modelling of overdispersed count data. The negative binomial distribution is derived marginally from a Poisson-Gamma (mixture) model, which can be interpreted as an overdispersed Poisson model with observation-specific random intercept log γ, where γ|ρ \sim Γ(ρ,ρ). A hyper-prior for ρ is specified as ρ \sim Γ(c_0,C_0), see details for `prior` below. By default, variable selection is incorporated in the model based on mixture priors with a spike and a slab component for the regression effects β. More specifically, a Dirac spike is used, i.e. a point mass at zero, and (by default), the slab component is specified as a scale mixture of normal distributions, resulting in a Student-t distribution with 2`psi.nu` degrees of freedom.

The MCMC sampling scheme relies on the representation of the conditional Poisson model as a Gaussian regression model in auxiliary variables, as described in `poissonBvs`. Data augmentation is based on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter et al. (2009). For details concerning the algorithm, see Dvorzak and Wagner (2016b), available on request from the authors.

Details for the model specification (see arguments):

`model`

A list:

`deltafix`

an indicator vector of length `ncol(X)-1` specifying which regression effects are subject to selection (i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector of zeros.

`prior`

A list:

`slab`

distribution of the slab component, i.e. "`Student`" (default) or "`Normal`".

`psi.nu`

hyper-parameter of the Student-t slab (used for a "`Student`" slab); defaults to 5.

`m0`

prior mean for the intercept parameter; defaults to 0.

`M0`

prior variance for the intercept parameter; defaults to 100.

`aj0`

a vector of prior means for the regression effects (which is encoded in a normal distribution, see notes); defaults to vector of zeros.

`V`

variance of the slab; defaults to 5.

`w`

hyper-parameters of the Beta-prior for the mixture weight ω; defaults to `c(wa0=1, wb0=1)`, i.e. a uniform distribution.

`c0, C0`

scale and rate of the gamma prior for the hyper-parameter ρ; defaults to 2 and 1.

`eps`

tuning parameter in the MH-step to sample ρ; defaults to 0.05.

`mcmc`

A list:

`M`

number of MCMC iterations after the burn-in phase; defaults to 8000.

`burnin`

number of MCMC iterations discarded as burn-in; defaults to 2000.

`thin`

thinning parameter; defaults to 1.

`startsel`

number of MCMC iterations drawn from the unrestricted model (e.g., `burnin/2`); defaults to 1000.

`verbose`

MCMC progress report in each `verbose`-th iteration step; defaults to 500. If `verbose=0`, no output is generated.

`msave`

returns additional output with variable selection details (i.e. posterior samples for ω, δ); defaults to `FALSE`.

Value

The function returns an object of class "`pogit`" with methods `print.pogit`, `summary.pogit` and `plot.pogit`.

The returned object is a list containing the following elements:

 `samplesNB` a named list containing the samples from the posterior distribution of the parameters in the negative binomial model (see also `msave`): `beta, rho`regression coefficients β and ρ `pdeltaBeta`P(δ_β=1) `psiBeta`scale parameter ψ_β of the slab component `data` a list containing the data `y`, `offset` and `X` `model.nb` a list containing details on the model specification, see details for `model` `mcmc` see details for `mcmc` `prior.nb` see details for `prior` `dur` a list containing the total runtime (`total`) and the runtime after burn-in (`durM`), in seconds `acc.rho` acceptance rate of parameter ρ `BVS` see arguments `start` a list containing starting values, see arguments `family` "negbin" `call` function call

Note

Alternatively, a Poisson model with observation-specific normal random intercept (i.e., a Poisson-log-normal mixture model) can be used to deal with overdispersion of count data, which is provided in the function `poissonBvs`.

If prior information on the regression parameters is available, this information is encoded in a normal distribution instead of the spike and slab prior (consequently, `BVS` is set to `FALSE`).

Author(s)

Michaela Dvorzak <m.dvorzak@gmx.at>

References

Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for overdispersed count data subject to underreporting - An application to norovirus illness in Germany. (Unpublished) working paper.

Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Statistics and Computing, 19, 479 - 492.

`poissonBvs`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69``` ```## Not run: ## Examples below should take about 1-2 minutes. ## ------ (use simul_pois1) ------ data(simul_pois1) y <- simul_pois1\$y X <- as.matrix(simul_pois1[, -1]) # Bayesian variable selection for simulated data set m1 <- negbinBvs(y = y, X = X) # print results (check acceptance rate for 'rho') print(m1) # re-run with adapted tuning parameter 'eps' m2 <- negbinBvs(y = y, X = X, prior = list(eps = 0.4)) # print and summarize results print(m2) summary(m2) # alternatively, compare results to overdispersed Poisson model with # normal random intercept (subject to selection), provided in 'poissonBvs' # specify observation-specific random intercept cID <- seq_along(y) m3 <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID)) # print, summarize and plot results print(m3) summary(m3) # note that thetaB is not selected (!) plot(m3, burnin = FALSE, thin = FALSE) ## ------ (use data set "azdrg112" from package "COUNT") ------ if (!requireNamespace("COUNT", quietly = TRUE)){ stop("package 'COUNT' is needed for this example to work. Please install it.") } library(COUNT) # load data set 'azdrg112' # (Arizona Medicare data for DRG (Diagnostic Related Group) 112) data(azdrg112) y <- as.numeric(azdrg112\$los) # hospital length of stay: 1-53 days X <- as.matrix(azdrg112[,-1]) # covariates (gender, type1, age75) m4 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000)) # print results (check acceptance rate for 'rho') print(m4) summary(m4) plot(m4, burnin = FALSE) # adapte tuning parameter eps (and set BVS to FALSE) prior <- list(eps = 0.1) m5 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior, BVS = FALSE) # print, summarize and plot results print(m5) summary(m5) plot(m5, burnin = FALSE, thin = FALSE) plot(m5, type = "acf", lag.max = 50) ## End(Not run) ```