negbinBvs  R Documentation 
This function performs Bayesian regression modelling of overdispersed count data including variable selection via spike and slab priors. Posterior inference is based on MCMC sampling techniques.
negbinBvs( y, offset = NULL, X, model = list(), prior = list(), mcmc = list(), start = NULL, BVS = TRUE )
y 
an integer vector of count data 
offset 
an (optional) offset term; should be 
X 
a design matrix (including an intercept term) 
model 
an (optional) list specifying the structure of the model (see details) 
prior 
an (optional) list of prior settings and hyperparameters controlling the priors (see details) 
mcmc 
an (optional) list of MCMC sampling options (see details) 
start 
an (optional), numeric vector containing starting values for the
regression effects (including an intercept term); defaults to 
BVS 
if 
The method provides Bayesian variable selection in regression modelling
of overdispersed count data. The negative binomial
distribution is derived marginally from a PoissonGamma (mixture) model,
which can be interpreted as an overdispersed Poisson model with
observationspecific random intercept log γ, where
γρ \sim Γ(ρ,ρ). A hyperprior for ρ is
specified as ρ \sim Γ(c_0,C_0), see details for prior
below.
By default, variable selection is incorporated in the model based on mixture
priors with a spike and a slab component for the regression effects
β. More specifically, a Dirac spike is used, i.e. a point mass at
zero, and (by default), the slab component is specified as a scale mixture
of normal distributions, resulting in a Studentt distribution with
2psi.nu
degrees of freedom.
The MCMC sampling scheme relies on the representation of the conditional
Poisson model as a Gaussian regression model in auxiliary variables,
as described in poissonBvs
. Data augmentation is based
on the auxiliary mixture sampling algorithm of FruehwirthSchnatter
et al. (2009). For details concerning the algorithm, see
Dvorzak and Wagner (2016b), available on request from the authors.
Details for model specification (see arguments):
model
:deltafix
an indicator vector of length
ncol(X)1
specifying which regression effects are subject to selection
(i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector
of zeros.
prior
:slab
distribution of the slab component, i.e. "Student
"
(default) or "Normal
".
psi.nu
hyperparameter of the Studentt slab (used for a
"Student
" slab); defaults to 5.
m0
prior mean for the intercept parameter; defaults to 0.
M0
prior variance for the intercept parameter; defaults to 100.
aj0
a vector of prior means for the regression effects (which is encoded in a normal distribution, see notes); defaults to vector of zeros.
V
variance of the slab; defaults to 5.
w
hyperparameters of the Betaprior for the mixture weight
ω; defaults to c(wa0=1, wb0=1)
, i.e. a uniform distribution.
c0, C0
scale and rate of the gamma prior for the hyperparameter ρ; defaults to 2 and 1.
eps
tuning parameter in the MHstep to sample ρ; defaults to 0.05.
mcmc
:M
number of MCMC iterations after the burnin phase; defaults to 8000.
burnin
number of MCMC iterations discarded as burnin; defaults to 2000.
thin
thinning parameter; defaults to 1.
startsel
number of MCMC iterations drawn from the unrestricted
model (e.g., burnin/2
); defaults to 1000.
verbose
MCMC progress report in each verbose
th iteration
step; defaults to 500. If verbose=0
, no output is generated.
msave
returns additional output with variable selection details
(i.e. posterior samples for ω, δ); defaults to FALSE
.
The function returns an object of class "pogit
" with methods
print.pogit
, summary.pogit
and
plot.pogit
.
The returned object is a list containing the following elements:

a named list containing the samples from the posterior
distribution of the parameters in the negative binomial model
(see also


a list containing the data 

a list containing details on the model specification,
see details for 

see details for 

see details for 

a list containing the total runtime ( 

acceptance rate of parameter ρ 

see arguments 

a list containing starting values, see arguments 

"negbin" 

function call 
Alternatively, a Poisson model with observationspecific normal
random intercept (i.e., a Poissonlognormal mixture model) can be used
to deal with overdispersion of count data, which is provided in the
function poissonBvs
.
If prior information on the regression parameters is available, this
information is encoded in a normal distribution instead of the spike
and slab prior (consequently, BVS
is set to FALSE
).
Michaela Dvorzak <m.dvorzak@gmx.at>
Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for overdispersed count data subject to underreporting  An application to norovirus illness in Germany. (Unpublished) working paper.
FruehwirthSchnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of nonGaussian data. Statistics and Computing, 19, 479  492.
poissonBvs
## Not run: ## Examples below should take about 12 minutes. ##  (use simul_pois1)  data(simul_pois1) y < simul_pois1$y X < as.matrix(simul_pois1[, 1]) # Bayesian variable selection for simulated data set m1 < negbinBvs(y = y, X = X) # print results (check acceptance rate for 'rho') print(m1) # rerun with adapted tuning parameter 'eps' m2 < negbinBvs(y = y, X = X, prior = list(eps = 0.4)) # print and summarize results print(m2) summary(m2) # alternatively, compare results to overdispersed Poisson model with # normal random intercept (subject to selection), provided in 'poissonBvs' # specify observationspecific random intercept cID < seq_along(y) m3 < poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID)) # print, summarize and plot results print(m3) summary(m3) # note that thetaB is not selected (!) plot(m3, burnin = FALSE, thin = FALSE) ##  (use data set "azdrg112" from package "COUNT")  if (!requireNamespace("COUNT", quietly = TRUE)){ stop("package 'COUNT' is needed for this example to work. Please install it.") } library(COUNT) # load data set 'azdrg112' # (Arizona Medicare data for DRG (Diagnostic Related Group) 112) data(azdrg112) y < as.numeric(azdrg112$los) # hospital length of stay: 153 days X < as.matrix(azdrg112[,1]) # covariates (gender, type1, age75) m4 < negbinBvs(y = y, X = X, mcmc = list(M = 4000)) # print results (check acceptance rate for 'rho') print(m4) summary(m4) plot(m4, burnin = FALSE) # adapte tuning parameter eps (and set BVS to FALSE) prior < list(eps = 0.1) m5 < negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior, BVS = FALSE) # print, summarize and plot results print(m5) summary(m5) plot(m5, burnin = FALSE, thin = FALSE) plot(m5, type = "acf", lag.max = 50) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.