negbinBvs | R Documentation |
This function performs Bayesian regression modelling of overdispersed count data including variable selection via spike and slab priors. Posterior inference is based on MCMC sampling techniques.
negbinBvs( y, offset = NULL, X, model = list(), prior = list(), mcmc = list(), start = NULL, BVS = TRUE )
y |
an integer vector of count data |
offset |
an (optional) offset term; should be |
X |
a design matrix (including an intercept term) |
model |
an (optional) list specifying the structure of the model (see details) |
prior |
an (optional) list of prior settings and hyper-parameters controlling the priors (see details) |
mcmc |
an (optional) list of MCMC sampling options (see details) |
start |
an (optional), numeric vector containing starting values for the
regression effects (including an intercept term); defaults to |
BVS |
if |
The method provides Bayesian variable selection in regression modelling
of overdispersed count data. The negative binomial
distribution is derived marginally from a Poisson-Gamma (mixture) model,
which can be interpreted as an overdispersed Poisson model with
observation-specific random intercept log γ, where
γ|ρ \sim Γ(ρ,ρ). A hyper-prior for ρ is
specified as ρ \sim Γ(c_0,C_0), see details for prior
below.
By default, variable selection is incorporated in the model based on mixture
priors with a spike and a slab component for the regression effects
β. More specifically, a Dirac spike is used, i.e. a point mass at
zero, and (by default), the slab component is specified as a scale mixture
of normal distributions, resulting in a Student-t distribution with
2psi.nu
degrees of freedom.
The MCMC sampling scheme relies on the representation of the conditional
Poisson model as a Gaussian regression model in auxiliary variables,
as described in poissonBvs
. Data augmentation is based
on the auxiliary mixture sampling algorithm of Fruehwirth-Schnatter
et al. (2009). For details concerning the algorithm, see
Dvorzak and Wagner (2016b), available on request from the authors.
Details for model specification (see arguments):
model
:deltafix
an indicator vector of length
ncol(X)-1
specifying which regression effects are subject to selection
(i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector
of zeros.
prior
:slab
distribution of the slab component, i.e. "Student
"
(default) or "Normal
".
psi.nu
hyper-parameter of the Student-t slab (used for a
"Student
" slab); defaults to 5.
m0
prior mean for the intercept parameter; defaults to 0.
M0
prior variance for the intercept parameter; defaults to 100.
aj0
a vector of prior means for the regression effects (which is encoded in a normal distribution, see notes); defaults to vector of zeros.
V
variance of the slab; defaults to 5.
w
hyper-parameters of the Beta-prior for the mixture weight
ω; defaults to c(wa0=1, wb0=1)
, i.e. a uniform distribution.
c0, C0
scale and rate of the gamma prior for the hyper-parameter ρ; defaults to 2 and 1.
eps
tuning parameter in the MH-step to sample ρ; defaults to 0.05.
mcmc
:M
number of MCMC iterations after the burn-in phase; defaults to 8000.
burnin
number of MCMC iterations discarded as burn-in; defaults to 2000.
thin
thinning parameter; defaults to 1.
startsel
number of MCMC iterations drawn from the unrestricted
model (e.g., burnin/2
); defaults to 1000.
verbose
MCMC progress report in each verbose
-th iteration
step; defaults to 500. If verbose=0
, no output is generated.
msave
returns additional output with variable selection details
(i.e. posterior samples for ω, δ); defaults to FALSE
.
The function returns an object of class "pogit
" with methods
print.pogit
, summary.pogit
and
plot.pogit
.
The returned object is a list containing the following elements:
|
a named list containing the samples from the posterior
distribution of the parameters in the negative binomial model
(see also
|
|
a list containing the data |
|
a list containing details on the model specification,
see details for |
|
see details for |
|
see details for |
|
a list containing the total runtime ( |
|
acceptance rate of parameter ρ |
|
see arguments |
|
a list containing starting values, see arguments |
|
"negbin" |
|
function call |
Alternatively, a Poisson model with observation-specific normal
random intercept (i.e., a Poisson-log-normal mixture model) can be used
to deal with overdispersion of count data, which is provided in the
function poissonBvs
.
If prior information on the regression parameters is available, this
information is encoded in a normal distribution instead of the spike
and slab prior (consequently, BVS
is set to FALSE
).
Michaela Dvorzak <m.dvorzak@gmx.at>
Dvorzak, M. and Wagner, H. (2016b). Bayesian inference for overdispersed count data subject to underreporting - An application to norovirus illness in Germany. (Unpublished) working paper.
Fruehwirth-Schnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data. Statistics and Computing, 19, 479 - 492.
poissonBvs
## Not run: ## Examples below should take about 1-2 minutes. ## ------ (use simul_pois1) ------ data(simul_pois1) y <- simul_pois1$y X <- as.matrix(simul_pois1[, -1]) # Bayesian variable selection for simulated data set m1 <- negbinBvs(y = y, X = X) # print results (check acceptance rate for 'rho') print(m1) # re-run with adapted tuning parameter 'eps' m2 <- negbinBvs(y = y, X = X, prior = list(eps = 0.4)) # print and summarize results print(m2) summary(m2) # alternatively, compare results to overdispersed Poisson model with # normal random intercept (subject to selection), provided in 'poissonBvs' # specify observation-specific random intercept cID <- seq_along(y) m3 <- poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID)) # print, summarize and plot results print(m3) summary(m3) # note that thetaB is not selected (!) plot(m3, burnin = FALSE, thin = FALSE) ## ------ (use data set "azdrg112" from package "COUNT") ------ if (!requireNamespace("COUNT", quietly = TRUE)){ stop("package 'COUNT' is needed for this example to work. Please install it.") } library(COUNT) # load data set 'azdrg112' # (Arizona Medicare data for DRG (Diagnostic Related Group) 112) data(azdrg112) y <- as.numeric(azdrg112$los) # hospital length of stay: 1-53 days X <- as.matrix(azdrg112[,-1]) # covariates (gender, type1, age75) m4 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000)) # print results (check acceptance rate for 'rho') print(m4) summary(m4) plot(m4, burnin = FALSE) # adapte tuning parameter eps (and set BVS to FALSE) prior <- list(eps = 0.1) m5 <- negbinBvs(y = y, X = X, mcmc = list(M = 4000), prior = prior, BVS = FALSE) # print, summarize and plot results print(m5) summary(m5) plot(m5, burnin = FALSE, thin = FALSE) plot(m5, type = "acf", lag.max = 50) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.