poissonBvs  R Documentation 
This function performs Bayesian variable selection for Poisson regression models via spike and slab priors. A cluster (or observation) specific random intercept can be included in the model to account for withincluster dependence (or overdispersion) with variance selection of the random intercept. For posterior inference, a MCMC sampling scheme is used which relies on data augmentation and involves only Gibbs sampling steps.
poissonBvs( y, offset = NULL, X, model = list(), mcmc = list(), prior = list(), start = NULL, BVS = TRUE )
y 
an integer vector of count data 
offset 
an (optional) offset term; should be 
X 
a design matrix (including an intercept term) 
model 
an (optional) list specifying the structure of the model (see details) 
mcmc 
an (optional) list of MCMC sampling options (see details) 
prior 
an (optional) list of prior settings and hyperparameters controlling the priors (see details) 
start 
an (optional), numeric vector containing starting values for the
regression effects (including an intercept term); defaults to 
BVS 
if 
The method provides a Bayesian framework for variable selection in regression
modelling of count data using mixture priors with a spike and a slab
component to identify regressors with a nonzero effect. More specifically, a
Dirac spike is used, i.e. a point mass at zero, and (by default), the slab
component is specified as a scale mixture of normal distributions, resulting
in a Studentt distribution with 2psi.nu
degrees of freedom.
In the more general random intercept model, variance selection of the random
intercept is based on the noncentered parameterization of the model, where
the signed standard deviation θ_β of the random intercept term
appears as a further regression effect in the model equation. For further
details, see Wagner and Duller (2012).
The implementation of Bayesian variable selection further relies on the
representation of the Poisson model as a Gaussian regression model in
auxiliary variables. Data augmentation is based on the auxiliary mixture
sampling algorithm of FruehwirthSchnatter et al. (2009), where the
interarrival times of an assumed Poisson process are introduced as latent
variables. The error distribution, a negative logGamma distribution,
in the auxiliary model is approximated by a finite mixture of normal
distributions where the mixture parameters of the matlab package
bayesf
, Version 2.0 of FruehwirthSchnatter (2007) are used.
See FruehwirthSchnatter et al. (2009) for details.
For details concerning the sampling algorithm, see Dvorzak and Wagner (2016) and Wagner and Duller (2012).
Details for model specification (see arguments):
model
:deltafix
an indicator vector of length
ncol(X)1
specifying which regression effects are subject to selection
(i.e., 0 = subject to selection, 1 = fix in the model); defaults to a vector
of zeros.
gammafix
an indicator for variance selection of
the random intercept term (i.e., 0 = with variance selection (default), 1 = no
variance selection); only used if a random intercept is includued in the model
(see ri
).
ri
logical. If TRUE
, a cluster (or observation) specific
random intercept is included in the model; defaults to FALSE
.
clusterID
a numeric vector of length equal to the number
of observations containing the cluster ID c = 1,...,C for each observation
(required if ri=TRUE
). Note that seq_along(y)
specifies an
overdispersed Poisson model with observationspecific (normal) random intercept
(see note).
prior
:slab
distribution of the slab component, i.e. "Student
"
(default) or "Normal
".
psi.nu
hyperparameter of the Studentt slab (used for a
"Student
" slab); defaults to 5.
m0
prior mean for the intercept parameter; defaults to 0.
M0
prior variance for the intercept parameter; defaults to 100.
aj0
a vector of prior means for the regression effects (which is encoded in a normal distribution, see note); defaults to vector of zeros.
V
variance of the slab; defaults to 5.
w
hyperparameters of the Betaprior for the mixture weight
ω; defaults to c(wa0=1, wb0=1)
, i.e. a uniform distribution.
pi
hyperparameters of the Betaprior for the mixture weight
π; defaults to c(pa0=1, pb0=1)
, i.e. a uniform distribution.
mcmc
:M
number of MCMC iterations after the burnin phase; defaults to 8000.
burnin
number of MCMC iterations discarded as burnin; defaults to 2000.
thin
thinning parameter; defaults to 1.
startsel
number of MCMC iterations drawn from the unrestricted
model (e.g., burnin/2
); defaults to 1000.
verbose
MCMC progress report in each verbose
th iteration
step; defaults to 500. If verbose=0
, no output is generated.
msave
returns additional output with variable selection details
(i.e. posterior samples for ω, δ, π,
γ); defaults to FALSE
.
The function returns an object of class "pogit
" with methods
print.pogit
, summary.pogit
and
plot.pogit
.
The returned object is a list containing the following elements:

a named list containing the samples from the posterior
distribution of the parameters in the Poisson model
(see also


a list containing the data 

a list containing details on the model specification,
see details for 

see details for 

see details for 

a list containing the total runtime ( 

see arguments 

a list containing starting values, see arguments 

"poisson" 

function call 
If prior information on the regression parameters is available, this
information is encoded in a normal distribution instead of the spike
and slab prior (consequently, BVS
is set to FALSE
).
This function can also be used to accommodate overdispersion in
count data by specifying an observationspecific random intercept
(see details for model
). The resulting model is an alternative
to the negative binomial model, see negbinBvs
.
Variance selection of the random intercept may be useful to examine
whether overdispersion is present in the data.
Michaela Dvorzak <m.dvorzak@gmx.at>, Helga Wagner
Dvorzak, M. and Wagner, H. (2016). Sparse Bayesian modelling of underreported count data. Statistical Modelling, 16(1), 24  46, doi: 10.1177/1471082x15588398.
FruehwirthSchnatter, S. (2007). Matlab package bayesf
2.0
on Finite Mixture and Markov Switching Models, Springer.
https://statmath.wu.ac.at/~fruehwirth/monographie/.
FruehwirthSchnatter, S., Fruehwirth, R., Held, L. and Rue, H. (2009). Improved auxiliary mixture sampling for hierarchical models of nonGaussian data. Statistics and Computing, 19, 479  492.
Wagner, H. and Duller, C. (2012). Bayesian model selection for logistic regression models with random intercept. Computational Statistics and Data Analysis, 56, 1256  1274.
pogitBvs
, negbinBvs
## Not run: ## Examples below should take about 12 minutes. ##  (use simul_pois1)  # load simulated data set 'simul_pois1' data(simul_pois1) y < simul_pois1$y X < as.matrix(simul_pois1[, 1]) # Bayesian variable selection for simulated data set m1 < poissonBvs(y = y, X = X) # print, summarize and plot results print(m1) summary(m1) plot(m1, maxPlots = 4) plot(m1, burnin = FALSE, thin = FALSE, maxPlots = 4) plot(m1, type = "acf") # MCMC sampling without BVS with specific MCMC and prior settings m2 < poissonBvs(y = y, X = X, prior = list(slab = "Normal"), mcmc = list(M = 6000, thin = 10), BVS = FALSE) print(m2) summary(m2, IAT = TRUE) plot(m2) # show traceplots disregarding thinning plot(m2, thin = FALSE) # specification of an overdispersed Poisson model with observationspecific # (normal) random intercept cID < seq_along(y) m3 < poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID)) # print, summarize and plot results print(m3) summary(m3) # note that variance selection of the random intercept indicates that # overdispersion is not present in the data plot(m3, burnin = FALSE, thin = FALSE) ##  (use simul_pois2)  # load simulated data set 'simul_pois2' data(simul_pois2) y < simul_pois2$y X < as.matrix(simul_pois2[, c(1,2)]) cID < simul_pois2$cID # BVS for a Poisson model with clusterspecific random intercept m4 < poissonBvs(y = y, X = X, model = list(ri = TRUE, clusterID = cID), mcmc = list(M = 4000, burnin = 2000)) print(m4) summary(m4) plot(m4) # similar to m4, but without variance selection of the random intercept term model < list(gammafix = 1, ri = 1, clusterID = cID) m5 < poissonBvs(y = y, X = X, model = model, mcmc = list(M = 4000, thin = 5)) print(m5) summary(m5) plot(m5) # MCMC sampling without BVS for clustered observations m6 < poissonBvs(y = y, X = X, model = list(ri = 1, clusterID = cID), BVS = FALSE) print(m6) summary(m6) plot(m6, maxPlots = 4) ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.