This function performs Bayesian variable selection for binomial logit regression models via spike and slab priors. A cluster-specific random intercept can be included in the model to account for within-cluster dependence, with variance selection of the random intercept to determine whether there is between-cluster variation in the model. For posterior inference, an MCMC sampling algorithm is used which is based on data augmentation.
logitBvs(y, N, X, model = list(), prior = list(), mcmc = list(),
  start = NULL, BVS = TRUE)
y 
an integer vector of binomial counts 
N 
an integer vector containing the number of trials 
X 
a design matrix (including an intercept term) 
model 
an (optional) list specifying the structure of the model (see details) 
prior 
an (optional) list of prior settings and hyperparameters controlling the priors (see details) 
mcmc 
an (optional) list of MCMC sampling options (see details) 
start 
an (optional) numeric vector containing starting values for the
regression effects (including an intercept term); defaults to NULL
BVS
if TRUE (default), Bayesian variable selection is performed to identify
regressors with a non-zero effect; otherwise, an unrestricted model is
estimated (without variable selection).
The method provides Bayesian variable selection for binomial logit models
using mixture priors with a spike and a slab component to identify regressors
with a non-zero effect. More specifically, a Dirac spike is used, i.e. a
point mass at zero, and (by default) the slab component is specified as a scale
mixture of normal distributions, resulting in a Student-t distribution with
2*psi.nu degrees of freedom.
In the more general random intercept model, variance selection of the random
intercept is based on the non-centered parameterization of the model, where
the signed standard deviation θ_α of the random intercept term
appears as a further regression effect in the model equation.
For details, see Wagner and Duller (2012).
The implementation of Bayesian variable selection further relies on the
representation of the binomial logit model as a Gaussian regression model
in auxiliary variables. Data augmentation is based on Fussl et
al. (2013), who show that the binomial logit model can be represented as a
linear regression model in the latent variable, which has an interpretation as
the difference of aggregated utilities. The error distribution in the auxiliary
model is approximated by a finite scale mixture of normal distributions, where
the mixture parameters are taken from the R package binomlogit.
See Fussl (2014) for details.
For details concerning the sampling algorithm see Dvorzak and Wagner (2016) and Wagner and Duller (2012).
Details for the model specification (see arguments):
model
A list:
deltafix
an indicator vector of length ncol(X)-1
specifying which regression effects are subject to selection (i.e., 0 =
subject to selection, 1 = fixed in the model); defaults to a vector of zeros.
gammafix
an indicator for variance selection of the random
intercept term (i.e., 0 = with variance selection (default), 1 = no
variance selection); only used if a random intercept is included in the
model (see ri).
ri
logical. If TRUE, a cluster-specific
random intercept is included in the model; defaults to FALSE.
clusterID
a numeric vector of length equal to the number
of observations containing the cluster ID c = 1,...,C for each observation
(required if ri=TRUE
).
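As a sketch of the model options above, the following (not run) example fits a random intercept model with variance selection; the data are simulated here purely for illustration and are not shipped with the package:

```r
library(pogit)

## simulate small clustered binomial data (illustrative assumption only)
set.seed(1)
C  <- 10; nC <- 20                      # 10 clusters, 20 observations each
id <- rep(1:C, each = nC)               # cluster ID c = 1,...,C
X  <- cbind(1, rnorm(C * nC))           # design matrix including an intercept
a  <- rep(rnorm(C, 0, 0.5), each = nC)  # cluster-specific random intercepts
N  <- rep(30, C * nC)                   # number of trials
y  <- rbinom(C * nC, size = N, prob = plogis(0.5 * X[, 2] + a))

## random intercept model; clusterID is required if ri = TRUE
resRI <- logitBvs(y = y, N = N, X = X,
                  model = list(ri = TRUE, clusterID = id))
```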
prior
A list:
slab
distribution of the slab component, i.e. "Student" (default) or "Normal".
psi.nu
hyperparameter of the Student-t slab (used for a "Student" slab); defaults to 5.
m0
prior mean for the intercept parameter; defaults to 0.
M0
prior variance for the intercept parameter; defaults to 100.
aj0
a vector of prior means for the regression effects (which is encoded in a normal distribution, see note); defaults to a vector of zeros.
V
variance of the slab; defaults to 5.
w
hyperparameters of the Beta prior for the mixture weight
ω; defaults to c(wa0=1, wb0=1), i.e. a uniform distribution.
pi
hyperparameters of the Beta prior for the mixture weight
π; defaults to c(pa0=1, pb0=1), i.e. a uniform distribution.
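A (not run) sketch of customized prior settings, using the simulated data set shipped with the package; the particular hyperparameter values are illustrative assumptions only:

```r
library(pogit)

# load simulated data set 'simul_binomial'
data(simul_binomial)
y <- simul_binomial$y
N <- simul_binomial$N
X <- as.matrix(simul_binomial[, -c(1, 2)])

## a normal slab with larger variance and an informative Beta prior
## putting more prior mass on sparse models
myPrior <- list(slab = "Normal",            # normal slab instead of Student-t
                V    = 10,                  # slab variance
                w    = c(wa0 = 1, wb0 = 9)) # Beta prior on omega
resP <- logitBvs(y = y, N = N, X = X, prior = myPrior)
```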
mcmc
A list:
M
number of MCMC iterations after the burn-in phase; defaults to 8000.
burnin
number of MCMC iterations discarded as burn-in; defaults to 2000.
thin
thinning parameter; defaults to 1.
startsel
number of MCMC iterations drawn from the unrestricted
model (e.g., burnin/2); defaults to 1000.
verbose
MCMC progress report in each verbose-th iteration step; defaults to 500.
If verbose=0, no output is generated.
msave
logical. If TRUE, additional output with variable
selection details is returned (i.e. posterior samples for ω,
δ, π, γ); defaults to FALSE.
The function returns an object of class "pogit" with methods
print.pogit, summary.pogit and plot.pogit.
The returned object is a list containing the following elements:

a named list containing the samples from the posterior
distribution of the parameters in the binomial logit model

a list containing the data

a list containing details on the model specification,
see details for model

see details for mcmc

see details for prior

a list containing the total runtime

see argument BVS

a list containing starting values, see argument start

"logit"

the function call
If prior information on the regression parameters is available, this
information is encoded in a normal distribution instead of the
spike and slab prior (BVS is set to FALSE).
For binary observations, a vector of ones for the number of trials N
is required.
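The note on binary observations can be sketched as follows (not run); the simulated data are an illustrative assumption, the requirement is simply that N is a vector of ones:

```r
library(pogit)

## binary (0/1) responses: each row is a single Bernoulli trial
set.seed(2)
n    <- 200
Xbin <- cbind(1, rnorm(n))                                  # intercept + one covariate
ybin <- rbinom(n, size = 1,
               prob = plogis(Xbin %*% c(-0.5, 1)))

## N must be a vector of ones for binary observations
resBin <- logitBvs(y = ybin, N = rep(1, n), X = Xbin)
```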
Michaela Dvorzak <[email protected]>, Helga Wagner
Dvorzak, M. and Wagner, H. (2016). Sparse Bayesian modelling of under-reported count data. Statistical Modelling, 16(1), 24-46, http://dx.doi.org/10.1177/1471082x15588398.
Fussl, A., Fruehwirth-Schnatter, S. and Fruehwirth, R. (2013). Efficient MCMC for Binomial Logit Models. ACM Transactions on Modeling and Computer Simulation, 23(1), Article 3, 1-21.
Fussl, A. (2014). binomlogit: Efficient MCMC for Binomial Logit Models. R package version 1.2, https://CRAN.R-project.org/package=binomlogit.
Wagner, H. and Duller, C. (2012). Bayesian model selection for logistic regression models with random intercept. Computational Statistics and Data Analysis, 56, 1256-1274.
## Not run:
## Examples below should take about 1-2 minutes.
# load simulated data set 'simul_binomial'
data(simul_binomial)
y <- simul_binomial$y
N <- simul_binomial$N
X <- as.matrix(simul_binomial[, -c(1, 2)])
# Bayesian variable selection for simulated data set
m1 <- logitBvs(y = y, N = N, X = X)
# print, summarize and plot results
print(m1)
summary(m1)
plot(m1)
# MCMC sampling without BVS with specific MCMC and prior settings
m2 <- logitBvs(y = y, N = N, X = X, prior = list(slab = "Normal"),
mcmc = list(M = 4000, burnin = 1000, thin = 5),
BVS = FALSE)
print(m2)
summary(m2)
plot(m2, maxPlots = 4)
# BVS with specification of regression effects subject to selection
m3 <- logitBvs(y = y, N = N, X = X, mcmc = list(M = 4000, burnin = 1000),
model = list(deltafix = c(1, 1, 1, 0, 0, 0, 1, 0, 0)))
print(m3)
summary(m3)
plot(m3, burnin = FALSE, maxPlots = 4)
plot(m3, type = "acf", maxPlots = 4)
## End(Not run)
