logitBvs: Bayesian variable selection for the binomial logit model


logitBvs R Documentation

Bayesian variable selection for the binomial logit model

Description

This function performs Bayesian variable selection for binomial logit regression models via spike and slab priors. A cluster-specific random intercept can be included in the model to account for within-cluster dependence; variance selection for the random intercept then determines whether there is between-cluster variation. Posterior inference relies on an MCMC sampling algorithm based on data augmentation.

Usage

logitBvs(
  y,
  N,
  X,
  model = list(),
  prior = list(),
  mcmc = list(),
  start = NULL,
  BVS = TRUE
)

Arguments

y

an integer vector of binomial counts

N

an integer vector containing the number of trials

X

a design matrix (including an intercept term)

model

an (optional) list specifying the structure of the model (see details)

prior

an (optional) list of prior settings and hyper-parameters controlling the priors (see details)

mcmc

an (optional) list of MCMC sampling options (see details)

start

an (optional) numeric vector containing starting values for the regression effects (including an intercept term); defaults to NULL (i.e. a vector of zeros is used).

BVS

if TRUE (default), Bayesian variable selection is performed to identify regressors with a non-zero effect; otherwise, an unrestricted model is estimated (without variable selection).
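As a minimal sketch of how the data arguments fit together, the inputs could be prepared as shown here. The simulated data and the variable names x1 and x2 are assumptions made for illustration only; the model fit is attempted only if the pogit package is installed.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)

# design matrix with an explicit intercept column, as required for X
X <- model.matrix(~ x1 + x2)

# number of trials and binomial counts (simulated; x2 has no true effect)
N <- rep(10L, n)
y <- rbinom(n, size = N, prob = plogis(-0.5 + 1 * x1))

# short chain for illustration, run only if pogit is available
if (requireNamespace("pogit", quietly = TRUE)) {
  fit <- pogit::logitBvs(y = y, N = N, X = X,
                         mcmc = list(M = 1000, burnin = 500,
                                     startsel = 250, verbose = 0))
  print(fit)
}
```

Building X with model.matrix() is one convenient way to guarantee that the intercept column is present.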

Details

The method provides Bayesian variable selection for binomial logit models using mixture priors with a spike and a slab component to identify regressors with a non-zero effect. More specifically, a Dirac spike (i.e. a point mass at zero) is used and, by default, the slab component is specified as a scale mixture of normal distributions, resulting in a Student-t distribution with 2*psi.nu degrees of freedom. In the more general random intercept model, variance selection of the random intercept is based on the non-centered parameterization of the model, where the signed standard deviation θ_α of the random intercept term appears as a further regression effect in the model equation. For details, see Wagner and Duller (2012).
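The link between the scale mixture and the Student-t slab can be checked by simulation. The sketch below assumes that the slab scale ψ follows an inverse gamma distribution with shape psi.nu and rate psi_Q = V * (psi.nu - 1); this rate is an assumption chosen so that the slab variance equals V, and it is not taken from the documentation above. Under that assumption, mixing N(0, ψ) over ψ yields a scaled Student-t distribution with 2*psi.nu degrees of freedom.

```r
set.seed(42)
n_draws <- 1e5
psi_nu  <- 5                  # default psi.nu
V       <- 5                  # default slab variance V
psi_Q   <- V * (psi_nu - 1)   # assumed rate, chosen so that Var(beta) = V

# scale mixture: psi ~ inverse gamma(psi_nu, psi_Q), beta | psi ~ N(0, psi)
psi  <- 1 / rgamma(n_draws, shape = psi_nu, rate = psi_Q)
beta <- rnorm(n_draws, mean = 0, sd = sqrt(psi))

# direct draws from the implied Student-t with 2 * psi_nu degrees of freedom
t_scale <- sqrt(psi_Q / psi_nu)
beta_t  <- t_scale * rt(n_draws, df = 2 * psi_nu)

# both sets of quantiles should agree up to Monte Carlo error
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
round(rbind(mixture = quantile(beta, probs),
            student = quantile(beta_t, probs)), 2)
```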

The implementation of Bayesian variable selection further relies on the representation of the binomial logit model as a Gaussian regression model in auxiliary variables. Data augmentation is based on Fussl et al. (2013), who show that the binomial logit model can be represented as a linear regression model in the latent variable, which has an interpretation as the difference of aggregated utilities. The error distribution in the auxiliary model is approximated by a finite scale mixture of normal distributions, where the mixture parameters are taken from the R package binomlogit. See Fussl (2014) for details.

For details concerning the sampling algorithm see Dvorzak and Wagner (2016) and Wagner and Duller (2012).

Details for model specification (see arguments):

model:
deltafix

an indicator vector of length ncol(X)-1 specifying which regression effects are subject to selection (i.e., 0 = subject to selection, 1 = fixed in the model); defaults to a vector of zeros.

gammafix

an indicator for variance selection of the random intercept term (i.e., 0 = with variance selection (default), 1 = no variance selection); only used if a random intercept is included in the model (see ri).

ri

logical. If TRUE, a cluster-specific random intercept is included in the model; defaults to FALSE.

clusterID

a numeric vector of length equal to the number of observations containing the cluster ID c = 1,...,C for each observation (required if ri=TRUE).
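A model list for a random intercept model could be assembled as in the following sketch. The design matrix is a stand-in, and the choice of which covariates to fix is purely illustrative.

```r
set.seed(3)
# stand-in design matrix: intercept plus 4 covariates, 12 observations
X <- cbind(1, matrix(rnorm(12 * 4), nrow = 12))

# one entry per non-intercept column: 0 = subject to selection, 1 = fixed
deltafix <- rep(0, ncol(X) - 1)
deltafix[c(1, 2)] <- 1          # always keep the first two covariates

# cluster membership c = 1, ..., C for each observation (here C = 3)
clusterID <- rep(1:3, each = 4)

model <- list(deltafix  = deltafix,
              gammafix  = 0,    # variance selection for the random intercept
              ri        = TRUE,
              clusterID = clusterID)
str(model)
```

Note that deltafix has one entry fewer than X has columns, because the intercept is not subject to selection.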

prior:
slab

distribution of the slab component, i.e. "Student" (default) or "Normal".

psi.nu

hyper-parameter of the Student-t slab (used for a "Student" slab); defaults to 5.

m0

prior mean for the intercept parameter; defaults to 0.

M0

prior variance for the intercept parameter; defaults to 100.

aj0

a vector of prior means for the regression effects (which are encoded in a normal distribution, see Note); defaults to a vector of zeros.

V

variance of the slab; defaults to 5.

w

hyper-parameters of the Beta-prior for the mixture weight ω; defaults to c(wa0=1, wb0=1), i.e. a uniform distribution.

pi

hyper-parameters of the Beta-prior for the mixture weight π; defaults to c(pa0=1, pb0=1), i.e. a uniform distribution.
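To illustrate what the default uniform prior on the mixture weight implies, the sketch below assumes the usual spike and slab hierarchy in which each inclusion indicator δ_j is Bernoulli(ω) given ω. That hierarchy is stated here as an assumption; the value d = 9 is illustrative. The number of included regressors then follows a beta-binomial distribution, which is uniform over 0, ..., d when wa0 = wb0 = 1.

```r
d   <- 9          # number of regressors subject to selection (illustrative)
wa0 <- 1          # default first Beta hyper-parameter
wb0 <- 1          # default second Beta hyper-parameter

# beta-binomial pmf of the prior model size k = 0, ..., d
k <- 0:d
prior_size <- choose(d, k) * beta(k + wa0, d - k + wb0) / beta(wa0, wb0)
round(prior_size, 3)

# prior inclusion probability of any single regressor: E(omega)
wa0 / (wa0 + wb0)
```

Other values of wa0 and wb0 shift prior mass toward sparser or denser models, which can be inspected with the same calculation.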

mcmc:
M

number of MCMC iterations after the burn-in phase; defaults to 8000.

burnin

number of MCMC iterations discarded as burn-in; defaults to 2000.

thin

thinning parameter; defaults to 1.

startsel

number of MCMC iterations drawn from the unrestricted model (e.g., burnin/2); defaults to 1000.

verbose

prints an MCMC progress report every verbose-th iteration; defaults to 500. If verbose=0, no output is generated.

msave

returns additional output with variable selection details (i.e. posterior samples for ω, δ, π, γ); defaults to FALSE.
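A set of MCMC options might be put together as follows; the particular values are illustrative, with startsel derived from burnin as in the burnin/2 suggestion above.

```r
burnin <- 1000

mcmc <- list(M        = 4000,        # iterations kept after burn-in
             burnin   = burnin,      # iterations discarded as burn-in
             thin     = 1,
             startsel = burnin / 2,  # draws from the unrestricted model
             verbose  = 0,           # suppress progress output
             msave    = TRUE)        # also store variable selection details

# total number of sampler iterations implied by these settings
mcmc$M + mcmc$burnin
```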

Value

The function returns an object of class "pogit" with methods print.pogit, summary.pogit and plot.pogit.

The returned object is a list containing the following elements:

samplesL

a named list containing the samples from the posterior distribution of the parameters in the binomial logit model (see also msave):

alpha, thetaAlpha

regression coefficients α and θ_α

pdeltaAlpha

P(δ_α=1)

psiAlpha

scale parameter ψ_α of the slab component

pgammaAlpha

P(γ_α=1)

ai

cluster-specific random intercept

data

a list containing the data y, N and X

model.logit

a list containing details on the model specification, see details for model

mcmc

see details for mcmc

prior.logit

see details for prior

dur

a list containing the total runtime (total) and the runtime after burn-in (durM), in seconds

BVS

see arguments

start

a list containing starting values, see arguments

family

"logit"

call

function call
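The posterior draws in samplesL can be summarized with standard tools. The sketch below uses a mock list in place of a fitted object and assumes that alpha and pdeltaAlpha are stored as matrices with one row per draw and one column per parameter; both the layout and the 0.5 cutoff are assumptions for illustration.

```r
set.seed(7)
n_draws <- 500

# mock stand-in for fit$samplesL; a real fit comes from logitBvs()
samplesL <- list(
  alpha       = cbind(rnorm(n_draws, -0.5, 0.1),    # intercept
                      rnorm(n_draws,  1.0, 0.1),    # clearly non-zero effect
                      rnorm(n_draws,  0.0, 0.01)),  # effect near zero
  pdeltaAlpha = cbind(runif(n_draws, 0.95, 1.00),
                      runif(n_draws, 0.00, 0.10))
)

post_mean <- colMeans(samplesL$alpha)         # posterior means of alpha
incl_prob <- colMeans(samplesL$pdeltaAlpha)   # averaged P(delta = 1)
selected  <- incl_prob > 0.5                  # illustrative cutoff

round(post_mean, 2)
round(incl_prob, 2)
selected
```

For a real fit, summary() on the returned object is the documented way to obtain such summaries.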

Note

If prior information on the regression parameters is available, this information is encoded in a normal distribution instead of the spike and slab prior (in this case, BVS is set to FALSE).

For binary observations, a vector of ones for the number of trials N is required.
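For a binary response, the trials vector can be created directly from the length of y, as in this small sketch (the data values are made up for illustration):

```r
# binary outcomes coded as 0/1
y_binary <- c(1L, 0L, 0L, 1L, 1L, 0L)

# one trial per observation
N_binary <- rep(1L, length(y_binary))

# sanity checks before passing y_binary and N_binary to logitBvs()
stopifnot(all(y_binary %in% c(0L, 1L)), all(y_binary <= N_binary))
```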

Author(s)

Michaela Dvorzak <m.dvorzak@gmx.at>, Helga Wagner

References

Dvorzak, M. and Wagner, H. (2016). Sparse Bayesian modelling of underreported count data. Statistical Modelling, 16(1), 24-46, doi: 10.1177/1471082x15588398.

Fussl, A., Fruehwirth-Schnatter, S. and Fruehwirth, R. (2013). Efficient MCMC for Binomial Logit Models. ACM Transactions on Modeling and Computer Simulation, 23(1), Article 3, 1-21.

Fussl, A. (2014). binomlogit: Efficient MCMC for Binomial Logit Models. R package version 1.2, https://CRAN.R-project.org/package=binomlogit.

Wagner, H. and Duller, C. (2012). Bayesian model selection for logistic regression models with random intercept. Computational Statistics and Data Analysis, 56, 1256-1274.

See Also

pogitBvs

Examples

## Not run: 
## Examples below should take about 1-2 minutes.

# load simulated data set 'simul_binomial'
data(simul_binomial)
y <- simul_binomial$y
N <- simul_binomial$N
X <- as.matrix(simul_binomial[, -c(1, 2)])

# Bayesian variable selection for simulated data set
m1 <- logitBvs(y = y, N = N, X = X)

# print, summarize and plot results
print(m1)
summary(m1)
plot(m1)

# MCMC sampling without BVS with specific MCMC and prior settings
m2 <- logitBvs(y = y, N = N, X = X, prior = list(slab = "Normal"), 
               mcmc = list(M = 4000, burnin = 1000, thin = 5),
               BVS = FALSE)
print(m2)    
summary(m2)
plot(m2, maxPlots = 4) 

# BVS with specification of regression effects subject to selection
m3 <- logitBvs(y = y, N = N, X = X, mcmc = list(M = 4000, burnin = 1000), 
               model = list(deltafix = c(1, 1, 1, 0, 0, 0, 1, 0, 0)))   
print(m3)   
summary(m3)
plot(m3, burnin = FALSE, maxPlots = 4)
plot(m3, type = "acf", maxPlots = 4)       

## End(Not run)

pogit documentation built on May 25, 2022, 5:05 p.m.