In geommc: Geometric Markov Chain Sampling

geomc.vs

R Documentation

Markov chain Monte Carlo for Bayesian variable selection using a geometric MH algorithm.

Description

geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.

Usage

geomc.vs(
  X,
  y,
  initial = NULL,
  n.iter = 50,
  burnin = 1,
  eps = 0.5,
  symm = TRUE,
  move.prob = c(0.4, 0.4, 0.2),
  lam0 = 0,
  a0 = 0,
  b0 = 0,
  lam = nrow(X)/ncol(X)^2,
  w = sqrt(nrow(X))/ncol(X),
  model.summary = FALSE,
  model.threshold = 0.5
)

Arguments

`X`	The `n\times p` covariate matrix without intercept. The following classes are supported: `matrix` and `dgCMatrix`. No need to center or scale this matrix manually. Scaling is performed implicitly and regression coefficients are returned on the original scale.
`y`	The response vector of length `n`. No need to center or scale.
`initial`	The initial model (the set of active variables). Default: the null model.
`n.iter`	The number of MCMC samples to draw. Default: 50.
`burnin`	The burn-in value used to compute the median probability model. Default: 1.
`eps`	The value of the epsilon perturbation. Default: 0.5.
`symm`	Logical; indicates whether the base density is that of a symmetric random walk MH (RW-MH) algorithm. Default: TRUE.
`move.prob`	The vector of ('addition', 'deletion', 'swap') move probabilities. Default: (0.4, 0.4, 0.2). Used only when symm is set to FALSE.
`lam0`	The precision parameter for `\beta_0`. Default: 0 (corresponding to improper uniform prior).
`a0`	The shape parameter for prior on `\sigma^2`. Default: 0.
`b0`	The scale parameter for prior on `\sigma^2`. Default: 0.
`lam`	The slab precision parameter. Default: `n/p^2` as suggested by the theoretical results of Li, Dutta, Roy (2023).
`w`	The prior inclusion probability of each variable. Default: `\sqrt{n}/p`.
`model.summary`	If TRUE, additional summaries are returned. Default: FALSE.
`model.threshold`	The threshold probability to select the covariates for the median model (median.model) and the weighted average model (wam). A covariate will be included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) is greater than the threshold. Default: 0.5.
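
As a rough illustration of how these arguments fit together, the sketch below evaluates the default values of lam and w for a toy design and then calls geomc.vs with the asymmetric base density, which is the only case in which move.prob is used. The simulated X and y, the object names (lam_default, w_default, move_prob, fit) and the requireNamespace guard are assumptions made for this example; only arguments listed under Usage are passed.

```r
# Toy data, assumed purely for illustration
set.seed(1)
n <- 50; p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- drop(0.5 + X[, 1:3] %*% rep(4, 3) + rnorm(n))

# What the defaults in the Usage signature evaluate to for this X
lam_default <- nrow(X) / ncol(X)^2      # n / p^2
w_default   <- sqrt(nrow(X)) / ncol(X)  # sqrt(n) / p

# ('addition', 'deletion', 'swap') probabilities; used only when symm = FALSE
move_prob <- c(0.4, 0.4, 0.2)

# Run the sampler only if the geommc package is installed
if (requireNamespace("geommc", quietly = TRUE)) {
  fit <- geommc::geomc.vs(X = X, y = y, n.iter = 100, symm = FALSE,
                          move.prob = move_prob,
                          lam = lam_default, w = w_default)
  print(fit$acceptance.rate)
}
```

Passing lam and w explicitly here is redundant, since they equal the defaults; it is done only to make the dependence on n and p visible.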

Details

geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows:

y | X, \beta_0,\beta,\gamma,\sigma^2,w,\lambda \sim N(\beta_01 + X_\gamma\beta_\gamma,\sigma^2I_n)

\beta_i|\beta_0,\gamma,\sigma^2,w,\lambda \stackrel{indep.}{\sim} N(0, \gamma_i\sigma^2/\lambda),~i=1,\ldots,p,

\beta_0|\gamma,\sigma^2,w,\lambda \sim N(0, \sigma^2/\lambda_0)

\sigma^2|\gamma,w,\lambda \sim Inv-Gamma (a_0, b_0)

\gamma_i|w,\lambda \stackrel{iid}{\sim} Bernoulli(w)

where X_\gamma is the n \times |\gamma| submatrix of X consisting of the columns of X for which \gamma_i=1 and, similarly, \beta_\gamma is the |\gamma| subvector of \beta corresponding to \gamma. The density \pi(\sigma^2) of \sigma^2 \sim Inv-Gamma (a_0, b_0) has the form \pi(\sigma^2) \propto (\sigma^2)^{-a_0-1} \exp(-b_0/\sigma^2). The functions in the package also allow the non-informative prior (\beta_{0}, \sigma^2)|\gamma, w \sim 1 / \sigma^{2}, which is obtained by setting \lambda_0=a_0=b_0=0.

geomc.vs provides the empirical MH acceptance rate and MCMC samples from the posterior pmf of the models P(\gamma|y), which is available up to a normalizing constant. If model.summary is set to TRUE, geomc.vs also returns other model summaries. In particular, it returns the marginal inclusion probabilities (mip) computed by the Monte Carlo average as well as the weighted marginal inclusion probabilities (wmip) computed with weights

w_i = P(\gamma^{(i)}|y)/\sum_{k=1}^K P(\gamma^{(k)}|y), i=1,2,...,K

where \gamma^{(k)}, k=1,2,...,K are the distinct models sampled. Thus, if N_k is the number of times the kth distinct model \gamma^{(k)} appears in the MCMC samples, the mip for the jth variable is

mip_j = \sum_{k=1}^{K} N_k I(\gamma^{(k)}_j = 1)/n.iter

and wmip for the jth variable is

wmip_j = \sum_{k=1}^K w_k I(\gamma^{(k)}_j = 1).
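
The two formulas can be sketched in base R on a toy set of sampled models. The indicator matrix gam, the counts N and the log-posterior values logpost below are made-up inputs, not geomc.vs output, and the name wts is used for the weights w_k to avoid a clash with the prior inclusion probability w. Subtracting the maximum before exponentiating is a numerical-stability choice of this sketch, not something stated in the package documentation.

```r
# Hypothetical toy input: K = 3 distinct models over p = 4 variables.
# Column k of gam is the 0/1 indicator vector gamma^(k).
gam <- cbind(c(1, 0, 0, 0),
             c(1, 1, 0, 0),
             c(1, 0, 1, 0))
N <- c(5, 3, 2)                    # times each distinct model was sampled
n.iter <- sum(N)                   # total number of MCMC samples
logpost <- c(-10.0, -10.5, -12.0)  # log unnormalized P(gamma^(k) | y)

# mip_j = sum_k N_k I(gamma^(k)_j = 1) / n.iter
mip <- drop(gam %*% N) / n.iter

# Weights proportional to P(gamma^(k) | y), normalized over distinct models
wts <- exp(logpost - max(logpost))
wts <- wts / sum(wts)

# wmip_j = sum_k w_k I(gamma^(k)_j = 1)
wmip <- drop(gam %*% wts)

# Variables whose (weighted) inclusion probability exceeds the threshold
threshold <- 0.5
median.model <- which(mip > threshold)
wam <- which(wmip > threshold)
```

In this toy case only the first variable appears in every distinct model, so it is the only one that clears the 0.5 threshold under either criterion.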

The median.model is the model containing variables j with mip_j > model.threshold and the wam is the model containing variables j with wmip_j > model.threshold. Note that E(\beta|\gamma, y), the conditional posterior mean of \beta given a model \gamma, is available in closed form (see Li, Dutta, Roy (2023) for details). geomc.vs returns two estimates (beta.mean, beta.wam) of the posterior mean of \beta computed as

beta.mean = \sum_{k=1}^{K} N_k E(\beta|\gamma^{(k)},y)/n.iter

and

beta.wam = \sum_{k=1}^K w_k E(\beta|\gamma^{(k)},y),

respectively.
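
The difference between the two averages can be seen in a small base-R sketch. The rows of cond.mean are placeholder numbers standing in for E(\beta|\gamma^{(k)},y); the closed form from Li, Dutta, Roy (2023) is not reproduced here. The counts N, the values in logpost and the name wts are likewise assumptions of this example.

```r
# Hypothetical toy input: K = 3 distinct models, p = 2 covariates.
# Row k stands in for E(beta | gamma^(k), y), ordered as
# c(intercept, coefficients); excluded variables have coefficient 0.
cond.mean <- rbind(c(0.5, 4.0, 0.0),
                   c(0.4, 3.8, 0.3),
                   c(0.6, 0.0, 1.0))
N <- c(6, 3, 1)              # times each distinct model was sampled
n.iter <- sum(N)
logpost <- c(-20, -21, -25)  # log unnormalized P(gamma^(k) | y)

# Posterior-probability weights over the distinct models
wts <- exp(logpost - max(logpost))
wts <- wts / sum(wts)

# Frequency-weighted average of the conditional means
beta.mean <- drop(N %*% cond.mean) / n.iter

# Model-probability-weighted average of the conditional means
beta.wam <- drop(wts %*% cond.mean)
```

The first estimate weights each distinct model by how often the chain visited it, while the second weights it by its unnormalized posterior probability, so the two can differ noticeably for short runs.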

Value

A list with components

`samples`	MCMC samples from `P(\gamma|y)` returned as a `p \times` n.iter sparse `lgCMatrix`.
`acceptance.rate`	The acceptance rate based on all samples.
`mip`	The `p` vector of marginal inclusion probabilities of all variables based on post burnin samples.
`median.model`	The median probability model based on post burnin samples.
`beta.mean`	The Monte Carlo estimate of posterior mean of `\beta` (the `p+1` vector c(intercept, regression coefficients)) based on post burnin samples.
`wmip`	The `p` vector of weighted marginal inclusion probabilities of all variables based on post burnin samples.
`wam`	The weighted average model based on post burnin samples.
`beta.wam`	The model probability weighted estimate of posterior mean of `\beta` (the `p+1` vector c(intercept, regression coefficients)) based on post burnin samples.
`log.post`	The n.iter vector of log of the unnormalized marginal posterior pmf `P(\gamma|y)` evaluated at the samples.
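
Since samples is a sparse logical matrix with one column per draw, a few lines of R are enough to read off the sampled models. The sketch below builds a small mock matrix S with the Matrix package in place of real output; S, labels, model.counts and incl.freq are names invented for this example, and no burn-in is discarded, so incl.freq is not claimed to match the returned mip.

```r
library(Matrix)

# Mock stand-in for result$samples: p = 4 variables, n.iter = 5 draws.
# Entry (j, t) is TRUE when variable j is active in draw t.
S <- sparseMatrix(i = c(1, 1, 2, 1, 1, 3, 1),
                  j = c(1, 2, 2, 3, 4, 4, 5),
                  x = rep(TRUE, 7), dims = c(4, 5))

# Dense copy; fine for a toy matrix, costly when p or n.iter is large
S.dense <- as.matrix(S)

# Active set of each draw, written as a comma-separated label
labels <- apply(S.dense, 2, function(g) paste(which(g), collapse = ","))

# How often each distinct model occurs among the draws
model.counts <- table(labels)

# Proportion of draws in which each variable is active
incl.freq <- rowMeans(S.dense)
```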

Author(s)

Vivekananda Roy

References

Roy, V. (2024). A geometric approach to informative MCMC sampling. https://arxiv.org/abs/2406.09010

Li, D., Dutta, S., Roy, V. (2023). Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings. Journal of Computational and Graphical Statistics, 32, 61-73.

Examples

library(geommc)

set.seed(1)  # arbitrary seed, for reproducibility
n <- 50; p <- 100; nonzero <- 3
trueidx <- 1:nonzero  # indices of the truly active covariates
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5
xone <- matrix(rnorm(n*p), n, p)
# Add a shared N(0,1) factor to every column so that the covariates have
# pairwise correlation rho (rnorm(n) is recycled down each column)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)
y <- 0.5 + X %*% TrueBeta + rnorm(n)  # true intercept is 0.5
result <- geomc.vs(X = X, y = y, model.summary = TRUE)
result$samples # the MCMC samples
result$acceptance.rate # the acceptance rate
result$mip # marginal inclusion probabilities
result$wmip # weighted marginal inclusion probabilities
result$median.model # the median probability model
result$wam # the weighted average model
result$beta.mean # the posterior mean of regression coefficients
result$beta.wam # another estimate of the posterior mean of regression coefficients
result$log.post # the log (unnormalized) posterior probabilities of the MCMC samples

geommc documentation built on Oct. 19, 2024, 1:08 a.m.