geomc.vs | R Documentation |
geomc.vs uses a geometric approach to MCMC for performing Bayesian variable selection. It produces MCMC samples from the posterior density of a Bayesian hierarchical feature selection model.
geomc.vs(
X,
y,
initial = NULL,
n.iter = 50,
burnin = 1,
eps = 0.5,
symm = TRUE,
move.prob = c(0.4, 0.4, 0.2),
lam0 = 0,
a0 = 0,
b0 = 0,
lam = nrow(X)/ncol(X)^2,
w = sqrt(nrow(X))/ncol(X),
model.summary = FALSE,
model.threshold = 0.5
)
X | The n \times p covariate matrix. |
y | The response vector of length n. |
initial | The initial model, i.e., the initial set of active variables. Default: the null model. |
n.iter | The number of MCMC samples. Default: 50. |
burnin | The burnin used to compute the median probability model. Default: 1. |
eps | The value of the epsilon perturbation. Default: 0.5. |
symm | Indicates whether the base density is that of a symmetric RW-MH. Default: TRUE. |
move.prob | The vector of ('addition', 'deletion', 'swap') move probabilities. Default: (0.4, 0.4, 0.2). move.prob is used only when symm is set to FALSE. |
lam0 | The precision parameter for \beta_0. Default: 0. |
a0 | The shape parameter for the prior on \sigma^2. Default: 0. |
b0 | The scale parameter for the prior on \sigma^2. Default: 0. |
lam | The slab precision parameter. Default: nrow(X)/ncol(X)^2. |
w | The prior inclusion probability of each variable. Default: sqrt(nrow(X))/ncol(X). |
model.summary | If TRUE, additional summaries are returned. Default: FALSE. |
model.threshold | The threshold probability for selecting covariates into the median probability model (median.model) and the weighted average model (wam). A covariate is included in median.model (wam) if its marginal inclusion probability (weighted marginal inclusion probability) exceeds the threshold. Default: 0.5. |
geomc.vs provides MCMC samples using the geometric MH algorithm of Roy (2024) for variable selection based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space as follows:
y | X, \beta_0,\beta,\gamma,\sigma^2,w,\lambda \sim N(\beta_0 1 + X_\gamma\beta_\gamma, \sigma^2 I_n)
\beta_i|\beta_0,\gamma,\sigma^2,w,\lambda \stackrel{indep.}{\sim} N(0, \gamma_i\sigma^2/\lambda),~i=1,\ldots,p,
\beta_0|\gamma,\sigma^2,w,\lambda \sim N(0, \sigma^2/\lambda_0)
\sigma^2|\gamma,w,\lambda \sim Inv-Gamma (a_0, b_0)
\gamma_i|w,\lambda \stackrel{iid}{\sim} Bernoulli(w)
where X_\gamma is the n \times |\gamma| submatrix of X consisting of those columns of X for which \gamma_i=1 and, similarly, \beta_\gamma is the |\gamma| subvector of \beta corresponding to \gamma. The density \pi(\sigma^2) of \sigma^2 \sim Inv-Gamma(a_0, b_0) has the form \pi(\sigma^2) \propto (\sigma^2)^{-a_0-1} \exp(-b_0/\sigma^2). The functions in the package also allow the non-informative prior (\beta_0, \sigma^2)|\gamma, w \sim 1/\sigma^2, which is obtained by setting \lambda_0=a_0=b_0=0.
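As a toy illustration of the hierarchy above, one prior draw can be generated in a few lines of plain R. All hyperparameter values below (n, p, w, lam, lam0, a0, b0) are made up for illustration and are not the package defaults; geomc.vs itself does not simulate from the prior.

```r
# One draw from the hierarchical prior above (toy values, all assumed).
set.seed(1)
n <- 20; p <- 5
X <- matrix(rnorm(n * p), n, p)
w <- 0.3; lam <- 1; lam0 <- 1; a0 <- 2; b0 <- 2
gamma  <- rbinom(p, 1, w)                        # gamma_i ~ Bernoulli(w)
sigma2 <- 1 / rgamma(1, shape = a0, rate = b0)   # sigma^2 ~ Inv-Gamma(a0, b0)
beta0  <- rnorm(1, 0, sqrt(sigma2 / lam0))       # intercept ~ N(0, sigma^2/lam0)
beta   <- rnorm(p, 0, sqrt(sigma2 / lam)) * gamma  # slab if active, exactly 0 otherwise
y <- beta0 + X %*% beta + rnorm(n, 0, sqrt(sigma2))
```

Note that an Inv-Gamma(a_0, b_0) draw is obtained as the reciprocal of a Gamma(a_0, rate = b_0) draw, matching the density \pi(\sigma^2) \propto (\sigma^2)^{-a_0-1}\exp(-b_0/\sigma^2) given above.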
geomc.vs provides the empirical MH acceptance rate and MCMC samples from the posterior pmf of the models P(\gamma|y), which is available up to a normalizing constant.
If \code{model.summary} is set TRUE, geomc.vs also returns other model summaries. In particular, it returns the marginal inclusion probabilities (mip) computed by the Monte Carlo average as well as the weighted marginal inclusion probabilities (wmip) computed with weights w_i = P(\gamma^{(i)}|y)/\sum_{k=1}^K P(\gamma^{(k)}|y), i=1,2,...,K, where \gamma^{(k)}, k=1,2,...,K are the distinct models sampled. Thus, if N_k is the number of times the kth distinct model \gamma^{(k)} is repeated in the MCMC samples, the mip for the jth variable is mip_j = \sum_{k=1}^{K} N_k I(\gamma^{(k)}_j = 1)/n.iter and the wmip for the jth variable is wmip_j = \sum_{k=1}^K w_k I(\gamma^{(k)}_j = 1).
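The mip/wmip recipe above can be sketched in plain R. The sampled models and log posterior values below are made up for illustration; this is not the internal implementation of geomc.vs.

```r
# Toy sketch of mip, wmip, median.model and wam (assumed inputs).
p <- 4; n.iter <- 6
# each column is one sampled model gamma (0/1 inclusion vector)
samples <- cbind(c(1,0,0,0), c(1,0,0,0), c(1,1,0,0),
                 c(1,1,0,0), c(1,1,0,0), c(0,1,0,0))
# hypothetical log unnormalized posterior pmf values, one per sample
log.post <- c(-2, -2, -1, -1, -1, -3)

# mip_j: Monte Carlo average of gamma_j over all n.iter samples
mip <- rowMeans(samples)

# distinct models gamma^(k) and weights w_k proportional to P(gamma^(k)|y)
key <- apply(samples, 2, paste, collapse = "")
distinct <- !duplicated(key)
wk <- exp(log.post[distinct]); wk <- wk / sum(wk)
wmip <- as.vector(samples[, distinct, drop = FALSE] %*% wk)

# threshold the two probability vectors to get the two selected models
model.threshold <- 0.5
median.model <- which(mip > model.threshold)
wam <- which(wmip > model.threshold)
```

Here variable 1 appears in 5 of the 6 samples, so mip_1 = 5/6, while the wmip weights depend only on the three distinct models and their posterior masses.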
The median.model is the model containing the variables j with mip_j > \code{model.threshold}, and the wam is the model containing the variables j with wmip_j > \code{model.threshold}. Note that E(\beta|\gamma, y), the conditional posterior mean of \beta given a model \gamma, is available in closed form (see Li, Dutta, Roy (2023) for details). geomc.vs returns two estimates (beta.mean, beta.wam) of the posterior mean of \beta, computed as beta.mean = \sum_{k=1}^{K} N_k E(\beta|\gamma^{(k)},y)/n.iter and beta.wam = \sum_{k=1}^K w_k E(\beta|\gamma^{(k)},y), respectively.
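These two averages can likewise be sketched with assumed inputs; the conditional means, counts N_k, and weights w_k below are made up for illustration and stand in for the closed-form quantities computed inside geomc.vs.

```r
# Sketch of beta.mean (Monte Carlo) and beta.wam (posterior-weighted).
# cond.mean: p x K matrix, column k = E(beta | gamma^(k), y) (assumed values)
cond.mean <- cbind(c(1.0, 0.0), c(0.8, 0.3))
Nk <- c(4, 2)            # times each distinct model is repeated in the chain
n.iter <- sum(Nk)
wk <- c(0.7, 0.3)        # normalized weights P(gamma^(k)|y)/sum_k P(gamma^(k)|y)
beta.mean <- as.vector(cond.mean %*% Nk) / n.iter
beta.wam  <- as.vector(cond.mean %*% wk)
```

The two estimates differ only in how the distinct models are weighted: by their empirical frequencies N_k/n.iter versus by their (normalized) posterior probabilities w_k.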
A list with components
samples | MCMC samples from P(\gamma|y). |
acceptance.rate | The acceptance rate based on all samples. |
mip | The p vector of marginal inclusion probabilities of all variables based on post burnin samples. |
median.model | The median probability model based on post burnin samples. |
beta.mean | The Monte Carlo estimate of the posterior mean of \beta based on post burnin samples. |
wmip | The p vector of weighted marginal inclusion probabilities of all variables based on post burnin samples. |
wam | The weighted average model based on post burnin samples. |
beta.wam | The model probability weighted estimate of the posterior mean of \beta based on post burnin samples. |
log.post | The n.iter vector of the log of the unnormalized marginal posterior pmf evaluated at the sampled models. |
Vivekananda Roy
Roy, V. (2024) A geometric approach to informative MCMC sampling. https://arxiv.org/abs/2406.09010
Li, D., Dutta, S., Roy, V. (2023) Model based screening embedded Bayesian variable selection for ultra-high dimensional settings. Journal of Computational and Graphical Statistics, 32, 61-73.
n <- 50; p <- 100; nonzero <- 3
trueidx <- 1:nonzero
nonzero.value <- 4
TrueBeta <- numeric(p)
TrueBeta[trueidx] <- nonzero.value
rho <- 0.5                                   # pairwise correlation among covariates
xone <- matrix(rnorm(n*p), n, p)
X <- sqrt(1-rho)*xone + sqrt(rho)*rnorm(n)   # equicorrelated design
y <- 0.5 + X %*% TrueBeta + rnorm(n)
result <- geomc.vs(X = X, y = y, model.summary = TRUE)
result$samples          # the MCMC samples
result$acceptance.rate  # the acceptance rate
result$mip              # marginal inclusion probabilities
result$wmip             # weighted marginal inclusion probabilities
result$median.model     # the median probability model
result$wam              # the weighted average model
result$beta.mean        # the posterior mean of the regression coefficients
result$beta.wam         # another estimate of the posterior mean of the regression coefficients
result$log.post         # the log (unnormalized) posterior probabilities of the MCMC samples