Description Usage Arguments Details Value Author(s) References See Also Examples
EMVS is a fast deterministic approach to identifying sparse high posterior models for Bayesian variable selection under spike-and-slab priors in linear regression.
EMVS performs dynamic posterior exploration, which outputs a solution path computed at a grid
of values for the spike variance parameter v0
.
1 2 3 |
Y |
Vector of continuous responses (n x 1). The responses are expected to be centered. |
X |
Matrix of regressors (n x p). Continous predictors are expected to be standardized to have mean zero and standard deviation one. |
v0 |
Spike variance parameters. Either a numeric value for a single run or a sequence of increasing values for dynamic posterior exploration. |
v1 |
Slab variance parameter. Needs to be greater than v0. |
type |
Type of the prior distribution over the model space: |
independent |
If TRUE, the regression coefficients and the error variance are taken to be independent a priori (default). If FALSE, a conjugate prior is used as in Rockova and George (2014). |
beta_init |
Vector (p x 1) of initial values for the regression parameters beta^0=[X'X+0.5(1/v1+1/v0)I_p]^{-1}X'Y. |
sigma_init |
Initial value for the residual variance parameter. |
epsilon |
Convergence margin parameter. The computation at each ||beta^{k+1}-beta^{k}||_2<epsilon. |
temperature |
Temperature parameter for deterministic annealing. If missing, a default value |
theta |
Prior inclusion probability for |
a,b |
Scale parameters of the beta distribution for |
v1_g |
Slab variance parameter value for the g-function. If missing, a default value |
direction |
Direction of the sequential reinitialization in dynamic posterior exploration. The default is |
standardize |
If |
log_v0 |
If TRUE, the v0s are displayed on the log scale in |
An EM algorithm is applied to find posterior modes of the regression parameters in linear models under spike and slab priors.
Variable selection is performed by threshodling the posterior modes to obtain models gamma
with high posterior probability P(gamma|Y)
. The spike variance v0
can be altered to obtain models with various degrees of sparsity. The slab variance is set to a fixed value v1>v0
. The thresholding is based on the conditional posterior probabilities of inclusion, which are outputed of the procedure. Variables are included as long as their inclusion probability is above 0.5
.
Dynamic exploration is achieved by considering a sequence of increasing spike variance parameters v0
. For each v0, a candidate model is obtained.
For the conjugate prior case, the best model is then picked according to a criterion ("log g-function"), which equals to the log of the posterior model probability up to a constant
log g(gamma)=log P(gamma| Y) + C.
Independent and sequential initializations are implemented. Sequential initialization uses previously found modes are warm starts in both forward and backward direction of the given sequence of v0 values.
A list object, for which EMVSplot
and EMVSbest
functions exist.
betas |
Matrix of estimated regression coefficients (posterior modal estimates) of dimension (L x p), where L is the length of v0. |
log_g_function |
Vector (L x 1) of log posterior model probabilities (up to a constant) of subsets found for each v0. (Only available for |
intersects |
Vector (L x 1) of posterior weighted intersection points between spike and slab components. |
sigmas |
Vector (L x 1) of estimated residual variances. |
v1 |
Slab variance parameter values used. |
v0 |
Spike variance parameter values used. |
niters |
Vector (L x 1) of numbers of interations until convergence for each v0 |
prob_inclusion |
A matrix (L x p) of conditional inclusion probabilities. Each row corresponds to a single v0 value. |
type |
Type of the model prior used. |
type |
Type of initialization used, |
theta |
Vector (L x 1) of estimated inclusion probabilities for |
Veronika Rockova
Maintainer: Veronika Rockova veronika.rockova@chicagobooth.edu
Rockova, V. and George, E. I. (2014)
EMVS: The EM Approach to Bayesian Variable Selection,
http://amstat.tandfonline.com/doi/abs/10.1080/01621459.2013.869223?journalCode=uasa20#preview
Journal of the American Statistical Association
EMVSplot
, EMVSsummary
, EMVSbest
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | # Linear regression with p>n variables
n = 100
p = 1000
X = matrix(rnorm(n * p), n, p)
beta = c(1.5, 2, 2.5, rep(0, p-3))
Y = X[,1] * beta[1] + X[,2] * beta[2] + X[,3] * beta[3] + rnorm(n)
# conjugate prior on regression coefficients and variance
v0 = seq(0.1, 2, length.out = 20)
v1 = 1000
beta_init = rep(1, p)
sigma_init = 1
a = b = 1
epsilon = 10^{-5}
result = EMVS(Y, X, v0 = v0, v1 = v1, type = "betabinomial",
independent = FALSE, beta_init = beta_init, sigma_init = sigma_init,
epsilon = epsilon, a = a, b = b)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.