varbvsmix | R Documentation |
Find the "best" fully-factorized approximation to the
posterior distribution of the coefficients, with linear regression
likelihood and mixture-of-normals priors on the coefficients. By
"best", we mean the approximating distribution that locally minimizes
the Kullback-Leibler divergence between the approximating distribution
and the exact posterior. In the original formulation (see varbvs),
each regression coefficient was drawn identically from a
spike-and-slab prior. Here, we instead formulate the "slab" as a
mixture of normals.
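The description above can be written out as a rough sketch. The notation is partly assumed and is not taken from this page: the first mixture component is treated as the "spike" at zero, the prior variances sa are assumed to be scaled by the residual variance sigma, and the factorized form of q is inferred from the returned quantities alpha, mu and s.

```latex
% Assumed mixture-of-normals prior on coefficient i, with K components;
% component 1 is taken to be the "spike" (zero variance).
\beta_i \sim \sum_{k=1}^{K} w_k \, N(0, \sigma \, sa_k), \qquad sa_1 = 0

% Fully-factorized approximation, parameterized by the assignment
% probabilities alpha, conditional means mu and conditional variances s.
q(\beta) = \prod_{i=1}^{p} \sum_{k=1}^{K} \alpha_{ik} \, N(\beta_i ; \mu_{ik}, s_{ik})

% The fitted q is a local minimizer of the Kullback-Leibler divergence
% to the exact posterior.
q^{\star} \in \arg\min_{q} \; \mathrm{KL}\big( q(\beta) \,\|\, p(\beta \mid X, Z, y) \big)
```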
varbvsmix(X, Z, y, sa, sigma, w, alpha, mu, update.sigma, update.sa,
update.w, w.penalty, drop.threshold = 1e-8, tol = 1e-4,
maxiter = 1e4, update.order = 1:ncol(X), verbose = TRUE)
X |
n x p input matrix, where n is the number of samples, and p is the number of variables. X cannot be sparse, and cannot have any missing values (NA). |
Z |
n x m covariate data matrix, where m is the number of
covariates. Do not supply an intercept as a covariate (i.e., a
column of ones), because an intercept is automatically included in
the regression model. For no covariates, set |
y |
Vector of length n containing values of the continuous outcome. |
sa |
Vector specifying the prior variance of the regression
coefficients (scaled by |
sigma |
Residual variance parameter. If missing, it is automatically fitted to the data by computing an approximate maximum-likelihood estimate. |
w |
If missing, it is automatically fitted to the data by computing an approximate maximum-likelihood estimate. |
alpha |
Initial estimates of the approximate posterior mixture assignment probabilities. These should be specified as a p x K matrix, where K is the number of mixture components. Each row must add up to 1. |
mu |
Initial estimates of the approximate regression coefficients conditioned on being drawn from each of the K mixture components. These estimates should be provided as a p x K matrix, where K is the number of mixture components. |
update.sigma |
If |
update.sa |
Currently, estimation of mixture component variances is
not implemented, so this must be set to |
update.w |
If |
w.penalty |
Penalty term for the mixture weights. It is useful
for "regularizing" the estimate of |
drop.threshold |
Posterior probability threshold for dropping
mixture components. Should be a positive number close to zero. If,
at any point during the optimization, all posterior mixture
assignment probabilities for a given mixture component |
tol |
Convergence tolerance for co-ordinate ascent updates. |
maxiter |
Maximum number of co-ordinate ascent iterations. |
update.order |
Order of the co-ordinate ascent updates for
fitting the variational approximation. The default is
|
verbose |
If |
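As an illustration of the shape constraints on alpha and mu described above, the following base R sketch builds one possible pair of initial values. The dimensions p and K, the random initialization, and the choice to zero out the first column of mu (on the assumption that it corresponds to the "spike") are illustrative assumptions, not the package's documented defaults.

```r
# Hypothetical dimensions: p variables and K mixture components.
set.seed(1)
p <- 500
K <- 3

# Draw positive entries, then divide each row by its sum so that every
# row of alpha adds up to 1, as required for the alpha argument.
alpha <- matrix(runif(p * K), p, K)
alpha <- alpha / rowSums(alpha)

# Random starting values for the conditional posterior means. The first
# column is set to zero, assuming it corresponds to the "spike".
mu <- matrix(rnorm(p * K), p, K)
mu[, 1] <- 0
```

Matrices built this way have the p x K shape that the alpha and mu arguments expect, provided K matches the number of mixture components being used.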
See https://www.overleaf.com/8954189vvpqnwpxhvhq.
An object with S3 class c("varbvsmix","list").
n |
Number of data samples used to fit model. |
mu.cov |
Posterior mean regression coefficients for covariates, including intercept. |
update.sigma |
If |
update.sa |
If |
update.w |
If |
w.penalty |
Penalty used for updating mixture weights. |
drop.threshold |
Posterior probability threshold used in the optimization procedure for setting mixture weights to zero. |
sigma |
Fitted or user-specified residual variance parameter. |
sa |
User-specified mixture variances. |
w |
Fitted or user-specified mixture weights. |
alpha |
Variational estimates of posterior mixture assignment probabilities. |
mu |
Variational estimates of posterior mean coefficients. |
s |
Variational estimates of posterior variances. |
lfsr |
Local false sign rate (LFSR) for each variable computed from variational estimates of posterior assignment probabilities and posterior means and variances. See Stephens (2017) for a definition of the LFSR. |
logZ |
Variational lower bound to marginal log-likelihood at each iteration of the co-ordinate ascent algorithm. |
err |
Maximum difference in the variational posterior probabilities at each iteration of the co-ordinate ascent algorithm. |
nzw |
Number of nonzero mixture components (including the "spike") at each iteration of the co-ordinate ascent algorithm. |
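To make the relationship between alpha, mu, s and lfsr more concrete, here is a minimal base R sketch. It follows the definition of the LFSR in Stephens (2017), the smaller of the posterior probabilities that a coefficient is non-negative or non-positive, and assumes that the first column of each matrix corresponds to the "spike" at zero. The function name lfsr_sketch and the toy inputs are hypothetical; this is not necessarily how the package computes the lfsr output.

```r
# Sketch of an LFSR calculation from variational estimates, assuming
# column 1 of alpha, mu and s is the "spike" (a point mass at zero).
lfsr_sketch <- function(alpha, mu, s) {
  # Posterior probability that each coefficient is exactly zero.
  p0 <- alpha[, 1]
  # Posterior probability that each coefficient is negative, summed
  # over the non-spike components.
  z  <- -mu[, -1, drop = FALSE] / sqrt(s[, -1, drop = FALSE])
  pn <- rowSums(alpha[, -1, drop = FALSE] * pnorm(z))
  # Posterior probability that each coefficient is positive.
  pp <- 1 - p0 - pn
  # LFSR = min(Pr(beta >= 0), Pr(beta <= 0)).
  p0 + pmin(pn, pp)
}

# Toy inputs for three variables and K = 3 components (made-up values).
alpha <- rbind(c(1, 0, 0), c(0, 1, 0), c(0.2, 0.4, 0.4))
mu    <- rbind(c(0, 0, 0), c(0, 3, 0), c(0, 1, -1))
s     <- rbind(c(0, 0.01, 0.01), c(0, 0.01, 0.01), c(0, 1, 1))

# Posterior mean coefficients, averaging over mixture components.
beta.est <- rowSums(alpha * mu)
lfsr     <- lfsr_sketch(alpha, mu, s)
```

In this toy example, the first variable has all its mass on the spike and so has an LFSR of 1, while the second is confidently positive and has an LFSR close to 0.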
Peter Carbonetto peter.carbonetto@gmail.com
M. Stephens (2017). False discovery rates: a new deal. Biostatistics 18, 275–294.
varbvs
# Generate the data set.
set.seed(1)
n <- 200
p <- 500
X <- randn(n,p)
sd <- c(0,0.2,0.5)
w <- c(0.9,0.05,0.05)
k <- sample(length(w),p,replace = TRUE,prob = w)
beta <- sd[k] * rnorm(p)
y <- c(X %*% beta + rnorm(n))
# Fit the model to the data, in which the variances of the mixture
# prior are automatically selected.
fit1 <- varbvsmix(X,NULL,y)
# Fit the model, but use only 3 mixture components in the prior
# instead of the default of 20.
fit2 <- varbvsmix(X,NULL,y,3)
# Use the "ground-truth" prior variances (the ones used to simulate
# the data).
fit3 <- varbvsmix(X,NULL,y,sd^2)
# Compare predicted outcomes against observed outcomes.
y.fit1 <- predict(fit1,X)
print(cor(y,y.fit1))
## Not run:
library(lattice)
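# Plot the estimated coefficients (posterior means from fit1) against
# the coefficients used to simulate the data.
print(xyplot(beta.est ~ beta.true,
data.frame(beta.true = beta,
beta.est = rowSums(fit1$alpha * fit1$mu)),
pch = 20,col = "royalblue",cex = 1))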
## End(Not run)