varbvsmix: Fit linear regression with mixture-of-normals priors using...

varbvsmixR Documentation

Fit linear regression with mixture-of-normals priors using variational approximation methods.

Description

Find the "best" fully-factorized approximation to the posterior distribution of the coefficients, with linear regression likelihood and mixture-of-normals priors on the coefficients. By "best", we mean the approximating distribution that locally minimizes the Kullback-Leibler divergence between the approximating distribution and the exact posterior. In the original formulation (see varbvs), each regression coefficient was drawn identically from a spike-and-slab prior. Here, we instead formulate the “slab” as a mixture of normals.

Usage

  varbvsmix(X, Z, y, sa, sigma, w, alpha, mu, update.sigma, update.sa,
            update.w, w.penalty, drop.threshold = 1e-8, tol = 1e-4,
            maxiter = 1e4, update.order = 1:ncol(X), verbose = TRUE)

Arguments

X

n x p input matrix, where n is the number of samples, and p is the number of variables. X cannot be sparse, and cannot have any missing values (NA).

Z

n x m covariate data matrix, where m is the number of covariates. Do not supply an intercept as a covariate (i.e., a column of ones), because an intercept is automatically included in the regression model. For no covariates, set Z = NULL.

y

Vector of length n containing values of the continuous outcome.

sa

Vector specifying the prior variance of the regression coefficients (scaled by sigma) for each mixture component. If not specified, a "reasonable" set of prior variances is automatically selected based on a simple regression analysis in which all variables (columns of X) are treated as independent. To override this default, set sa to vector of variances in which the first mixture component is the "spike", and therefore should be zero. Alternatively, sa may be an integer greater than 1, in which case this controls the number of mixture components, and the variances of the individual mixture components are automatically selected as described.

sigma

Residual variance parameter. If missing, it is automatically fitted to the data by computing an approximate maximum-likelihood estimate.

w

If missing, it is automatically fitted to the data by computing an approximate maximum-likelihood estimate.

alpha

Initial estimates of the approximate posterior mixture assignment probabilities. These should be specified as a p x K matrix, where K is the number of mixture components. Each row must add up to 1.

mu

Initial estimates of the approximate regression coefficients conditioned on being drawn from each of the K mixture components. These estimates should be provided as a p x K matrix, where K is the number of mixture components.

update.sigma

If TRUE, sigma is fitted to data using an approximate EM algorithm, in which case argument sigma, if provided, is the initial estimate.

update.sa

Currently, estimate of mixture component variances is not implemented, so this must be set to TRUE, otherwise an error will be generated.

update.w

If TRUE, mixture weights are fitted using an approximate EM algorithm, in which case argument w, if provided, is the initial estimate.

w.penalty

Penalty term for the mixture weights. It is useful for "regularizing" the estimate of w when we do not have a lot of information. It should be a vector with one positive entry for each mixture component. Larger values place more weight on the corresponding mixture components. It is based on the Dirichlet distribution with parameters w.penalty. The default is a vector of ones, which reduces to a uniform prior on w.

drop.threshold

Posterior probability threshold for dropping mixture components. Should be a positive number close to zero. If, at any point during the optimization, all posterior mixture assignment probabilities for a given mixture component k are less than drop.threshold, the mixture weight for component k is automatically set to zero. Set drop.threshold to zero to disable this behaviour. Setting larger values for drop.threshold may improve computation speed at a small cost to numerical accuracy of the final results.

tol

Convergence tolerance for co-ordinate ascent updates.

maxiter

Maximum number of co-ordinate ascent iterations.

update.order

Order of the co-ordinate ascent updates for fitting the variational approximation. The default is update.order = 1:p, where p is the number of variables (the number of columns of X).

verbose

If verbose = TRUE, print progress of algorithm to console.

Details

See https://www.overleaf.com/8954189vvpqnwpxhvhq.

Value

An object with S3 class c("varbvsmix","list").

n

Number of data samples used to fit model.

mu.cov

Posterior mean regression coefficients for covariates, including intercept.

update.sigma

If TRUE, residual variance parameter sigma was fit to data.

update.sa

If TRUE, mixture variances were fit to data.

update.w

If TRUE, mixture weights were fit to data.

w.penalty

Penalty used for updating mixture weights.

drop.threshold

Posterior probabiltiy threshold used in the optimization procedure for setting mixture weights to zero.

sigma

Fitted or user-specified residual variance parameter.

sa

User-specified mixture variances.

w

Fitted or user-specified mixture weights.

alpha

Variational estimates of posterior mixture assignent probabilities.

mu

Variational estimates of posterior mean coefficients.

s

Variational estimates of posterior variances.

lfsr

Local false sign rate (LFSR) for each variable computed from variational estimates of posterior assignment probabilities and posterior means and variances. See Stephens (2017) for a definition of the LFSR.

logZ

Variational lower bound to marginal log-likelihood at each iteration of the co-ordinate ascent algorithm.

err

Maximum difference in the variational posterior probabilities at each iteration of the co-ordinate ascent algorithm.

nzw

Number of nonzero mixture components (including the "spike") at each iteration of the co-ordinate ascent algorithm.

Author(s)

Peter Carbonetto peter.carbonetto@gmail.com

References

M. Stephens (2017). False discovery rates: a new deal. Biostatistics 18, 275–294.

See Also

varbvs

Examples


# Generate the data set.
set.seed(1)
n    <- 200
p    <- 500
X    <- randn(n,p)
sd   <- c(0,0.2,0.5)
w    <- c(0.9,0.05,0.05)
k    <- sample(length(w),p,replace = TRUE,prob = w)
beta <- sd[k] * rnorm(p)
y    <- c(X %*% beta + rnorm(n))

# Fit the model to the data, in which the variances of the mixture
# prior are automatically selected.
fit1 <- varbvsmix(X,NULL,y)

# Fit the model, but use only 3 mixture components in the prior
# instead of the default of 20.
fit2 <- varbvsmix(X,NULL,y,3)

# Use the "ground-truth" prior variances (the ones used to simulate
# the data).
fit3 <- varbvsmix(X,NULL,y,sd^2)

# Compare predicted outcomes against observed outcomes.
y.fit1 <- predict(fit1,X)
print(cor(y,y.fit1))

## Not run: 
library(lattice)
print(xyplot(beta.est ~ beta.true,
             data.frame(beta.true = beta,
                        beta.fitted = rowSums(fit$alpha * fit$mu)),
             pch = 20,col = "royalblue",cex = 1))

## End(Not run)

varbvs documentation built on June 7, 2023, 5:43 p.m.