svb.fit: Fit Approximate Posteriors to Sparse Linear and Logistic...

Description Usage Arguments Details Value Examples

View source: R/svb.fit.R

Description

Main function of the sparsevb package. Computes mean-field posterior approximations for both linear and logistic regression models, including variable selection via sparsity-inducing spike and slab priors.

Usage

svb.fit(
  X,
  Y,
  family = c("linear", "logistic"),
  slab = c("laplace", "gaussian"),
  mu,
  sigma = rep(1, ncol(X)),
  gamma,
  alpha,
  beta,
  prior_scale = 1,
  update_order,
  intercept = FALSE,
  noise_sd,
  max_iter = 1000,
  tol = 1e-05
)

Arguments

X

A numeric design matrix, each row of which represents a vector of covariates/independent variables/features. Though not required, it is recommended to center and scale the columns to have norm sqrt(nrow(X)).

Y

An nrow(X)-dimensional response vector, numeric if family = "linear" and binary if family = "logistic".

family

A character string selecting the regression model, either "linear" or "logistic".

slab

A character string specifying the prior slab density, either "laplace" or "gaussian".

mu

An ncol(X)-dimensional numeric vector, serving as initial guess for the variational means. If omitted, mu will be estimated via ridge regression to initialize the coordinate ascent algorithm.

sigma

A positive ncol(X)-dimensional numeric vector, serving as initial guess for the variational standard deviations.

gamma

An ncol(X)-dimensional vector of probabilities, serving as initial guess for the variational inclusion probabilities. If omitted, gamma will be estimated via LASSO regression to initialize the coordinate ascent algorithm.

alpha

A positive numeric value, parametrizing the beta hyper-prior on the inclusion probabilities. If omitted, alpha will be chosen empirically via LASSO regression.

beta

A positive numeric value, parametrizing the beta hyper-prior on the inclusion probabilities. If omitted, beta will be chosen empirically via LASSO regression.

prior_scale

A numeric value, controlling the scale parameter of the prior slab density. Used as the scale parameter λ when slab = "laplace", or as the standard deviation σ when slab = "gaussian".

update_order

A permutation of 1:ncol(X), giving the update order of the coordinate-ascent algorithm. If omitted, a data-driven updating order is used; see Ray and Szabo (2020) in the Journal of the American Statistical Association for details.

intercept

A Boolean variable, controlling if an intercept should be included. NB: This feature is still experimental in logistic regression.

noise_sd

A positive numerical value, serving as an estimate for the residual noise standard deviation in linear regression. If missing, it will be estimated; see estimateSigma from the selectiveInference package for more details. Has no effect when family = "logistic".

max_iter

A positive integer, controlling the maximum number of iterations for the variational update loop.

tol

A small, positive numerical value, controlling the termination criterion for maximum absolute differences between binary entropies of successive iterates.
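As noted for X, it is recommended (though not required) to center the columns and scale them to have norm sqrt(nrow(X)). This preprocessing is not performed by the package; a minimal base-R sketch, using a randomly generated design matrix for illustration:

```r
## illustrative preprocessing sketch, not part of svb.fit itself
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p, mean = 2, sd = 3), n, p)

Xc <- scale(X, center = TRUE, scale = FALSE)        # subtract column means
col_norms <- sqrt(colSums(Xc^2))                    # Euclidean norm of each column
Xs <- sweep(Xc, 2, col_norms / sqrt(n), FUN = "/")  # each column now has norm sqrt(n)
```

After this step, colSums(Xs^2) equals nrow(X) for every column, matching the scaling recommended above.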

Details

Suppose θ is the p-dimensional true parameter. The spike-and-slab prior for θ may be represented by the hierarchical scheme

w \sim \mathrm{Beta}(α, β),

z_j \mid w \sim_{i.i.d.} \mathrm{Bernoulli}(w),

θ_j\mid z_j \sim_{ind.} (1-z_j)δ_0 + z_j g.

Here, δ_0 represents the Dirac measure at 0. The slab g may be taken either as a \mathrm{Laplace}(0,λ) or N(0,σ^2) density. The former has centered density

f_λ(x) = \frac{λ}{2} e^{-λ |x|}.

Given α and β, the beta hyper-prior has density

b(x\mid α, β) = \frac{x^{α - 1}(1 - x)^{β - 1}}{\int_0^1 t^{α - 1}(1 - t)^{β - 1}\ \mathrm{d}t}.

A straightforward integration shows that the prior inclusion probability of a coefficient is \frac{α}{α + β}.
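This identity can be checked by simulating the hierarchical scheme directly; the following is an illustrative simulation (not package code), with alpha and beta chosen so that the prior inclusion probability is 1/20:

```r
## Monte Carlo check of the prior inclusion probability alpha / (alpha + beta)
set.seed(2)
alpha <- 1; beta <- 19                # implies inclusion probability 1/20 = 0.05
w <- rbeta(1e5, alpha, beta)          # w ~ Beta(alpha, beta)
z <- rbinom(1e5, 1, w)                # z_j | w ~ Bernoulli(w)
mean(z)                               # approximately alpha / (alpha + beta)
```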

Value

The approximate mean-field posterior, given as a named list containing numeric vectors "mu", "sigma", "gamma", and a value "intercept". The latter is set to NA in case intercept = FALSE. In mathematical terms, the conditional distribution of each θ_j is given by

θ_j\mid μ_j, σ_j, γ_j \sim_{ind.} γ_j N(μ_j, σ_j^2) + (1-γ_j) δ_0.
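For instance, the approximate posterior mean of each coefficient is γ_j μ_j, and a draw from the approximation mixes a point mass at zero with a normal component. A short sketch using hypothetical values standing in for an actual svb.fit result:

```r
## hypothetical variational parameters, standing in for a real fit
fit <- list(mu    = c(1.5, -2.0, 0.3),
            sigma = c(0.1,  0.2, 0.1),
            gamma = c(0.99, 0.95, 0.02))

post_mean <- fit$gamma * fit$mu       # E[theta_j] = gamma_j * mu_j under the approximation
set.seed(3)
draw <- rbinom(3, 1, fit$gamma) *     # inclusion indicators z_j ~ Bernoulli(gamma_j)
        rnorm(3, fit$mu, fit$sigma)   # normal component N(mu_j, sigma_j^2)
```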

Examples

### Simulate a linear regression problem of size n times p, with sparsity level s ###

n <- 250
p <- 500
s <- 5

### Generate toy data ###

X <- matrix(rnorm(n*p), n, p) #standard Gaussian design matrix

theta <- numeric(p)
theta[sample.int(p, s)] <- runif(s, -3, 3) #sample non-zero coefficients in random locations

pos_TR <- as.numeric(theta != 0) #true positives

Y <- X %*% theta + rnorm(n) #add standard Gaussian noise

### Run the algorithm in linear mode with Laplace prior and prioritized initialization ###

test <- svb.fit(X, Y, family = "linear")

posterior_mean <- test$mu * test$gamma #approximate posterior mean

pos <- as.numeric(test$gamma > 0.5) #significant coefficients

### Assess the quality of the posterior estimates ###

TPR <- sum(pos[which(pos_TR == 1)])/sum(pos_TR) #True positive rate

FDR <- sum(pos[which(pos_TR != 1)])/max(sum(pos), 1) #False discovery rate

L2 <- sqrt(sum((posterior_mean - theta)^2)) #L_2-error

MSPE <- sqrt(sum((X %*% posterior_mean - Y)^2)/n) #root mean squared prediction error

sparsevb documentation built on Jan. 16, 2021, 5:16 p.m.