sbr: Function sbr


Description

Function for scalable Bayesian regression (SBR) and sparse SBR (SSBR) in normal linear models with multiple types (sources) of feature matrices, with K denoting the number of sources. When K = 1, SBR corresponds to standard ridge regression using one of the three available empirical Bayes estimators (see below) for the penalty parameter. For details see Perrakis, Mukherjee and the Alzheimer's Disease Neuroimaging Initiative (2018).
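
A minimal sketch of the K = 1 (ridge) case, using simulated, already standardized data and the default "MAP" estimator; the sizes and variable names below are purely illustrative:

## K = 1 sketch: SBR as standard ridge regression (simulated data)
set.seed(1)
n <- 40; p <- 20
X <- scale(matrix(rnorm(n * p), n, p))   # standardized feature matrix
y <- scale(X %*% rnorm(p) + rnorm(n))    # standardized response
G <- X %*% t(X)                          # inner-product Gram matrix
fit <- sbr(y = y, X = X, G = G)          # default estimator = "MAP"
coef(fit)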

Usage

sbr(y, X, trX, G, estimator = "MAP", sparsify = FALSE,
  relaxed = TRUE, sparse.control = 1, p.threshold = 5000,
  cov.blocks = 1000, parallel = FALSE, cl, L.optim = 1e-04,
  U.optim = 10000)

Arguments

y

a standardized response vector.

X

a standardized feature matrix (if K = 1) or a list of standardized feature matrices (if K > 1); see the input-format sketch after this argument list.

trX

(optional) the transpose matrix of X (if K = 1) or a list of transpose matrices (if K > 1).

G

the inner-product Gram matrix (if K = 1) or a list containing the multiple Gram matrices (if K > 1).

estimator

the estimator used for tuning the shrinkage levels. Available estimators are leave-one-out cross-validation ("CV"), maximum marginal likelihood ("ML") and the maximum-a-posteriori value ("MAP", the default).

sparsify

logical; if TRUE the SSBR solution is calculated. Default is FALSE.

relaxed

logical; if TRUE (default) the relaxed SSBR solution is calculated; if FALSE the general SSBR solution is calculated.

sparse.control

numerical value controlling the effect of the sample size (n) on the resulting SSBR solution when relaxed = TRUE. Default is 1 (no control); a recommended option for sparser solutions is sparse.control = log(n).

p.threshold

threshold used for block-matrix computation of the main diagonal of the covariance matrix when sparsify = TRUE and relaxed = TRUE. Block computation is triggered for any source matrix whose number of columns exceeds p.threshold.

cov.blocks

the block size (not the number of blocks) used when block-matrix computation is triggered (see above). Default is 1000, i.e. blocks of dimension 1000 x 1000.

parallel

logical; if TRUE the calculation of the variance components required for the relaxed SSBR solution is performed in parallel. Default is FALSE.

cl

the number of cores to use when parallel = TRUE. Must be provided by the user.

L.optim

lower bound for the optimization procedure used to tune the shrinkage levels, default is 1e-04.

U.optim

upper bound for the optimization procedure used to tune the shrinkage levels; default is 1e+04.

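The list-format inputs expected for K > 1 can be assembled, for instance, as in the following sketch (assuming three standardized feature matrices X1, X2, X3, e.g. those simulated in the Examples section):

## hypothetical input construction for K = 3 sources
X   <- list(X1, X2, X3)        # feature matrices
trX <- lapply(X, t)            # (optional) their transposes
G   <- lapply(X, tcrossprod)   # Gram matrices, each equal to X_k %*% t(X_k)
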
Value

An object with S3 class 'sbr', allowing calls to the generic functions coef and predict.

An object of class 'sbr' is a list containing the following components:

coefficients

a 1-column matrix with the SBR beta estimates (when sparsify = FALSE) or a 2-column matrix with the SBR and SSBR beta estimates (when sparsify = TRUE). Note that the coefficients correspond to the standardized response variable and feature matrix.

sigma2

the variance component (at the posterior mode).

lambda

the vector of penalty parameters.

lambdaEstimator

the estimation method for lambda.

duration

the reported runtime.

Author(s)

Konstantinos Perrakis konstantinos.perrakis@dzne.de

Sach Mukherjee sach.mukherjee@dzne.de

References

Perrakis, K., Mukherjee, S. and the Alzheimer's Disease Neuroimaging Initiative (2018). Scalable Bayesian regression in high dimensions with multiple data sources. https://arxiv.org/pdf/1710.00596.pdf

See Also

gram

gram.parallel

predict.sbr

Examples

################# Example with 3 data sources #################

library(mvtnorm)
library(MCMCpack)

### GENERATION OF DATA ###

## sample size and number of predictors
n <- 50 
p1 <- 10
p2 <- 100
p3 <- 300

## generation of covariance matrices and feature matrices
S1 <- riwish(p1, diag(p1))
S2 <- riwish(p2, diag(p2))
S3 <- riwish(p3, diag(p3))
X1 <- matrix(rmvnorm(n * p1, rep(0, p1), S1), n, p1)
X2 <- matrix(rmvnorm(n * p2, rep(0, p2), S2), n, p2)
X3 <- matrix(rmvnorm(n * p3, rep(0, p3), S3), n, p3)

## sparsity and generation of betas
s2 <- p2 * 0.3
s3 <- p3 * 0.01
non.zero2 <- sample(1:p2, s2)
non.zero3 <- sample(1:p3, s3)
b1 <- rnorm(10, 0, 2.5)
b2 <- rep(0, p2)
b2[non.zero2] <- rnorm(s2)
b3 <- rep(0, p3)
b3[non.zero3] <- rnorm(s3)

mu <- X1 %*% b1 + X2 %*% b2 + X3 %*% b3
y <- rnorm(n, mu, sd=0.5)

## standardize
y <- scale(y)
X1 <- scale(X1)
X2 <- scale(X2)
X3 <- scale(X3)

G1 <- X1 %*% t(X1); G2 <- X2 %*% t(X2); G3 <- X3 %*% t(X3)

## make lists
G <- list(G1, G2, G3)
X <- list(X1, X2, X3)

### RUN SBR/SSBR ###

# 1) SBR with the ML lambda-estimator

model1 <- sbr(y = y, X = X, G = G, estimator = 'ML')

# 2) relaxed SSBR with the ML lambda-estimator using block-matrix computations
#    for the variances of X3 (since p3 = 300)

model2 <- sbr(y = y, X = X, G = G, estimator = 'ML', sparsify = TRUE,
              p.threshold = 100, cov.blocks = 100)

# 3) SSBR with the ML lambda-estimator

model3 <- sbr(y = y, X = X, G = G, estimator = 'ML', sparsify = TRUE, relaxed = FALSE)

# 4) parallel computing for the configuration of model2

library(doParallel)  # attaches parallel (detectCores, makeCluster) and registerDoParallel
cores <- detectCores() - 1
cores <- makeCluster(cores)
registerDoParallel(cores)
model4 <- sbr(y = y, X = X, G = G, estimator = 'ML', parallel = TRUE, cl = cores,
              sparsify = TRUE, p.threshold = 100, cov.blocks = 100)
stopCluster(cores)
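
# 5) (sketch) relaxed SSBR with the CV lambda-estimator and the sparser
#    setting sparse.control = log(n) recommended in the Arguments section

model5 <- sbr(y = y, X = X, G = G, estimator = 'CV', sparsify = TRUE,
              sparse.control = log(n))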

### EXTRACTING OUTPUT FROM A MODEL ###

coef(model3)    # SBR/SSBR coefficients (or alternatively model3$coefficients)
model3$lambda   # vector of lambdas  
model3$sigma2   # error variance  
model3$duration # runtime
