Description

Function for scalable Bayesian regression (SBR) and sparse SBR (SSBR) in normal linear models with multiple types (sources) of feature matrices, with K denoting the number of sources. When K = 1, SBR corresponds to standard ridge regression using one of the three available empirical Bayes estimators (see below) for the penalty parameter. For details see Perrakis, Mukherjee and the Alzheimer's Disease Neuroimaging Initiative (2018).
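For intuition on the K = 1 case, the following base-R sketch (illustrative only, not sbr internals) shows the ridge solution in its primal form and in the dual, Gram-matrix form. The dual form only requires inverting an n x n matrix, which is why the Gram matrix G enters as an input and why the approach scales to p >> n; lambda here is a fixed illustrative value, whereas sbr tunes it by empirical Bayes.

```r
set.seed(1)
n <- 20; p <- 5; lambda <- 2
X <- scale(matrix(rnorm(n * p), n, p))
y <- scale(rnorm(n))

# primal ridge solution: (X'X + lambda I_p)^{-1} X'y -- costs O(p^3)
b.primal <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

# dual (Gram) form: X'(G + lambda I_n)^{-1} y with G = XX' -- costs O(n^3),
# much cheaper when p >> n, the regime SBR targets
G <- tcrossprod(X)
b.dual <- crossprod(X, solve(G + lambda * diag(n), y))

all.equal(c(b.primal), c(b.dual))  # TRUE
```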
Arguments

y: a standardized response vector.

X: a standardized feature matrix (if K = 1) or a list of standardized feature matrices (if K > 1).

trX: (optional) the transpose of X (if K = 1) or a list of transposed matrices (if K > 1).

G: the inner-product Gram matrix (if K = 1) or a list containing the multiple Gram matrices (if K > 1).

estimator: the estimator used for tuning the shrinkage levels. Available estimators are leave-one-out cross-validation ("CV"), maximum marginal likelihood ("ML") and the maximum-a-posteriori value ("MAP", default).

sparsify: logical; if TRUE the SSBR solution is calculated. Default is FALSE.

relaxed: logical; if TRUE (default) the relaxed SSBR solution is calculated, if FALSE the general SSBR solution is calculated.

sparse.control: numerical value controlling the effect of sample size (n) on the resulting SSBR solution when relaxed = TRUE. Default is 1 (no control); a recommended option for sparser solutions is sparse.control = log(n).

p.threshold: used for block-matrix computation of the main diagonal of the covariance matrix when sparsify = TRUE and relaxed = TRUE. Triggered for any source matrix whose number of columns exceeds p.threshold.

cov.blocks: the block size (not the number of blocks) when block-matrix computation is triggered (see above). Default is 1000, i.e. blocks of dimension 1000 x 1000.

parallel: logical; if TRUE the calculation of the variance components required for the relaxed SSBR solution is performed in parallel. Default is FALSE.

cl: the number of cores to use when parallel = TRUE. Must be provided by the user.

L.optim: lower bound for the optimization procedure used to tune the shrinkage levels; default is 1e-04.

U.optim: upper bound for the optimization procedure used to tune the shrinkage levels; default is 1e04.
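The idea behind p.threshold and cov.blocks is that only the main diagonal of a large covariance-type matrix is needed, so the full p x p matrix is never formed. A minimal base-R sketch of that idea (the matrices and block size here are illustrative, not the package's internal code): diag(A M A') is computed block by block over the rows of A, holding at most cov.blocks rows of the product in memory at a time.

```r
set.seed(1)
p <- 2500; n <- 50
A <- matrix(rnorm(p * n), p, n)               # tall p x n matrix
M <- crossprod(matrix(rnorm(n * n), n, n))    # n x n positive-definite matrix

# naive computation: materializes a p x p matrix just to read its diagonal
d.full <- diag(A %*% M %*% t(A))

# block computation: works on cov.blocks rows of A at a time
block.diag <- function(A, M, cov.blocks = 1000) {
  p <- nrow(A)
  d <- numeric(p)
  for (start in seq(1, p, by = cov.blocks)) {
    idx <- start:min(start + cov.blocks - 1, p)
    Ab <- A[idx, , drop = FALSE]
    d[idx] <- rowSums((Ab %*% M) * Ab)  # diag(Ab M Ab') without off-diagonals
  }
  d
}
d.block <- block.diag(A, M, cov.blocks = 1000)

all.equal(d.full, d.block)  # TRUE
```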
Value

An object with S3 class 'sbr', allowing calls to the generic functions coef and predict. An object of class 'sbr' is a list containing the following components:

coefficients: a 1-column matrix with the SBR beta estimates (when sparsify = FALSE) or a 2-column matrix with the SBR and SSBR beta estimates (when sparsify = TRUE). Note that the coefficients correspond to the standardized response variable and feature matrix.

sigma2: the variance component (at the posterior mode).

lambda: the vector of penalty parameters.

lambdaEstimator: the estimation method for lambda.

duration: the reported runtime.
Author(s)

Konstantinos Perrakis konstantinos.perrakis@dzne.de

Sach Mukherjee sach.mukherjee@dzne.de
References

Perrakis, K., Mukherjee, S. and the Alzheimer's Disease Neuroimaging Initiative (2018). Scalable Bayesian regression in high dimensions with multiple data sources. https://arxiv.org/pdf/1710.00596.pdf
Examples

################# Example with 3 data sources #################
library(mvtnorm)
library(MCMCpack)
### GENERATION OF DATA ###
## sample size and number of predictors
n <- 50
p1 <- 10
p2 <- 100
p3 <- 300
## generation of covariance matrices and feature matrices
S1 <- riwish(p1, diag(p1))
S2 <- riwish(p2, diag(p2))
S3 <- riwish(p3, diag(p3))
X1 <- rmvnorm(n, rep(0, p1), S1)
X2 <- rmvnorm(n, rep(0, p2), S2)
X3 <- rmvnorm(n, rep(0, p3), S3)
## sparsity and generation of betas
s2 <- p2 * 0.3
s3 <- p3 * 0.01
non.zero2 <- sample(1:p2, s2)
non.zero3 <- sample(1:p3, s3)
b1 <- rnorm(p1, 0, 2.5)
b2 <- rep(0, p2)
b2[non.zero2] <- rnorm(s2)
b3 <- rep(0, p3)
b3[non.zero3] <- rnorm(s3)
mu <- X1 %*% b1 + X2 %*% b2 + X3 %*% b3
y <- rnorm(n, mu, sd=0.5)
## standardize
y <- scale(y)
X1 <- scale(X1)
X2 <- scale(X2)
X3 <- scale(X3)
G1 <- X1 %*% t(X1); G2 <- X2 %*% t(X2); G3 <- X3 %*% t(X3)
## make lists
G <- list(G1, G2, G3)
X <- list(X1, X2, X3)
### RUN SBR/SSBR ###
# 1) SBR with the ML lambda-estimator
model1 <- sbr(y = y, X = X, G = G, estimator = 'ML')
# 2) relaxed SSBR with the ML lambda-estimator, using block-matrix computations
#    for the variances of X3 (since p3 = 300 exceeds p.threshold = 100)
model2 <- sbr(y = y, X = X, G = G, estimator = 'ML', sparsify = TRUE, p.threshold = 100, cov.blocks = 100)
# 3) SSBR with the ML lambda-estimator
model3 <- sbr(y = y, X = X, G = G, estimator = 'ML', sparsify = TRUE, relaxed = FALSE)
# 4) parallel computing for the configuration of model2
library(doParallel) # also loads parallel and foreach
n.cores <- detectCores() - 1
cl <- makeCluster(n.cores)
registerDoParallel(cl)
model4 <- sbr(y = y, X = X, G = G, estimator = 'ML', parallel = TRUE, cl = cl, sparsify = TRUE, p.threshold = 100, cov.blocks = 100)
stopCluster(cl)
### EXTRACTING OUTPUT FROM A MODEL ###
coef(model3) # SBR/SSBR coefficients (or alternatively model3$coefficients)
model3$lambda # vector of lambdas
model3$sigma2 # error variance
model3$duration # runtime