Description Usage Arguments Details Value References See Also Examples
Performs a sparse Bayesian multiple linear regression
of y on X, using the "Sum of Single Effects" model from Wang et al
(2020). In brief, this function fits the regression model y =
μ + X b + e, where elements of e are i.i.d. normal
with zero mean and variance residual_variance
, μ is
an intercept term and b is a vector of length p representing
the effects to be estimated. The “susie assumption” is that
b = ∑_{l=1}^L b_l where each b_l is a vector of
length p with exactly one nonzero element. The prior on the
nonzero element is normal with zero mean and variance var(y)
* scaled_prior_variance
. The value of L
is fixed, and
should be chosen to provide a reasonable upper bound on the number
of nonzero effects to be detected. Typically, the hyperparameters
residual_variance
and scaled_prior_variance
will be
estimated during model fitting, although they can also be fixed as
specified by the user. See functions susie_get_cs
and
other functions of form susie_get_*
to extract the most
commonlyused results from a susie fit.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64  susie(
X,
y,
L = min(10, ncol(X)),
scaled_prior_variance = 0.2,
residual_variance = NULL,
prior_weights = NULL,
null_weight = NULL,
standardize = TRUE,
intercept = TRUE,
estimate_residual_variance = TRUE,
estimate_prior_variance = TRUE,
estimate_prior_method = c("optim", "EM", "simple"),
check_null_threshold = 0,
prior_tol = 1e09,
residual_variance_upperbound = Inf,
s_init = NULL,
coverage = 0.95,
min_abs_corr = 0.5,
compute_univariate_zscore = FALSE,
na.rm = FALSE,
max_iter = 100,
tol = 0.001,
verbose = FALSE,
track_fit = FALSE,
residual_variance_lowerbound = var(drop(y))/10000,
refine = FALSE
)
susie_suff_stat(
bhat,
shat,
R,
n,
var_y,
XtX,
Xty,
yty,
X_colmeans = NA,
y_mean = NA,
maf = NULL,
maf_thresh = 0,
L = 10,
scaled_prior_variance = 0.2,
residual_variance = NULL,
estimate_residual_variance = TRUE,
estimate_prior_variance = TRUE,
estimate_prior_method = c("optim", "EM", "simple"),
check_null_threshold = 0,
prior_tol = 1e09,
r_tol = 1e08,
prior_weights = NULL,
null_weight = NULL,
standardize = TRUE,
max_iter = 100,
s_init = NULL,
coverage = 0.95,
min_abs_corr = 0.5,
tol = 0.001,
verbose = FALSE,
track_fit = FALSE,
check_input = FALSE,
refine = FALSE
)

X 
An n by p matrix of covariates. 
y 
The observed responses, a vector of length n. 
L 
Maximum number of nonzero effects in the susie regression model. If L is larger than the number of covariates, p, L is set to p. 
scaled_prior_variance 
The prior variance, divided by

residual_variance 
Variance of the residual. If

prior_weights 
A vector of length p, in which each entry gives the prior probability that corresponding column of X has a nonzero effect on the outcome, y. 
null_weight 
Prior probability of no effect (a number between 0 and 1, and cannot be exactly 1). 
standardize 
If 
intercept 
If 
estimate_residual_variance 
If

estimate_prior_variance 
If 
estimate_prior_method 
The method used for estimating prior
variance. When 
check_null_threshold 
When the prior variance is estimated,
compare the estimate with the null, and set the prior variance to
zero unless the loglikelihood using the estimate is larger by this
threshold amount. For example, if you set

prior_tol 
When the prior variance is estimated, compare the
estimated value to 
residual_variance_upperbound 
Upper limit on the estimated
residual variance. It is only relevant when

s_init 
A previous susie fit with which to initialize. 
coverage 
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. 
min_abs_corr 
Minimum absolute correlation allowed in a credible set. The default, 0.5, corresponds to a squared correlation of 0.25, which is a commonly used threshold for genotype data in genetic studies. 
compute_univariate_zscore 
If 
na.rm 
Drop any missing values in y from both X and y. 
max_iter 
Maximum number of IBSS iterations to perform. 
tol 
A small, nonnegative number specifying the convergence
tolerance for the IBSS fitting procedure. The fitting procedure
will halt when the difference in the variational lower bound, or
“ELBO” (the objective function to be maximized), is
less than 
verbose 
If 
track_fit 
If 
residual_variance_lowerbound 
Lower limit on the estimated
residual variance. It is only relevant when

refine 
If 
bhat 
A pvector of estimated effects. 
shat 
A pvector of standard errors. 
R 
A p by p symmetric, positive semidefinite matrix. It
can be X'X, the covariance matrix X'X/(n1), or a
correlation matrix. It should be estimated from the same samples
used to compute 
n 
The sample size. 
var_y 
The sample variance of y, defined as y'y/(n1).
When the sample variance cannot be provided, the coefficients
(returned from 
XtX 
A p by p matrix X'X in which the columns of X are centered to have mean zero. 
Xty 
A pvector X'y in which y and the columns of X are centered to have mean zero. 
yty 
A scalar y'y in which y is centered to have mean zero. 
X_colmeans 
A pvector of column means of 
y_mean 
A scalar containing the mean of 
maf 
Minor allele frequency; to be used along with

maf_thresh 
Variants having a minor allele frequency smaller than this threshold are not used. 
r_tol 
Tolerance level for eigenvalue check of positive semidefinite matrix of R. 
check_input 
If 
The function susie
implements the IBSS algorithm
from Wang et al (2020). The option refine = TRUE
implements
an additional step to help reduce problems caused by convergence of
the IBSS algorithm to poor local optima (which is rare in our
experience, but can provide misleading results when it occurs). The
refinement step incurs additional computational expense that
increases with the number of CSs found in the initial run.
The function susie_suff_stat
implements essentially the same
algorithms, but using sufficient statistics. (The statistics are
sufficient for the regression coefficients b, but not for the
intercept μ; see below for how the intercept is treated.)
If the sufficient statistics are computed correctly then the
results from susie_suff_stat
should be the same as (or very
similar to) susie
, although runtimes will differ as
discussed below. The simplest sufficient statistics are the sample
size n
, and then the p by p matrix X'X, the pvector
X'y, and the sum of squared y values y'y, all computed
after centering the columns of X and the vector y to
have mean 0; these can be computed using compute_suff_stat
.
Alternatively the user can provide n
and bhat
(the
univariate OLS estimates from regressing y on each column of X),
shat
(the standard errrors from these OLS regressions), the
p by p symmetric, positive semidefinite correlation (or covariance)
matrix R = (1/(n1))X'X, and the variance of y, again
all computed from centered X and y. Note that here
R
and bhat
should be computed using the same matrix
X. If you do not have access to the original X to
compute the matrix R
then use susie_rss
.
The handling of the intercept term in susie_suff_stat
needs
some additional explanation. Computing the summary data after
centering X
and y
effectively ensures that the
resulting posterior quantities for b allow for an intercept
in the model; however, the actual value of the intercept cannot be
estimated from these centered data. To estimate the intercept term
the user must also provide the column means of X and the mean
of y (X_colmeans
and y_mean
). If these are not
provided, they are treated as NA
, which results in the
intercept being NA
. If for some reason you prefer to have
the intercept be 0 instead of NA
then set
X_colmeans = 0,y_mean = 0
.
For completeness, we note that if susie_suff_stat
is run on
X'X, X'y, y'y computed without centering X and
y, and with X_colmeans = 0,y_mean = 0
, this is
equivalent to susie
applied to X, y with
intercept = FALSE
(although results may differ due to
different initializations of residual_variance
and
scaled_prior_variance
). However, this usage is not
recommended for for most situations.
The computational complexity of susie
is O(npL) per
iteration, whereas susie_suff_stat
is O(p^2L) per
iteration (not including the cost of computing the sufficient
statistics, which is dominated by the O(np^2) cost of
computing X'X). Because of the cost of computing X'X,
susie
will usually be faster. However, if n >> p,
and/or if X'X is already computed, then
susie_suff_stat
may be faster.
A "susie"
object with some or all of the following
elements:
alpha 
An L by p matrix of posterior inclusion probabilites. 
mu 
An L by p matrix of posterior means, conditional on inclusion. 
mu2 
An L by p matrix of posterior second moments, conditional on inclusion. 
Xr 
A vector of length n, equal to 
lbf 
logBayes Factor for each single effect. 
lbf_variable 
logBayes Factor for each variable and single effect. 
intercept 
Intercept (fixed or estimated). 
sigma2 
Residual variance (fixed or estimated). 
V 
Prior variance of the nonzero elements of b, equal to

elbo 
The value of the variational lower bound, or “ELBO” (objective function to be maximized), achieved at each iteration of the IBSS fitting procedure. 
fitted 
Vector of length n containing the fitted values of the outcome. 
sets 
Credible sets estimated from model fit; see

pip 
A vector of length p giving the (marginal) posterior inclusion probabilities for all p covariates. 
z 
A vector of univariate zscores. 
niter 
Number of IBSS iterations that were performed. 
converged 

susie_suff_stat
returns also outputs:
XtXr 
A pvector of 
G. Wang, A. Sarkar, P. Carbonetto and M. Stephens (2020). A simple new approach to variable selection in regression, with application to genetic finemapping. Journal of the Royal Statistical Society, Series B 82, 12731300 doi: 10.1101/501114.
susie_get_cs
and other susie_get_*
functions for extracting results; susie_trendfilter
for
applying the SuSiE model to nonparametric regression, particularly
changepoint problems, and susie_rss
for applying the
SuSiE model when one only has access to limited summary statistics
related to X and y (typically in genetic applications).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  # susie example
set.seed(1)
n = 1000
p = 1000
beta = rep(0,p)
beta[1:4] = 1
X = matrix(rnorm(n*p),nrow = n,ncol = p)
X = scale(X,center = TRUE,scale = TRUE)
y = drop(X %*% beta + rnorm(n))
res1 = susie(X,y,L = 10)
susie_get_cs(res1) # extract credible sets from fit
plot(beta,coef(res1)[1])
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
plot(y,predict(res1))
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")
# susie_suff_stat example
input_ss = compute_suff_stat(X,y)
res2 = with(input_ss,
susie_suff_stat(XtX = XtX,Xty = Xty,yty = yty,n = n,
X_colmeans = X_colmeans,y_mean = y_mean,L = 10))
plot(coef(res1),coef(res2))
abline(a = 0,b = 1,col = "skyblue",lty = "dashed")

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.