fit a linear model with variational Bayes spike penalty
Description
Fit a linear model via a fast coordinate variational Bayes algorithm. Applicable to linear and logistic regression, and solves the problem on either a path of the spike (l0) parameter or at a fixed value based on the datadimensions.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19  vbsr(y,
X,
ordering_mat=NULL,
eps=1e6,
exclude=NULL,
add.intercept=TRUE,
maxit = 1e4,
n_orderings = 10,
family = "normal",
scaling = TRUE,
return_kl = TRUE,
estimation_type = "BMA",
bma_approximation = TRUE,
screen = 1.0,
post=0.95,
already_screened = 1.0,
kl = 0.99,
l0_path=NULL,
cleanSolution=FALSE)

Arguments
y 
response variable. Normally distributed errors for 
X 
Design matrix, an n x m matrix, with rows as observations 
ordering_mat 
Optionally specified coordinate update ordering matrix. Must be
in matrix form with columns as permutation vectors of length m, and there must be

eps 
Tolerance used to determine convergence of the algorithm based on the lower bound. 
exclude 
An optional indicator vector of length m of 0's and 1's indicating whether to penalize a particular variable or not (0=penalize, 1=unpenalized) 
add.intercept 
A boolean variable indicating whether or not to include an unpenalized intercept variable. 
maxit 
The maximum number of iterations to run the algorithm for a given solution to a penalized regression problem. 
n_orderings 
The number of random starts used. 
family 
The type of error model used. Currently supported modes are 
scaling 
A boolean variable indicating whether or not to scale the columns of X to have mean zero and variance one. 
return_kl 
A boolean variable indicating whether or not to return an analysis of the null distributed features in the dataset as a function of the penalty parameter. 
estimation_type 
The type of estimation to perform based on the number of unique solution identified
to the penalized regression problem. Valid values are 
.
bma_approximation 
A boolean variable indicating whether to compute a full correction to the

screen 
Pvalue to do marginal screening. Default is to not do marginal prescreening (e.g marginal pvalue of 1.0) 
post 
Choice of penalty parameter such that a feature will have a posterior probability of 0.95 if it passes a Bonferroni correction in the multivariate model. Default is 
already_screened 
If features are already screened, the marginal pvalue used for screening. 
kl 
The inner percentiles of the distribution to compute the KullbackLeibler overfitting statistic. Only works for analysis when directly specifying a path of penalization parameter (e.g. 
l0_path 
The path of penalty parameters to solve the spike regression problem. If 
cleanSolution 
This parameter determines whether a given solution is further filtered using an unpenalized model. If 
Details
The solutions to the spike penalized regression model are fit with a
coordinate variational Bayes algorithm based on the l0_path
values
of the spike hyperparameter.
Value
A list with all the results of the vbsr analysis.
beta 
The expected value of the penalized regression coefficients. 
alpha 
The estimated value of the unpenalized regression coefficients. 
z 
The Zstatistic for each penalized regression coefficient 
pval 
The pvalues based on the asymptotic normal assumption of the Zstatistics 
post 
The posterior probabilities of each of the regression coefficients 
l0 
The penalty parameters used to solve the penalized regression problem 
modelEntropy 
The entropy of the identified approximate posterior probability distribution over model space. 
modelProb 
The approximate posterior probability distribution over the identified model space. 
kl_index 
If a path solution was run with the KL diagnostic statistic then the points in the path where the KL statistic is nearest the min, the mean, the min + 1 s.e., and the mean +1 s.e. 
kl 
The KL statistic computed across the path 
kl_min 
The minimum KL statistic identified along the path 
kl_mean 
The expected KL statistic given the number of features identified 
Author(s)
Benjamin A. Logsdon
References
Logsdon, B.A, G.E. Hoffman, and J.G. Mezey (2010) A variational Bayes algorithm for fast and accurate multiple locus genomewide association analysis, http://www.biomedcentral.com/14712105/11/58, BMC Bioinformatics, Vol. 11(1), 58
Logsdon, B.A., G.E. Hoffman, and J.G. Mezey, (2012). Mouse obesity network reconstruction with a variational Bayes algorithm to employ aggresive false positive control, http://www.biomedcentral.com/14712105/13/53/, BMC Bioinformatics, Vol. 13(1), 53
Logsdon, B.A., C.L. Carty, A.P. Reiner, J.Y. Dai, and C. Kooperberg (2012). A novel variational Bayes multiple locus Zstatistic for genomewide association studies with Bayesian model averaging. Bioinformatics, Vol. 28(13), 17381744
See Also
compute_KL
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 