# vbsr: fit a linear model with variational Bayes spike penalty In vbsr: Variational Bayes Spike Regression Regularized Linear Models

## Description

Fit a linear model via a fast coordinate variational Bayes algorithm. Applicable to linear and logistic regression, and solves the problem on either a path of the spike (l0) parameter or at a fixed value based on the data-dimensions.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19``` ```vbsr(y, X, ordering_mat=NULL, eps=1e-6, exclude=NULL, add.intercept=TRUE, maxit = 1e4, n_orderings = 10, family = "normal", scaling = TRUE, return_kl = TRUE, estimation_type = "BMA", bma_approximation = TRUE, screen = 1.0, post=0.95, already_screened = 1.0, kl = 0.99, l0_path=NULL, cleanSolution=FALSE) ```

## Arguments

 `y` response variable. Normally distributed errors for `family="normal"`. For `family="binomial"` should be coded as a vector of 0's and 1's. `X` Design matrix, an n x m matrix, with rows as observations `ordering_mat` Optionally specified coordinate update ordering matrix. Must be in matrix form with columns as permutation vectors of length m, and there must be `n_orderings` columns. `eps` Tolerance used to determine convergence of the algorithm based on the lower bound. `exclude` An optional indicator vector of length m of 0's and 1's indicating whether to penalize a particular variable or not (0=penalize, 1=unpenalized) `add.intercept` A boolean variable indicating whether or not to include an unpenalized intercept variable. `maxit` The maximum number of iterations to run the algorithm for a given solution to a penalized regression problem. `n_orderings` The number of random starts used. `family` The type of error model used. Currently supported modes are `family="normal"` and `family="binomial"` `scaling` A boolean variable indicating whether or not to scale the columns of X to have mean zero and variance one. `return_kl` A boolean variable indicating whether or not to return an analysis of the null distributed features in the data-set as a function of the penalty parameter. `estimation_type` The type of estimation to perform based on the number of unique solution identified to the penalized regression problem. Valid values are `estimation_type="BMA"` and `estimation_type="MAXIMAL"`

.

 `bma_approximation` A boolean variable indicating whether to compute a full correction to the `z` statistic. WARNING can make the algorithm very computationally intensive for highly multi-modal posterior surfaces. `screen` P-value to do marginal screening. Default is to not do marginal prescreening (e.g marginal p-value of 1.0) `post` Choice of penalty parameter such that a feature will have a posterior probability of 0.95 if it passes a Bonferroni correction in the multivariate model. Default is `post=.95`. More conservative approach would be `post=0.5` `already_screened` If features are already screened, the marginal p-value used for screening. `kl` The inner percentiles of the distribution to compute the Kullback-Leibler overfitting statistic. Only works for analysis when directly specifying a path of penalization parameter (e.g. `l0_path`). For default `kl=0.99` the KL-statistic is used for the statistics between the 1%-99% of the distribution. `l0_path` The path of penalty parameters to solve the spike regression problem. If `post` is specified, this is computed automatically. `cleanSolution` This parameter determines whether a given solution is further filtered using an unpenalized model. If `cleanSolution=TRUE`, then the features that are significant after a Bonferroni correction given the p-values from the vbsr regression model are then tested in an unpenalized linear regression model. The p-values and z-statistics are updated using the Wald test from the unpenalized linear regression model for the features that were selected.

## Details

The solutions to the spike penalized regression model are fit with a coordinate variational Bayes algorithm based on the `l0_path` values of the spike hyper-parameter.

## Value

A list with all the results of the vbsr analysis.

 `beta` The expected value of the penalized regression coefficients. `alpha` The estimated value of the unpenalized regression coefficients. `z` The Z-statistic for each penalized regression coefficient `pval` The p-values based on the asymptotic normal assumption of the Z-statistics `post` The posterior probabilities of each of the regression coefficients `l0` The penalty parameters used to solve the penalized regression problem `modelEntropy` The entropy of the identified approximate posterior probability distribution over model space. `modelProb` The approximate posterior probability distribution over the identified model space. `kl_index` If a path solution was run with the KL diagnostic statistic then the points in the path where the KL statistic is nearest the min, the mean, the min + 1 s.e., and the mean +1 s.e. `kl` The KL statistic computed across the path `kl_min` The minimum KL statistic identified along the path `kl_mean` The expected KL statistic given the number of features identified

## Author(s)

Benjamin A. Logsdon

## References

Logsdon, B.A, G.E. Hoffman, and J.G. Mezey (2010) A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis, http://www.biomedcentral.com/1471-2105/11/58, BMC Bioinformatics, Vol. 11(1), 58

Logsdon, B.A., G.E. Hoffman, and J.G. Mezey, (2012). Mouse obesity network reconstruction with a variational Bayes algorithm to employ aggresive false positive control, http://www.biomedcentral.com/1471-2105/13/53/, BMC Bioinformatics, Vol. 13(1), 53

Logsdon, B.A., C.L. Carty, A.P. Reiner, J.Y. Dai, and C. Kooperberg (2012). A novel variational Bayes multiple locus Z-statistic for genome-wide association studies with Bayesian model averaging. Bioinformatics, Vol. 28(13), 1738-1744

`compute_KL`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13``` ``` n <- 100; m <- 500; ntrue <- 10; e <- rnorm(n); X <- matrix(rnorm(n*m),n,m); tbeta <- sample(1:m,ntrue); beta <- rep(0,m); beta[tbeta]<- rnorm(ntrue,0,.3); y <- X%*%beta; y <- y+e; res<- vbsr(y,X,family="normal"); ```