EBglmnet | R Documentation |
EBglmnet is the main function for fitting a generalized linear model via empirical Bayesian methods with lasso and elastic net hierarchical priors.
It handles the p >> n setting, produces sparse estimates of the
regression coefficients, and performs significance tests for nonzero effects
in both linear and logistic regression models.
EBglmnet(x, y, family = c("gaussian", "binomial"),
         prior = c("lassoNEG", "lasso", "elastic net"),
         hyperparameters, verbose = 0)
x |
input matrix of dimension |
y |
response variable. Continuous for |
family |
model type taking values of "gaussian" (default) or "binomial". |
prior |
prior distribution to be used. It takes values of "lassoNEG"(default), "lasso", and "elastic net". All priors will produce a sparse outcome of the regression coefficients; see Details for choosing priors. |
hyperparameters |
the optimal hyperparameters in the prior distribution. Similar as |
verbose |
parameter that controls the level of message output from EBglmnet. It takes values from 0 to 5; larger values display more messages. Small values are recommended to avoid excessive output. Default value for |
EBglmnet implements three sets of hierarchical prior distributions for the regression parameters \beta:
lasso prior:
\beta_j \sim N(0,\sigma_j^2),
\sigma_j^2 \sim exp(\lambda), j = 1, \dots, p.
lasso-NEG prior:
\beta_j \sim N(0,\sigma_j^2),
\sigma_j^2 \sim exp(\lambda),
\lambda \sim gamma(a,b), j = 1, \dots, p.
elastic net prior:
\beta_j \sim N[0,(\lambda_1 + \tilde{\sigma_j}^{-2})^{-2}],
\tilde{\sigma_j}^{2} \sim generalized-gamma(\lambda_1, \lambda_2), j = 1, \dots,p.
These priors are peaked at zero with flat, heavy tails: they assign high prior
probability mass near zero while still allowing substantial probability in both tails. This reflects the prior
belief that a sparse solution exists: most of the variables have no effect on the response variable,
and only some of them have nonzero effects on the outcome y.
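The "peaked at zero, heavy tails" shape can be seen by simulating from the lasso prior hierarchy. The following is a minimal base-R sketch, not code from EBglmnet. It assumes that exp(\lambda) denotes an exponential distribution with rate \lambda (the parameterization is not stated here), and the value of lambda is purely illustrative.

```r
# Simulate draws from the hierarchical lasso prior using base R only.
# Assumption: exp(lambda) is an exponential distribution with RATE lambda.
set.seed(1)
lambda  <- 2      # illustrative value, not a recommended setting
n_draws <- 1e5

sigma2 <- rexp(n_draws, rate = lambda)                 # sigma_j^2 ~ exp(lambda)
beta   <- rnorm(n_draws, mean = 0, sd = sqrt(sigma2))  # beta_j | sigma_j^2 ~ N(0, sigma_j^2)

# Summaries of the simulated marginal prior on beta
prior_var  <- var(beta)
prior_kurt <- mean((beta - mean(beta))^4) / prior_var^2  # equals 3 for a normal

# Mass close to zero, compared with a normal of the same variance
frac_near_zero_prior  <- mean(abs(beta) < 0.1)
frac_near_zero_normal <- 2 * pnorm(0.1, sd = sqrt(prior_var)) - 1

print(c(variance = prior_var, kurtosis = prior_kurt,
        near_zero_prior = frac_near_zero_prior,
        near_zero_normal = frac_near_zero_normal))
```

Under this assumption the marginal prior on \beta_j is a Laplace (double exponential) distribution, so the simulated draws should show both more mass near zero and a larger kurtosis than a normal distribution with the same variance.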
All three priors contain hyperparameters that control how heavy the tails are,
and different values yield different numbers of nonzero effects retained in the model.
Appropriate selection of their values is required for obtaining optimal results,
and cross-validation (CV) is the most commonly used method. See cv.EBglmnet
for details on determining the
optimal hyperparameters for each prior under different GLM families.
lassoNEG prior
The "lassoNEG" prior has two hyperparameters (a, b), with a \ge -1 and b > 0.
Although a is allowed to take any value greater than -1.5, choosing values in (-1.5, -1) is discouraged unless the signal-to-noise
ratio in the explanatory variables is very small.
lasso prior
The "lasso" prior has one hyperparameter \lambda, with \lambda \ge 0.
\lambda is similar to the shrinkage parameter in lasso,
except that even for p >> n, \lambda is allowed to be zero, and EBlasso
can still provide a sparse solution thanks to the implicit constraint that \sigma^2 \ge 0.
elastic net prior
As with the elastic net in package glmnet, EBglmnet expresses the two hyperparameters \lambda_1
and \lambda_2 of the "elastic net" prior in terms of two other parameters,
\alpha (0 \le \alpha \le 1) and \lambda (\lambda > 0).
Users therefore specify hyperparameters = c(\alpha, \lambda).
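The ranges stated above for the three priors can be collected into a small pre-flight check. The sketch below is a hypothetical helper, not part of EBglmnet: the name check_hyperparameters and its behavior are assumptions made for illustration, and EBglmnet may validate its inputs differently.

```r
# Hypothetical helper (NOT part of EBglmnet): checks a hyperparameter vector
# against the ranges given in the text above, before calling EBglmnet().
check_hyperparameters <- function(prior = c("lassoNEG", "lasso", "elastic net"),
                                  hyperparameters) {
  prior <- match.arg(prior)
  hp <- hyperparameters
  if (prior == "lassoNEG") {
    # (a, b): a must exceed -1.5 and b must be positive
    stopifnot(length(hp) == 2, hp[1] > -1.5, hp[2] > 0)
    if (hp[1] < -1) {
      warning("a in (-1.5, -1) is discouraged unless the signal-to-noise ratio is very small")
    }
  } else if (prior == "lasso") {
    # lambda: a single nonnegative value
    stopifnot(length(hp) == 1, hp >= 0)
  } else {
    # (alpha, lambda): alpha in [0, 1], lambda positive
    stopifnot(length(hp) == 2, hp[1] >= 0, hp[1] <= 1, hp[2] > 0)
  }
  invisible(TRUE)
}

# Usage: these settings pass the check
check_hyperparameters("lassoNEG", c(0.5, 0.5))
check_hyperparameters("lasso", 0.5)
check_hyperparameters("elastic net", c(0.5, 0.5))
```

A warning rather than an error is used for a in (-1.5, -1), since the text permits that range but discourages it.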
fit |
the model fit using the hyperparameters provided. EBglmnet selects the variables having nonzero regression
coefficients and estimates their posterior distributions. With the posterior mean and variance, a |
WaldScore |
the Wald Score for the posterior distribution. It is computed as |
Intercept |
the intercept in the linear regression model. This parameter is not shrunk. |
residual variance |
the residual variance if the Gaussian family is assumed in the GLM |
logLikelihood |
the log likelihood if the binomial family is assumed in the GLM |
hyperparameters |
the hyperparameters used to fit the model |
family |
the GLM family specified in this function call |
prior |
the prior used in this function call |
call |
the call that produced this object |
nobs |
number of observations |
Anhui Huang and Dianting Liu
Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.
Huang, A., Xu, S., and Cai, X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genetics 14(1), 5.
Huang, A., Xu, S., and Cai, X. (2014a). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity, doi:10.1038/hdy.2014.79.
rm(list = ls())
library(EBglmnet)
# Use the R built-in data set state.x77
y <- state.x77[, "Life Exp"]
xNames <- c("Population", "Income", "Illiteracy", "Murder", "HS Grad", "Frost", "Area")
x <- state.x77[, xNames]
#
# Gaussian model
# lassoNEG prior (the default); hyperparameters = c(a, b)
out <- EBglmnet(x, y, hyperparameters = c(0.5, 0.5))
out$fit
# lasso prior; hyperparameters = lambda
out <- EBglmnet(x, y, prior = "lasso", hyperparameters = 0.5)
out$fit
# elastic net prior; hyperparameters = c(alpha, lambda)
out <- EBglmnet(x, y, prior = "elastic net", hyperparameters = c(0.5, 0.5))
out$fit
# residual variance ($res relies on partial matching of the element name)
out$res
# intercept
out$Intercept
#
# Binomial model
# create a binary response variable
yy <- y > mean(y)
out <- EBglmnet(x, yy, family = "binomial", hyperparameters = c(0.5, 0.5))
out$fit