EBglmnet: Main Function for the EBglmnet Algorithms


Description

EBglmnet is the main function for fitting a generalized linear model by empirical Bayesian methods with lasso and elastic net hierarchical priors. It handles the p >> n setting, produces sparse estimates of the regression coefficients, and performs significance tests for nonzero effects in both linear and logistic regression models.

Usage

EBglmnet(x, y, family = c("gaussian", "binomial"),
         prior = c("lassoNEG", "lasso", "elastic net"),
         hyperparameters, Epis = FALSE, group = FALSE, verbose = 0)

Arguments

x

input matrix of dimension n x p; each row is an observation vector, and each column is a variable. When epistasis is considered, users do not need to create a giant matrix containing both main and interaction terms. Instead, x should always be the matrix of the p main effects, and EBglmnet generates the interaction terms dynamically at run time.

y

response variable. Continuous for family="gaussian", and binary for family="binomial". For a binary response, y can be a Boolean or numeric vector, or a factor.
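For illustration, the three accepted encodings of a binary response can be built from the same data in base R. This is only a sketch: the variable names are hypothetical, and how EBglmnet orders factor levels internally is not stated here.

```r
# Hypothetical continuous measurement, thresholded into a binary response
z <- c(2.1, 3.7, 1.4, 4.9, 3.2)

y_bool <- z > mean(z)                               # Boolean (logical) vector
y_num  <- as.numeric(y_bool)                        # numeric 0/1 vector
y_fac  <- factor(y_bool, levels = c(FALSE, TRUE))   # two-level factor

# All three encodings carry the same information
stopifnot(identical(y_num, as.numeric(y_fac) - 1))
```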

family

model type taking values of "gaussian" (default) or "binomial".

prior

prior distribution to be used. It takes values of "lassoNEG"(default), "lasso", and "elastic net". All priors will produce a sparse outcome of the regression coefficients; see Details for choosing priors.

hyperparameters

the optimal hyperparameters in the prior distribution. Similar to λ in the lasso method, the hyperparameters control the number of nonzero elements in the regression coefficients. Hyperparameters are most often determined by cross-validation (CV); see cv.EBglmnet for how their values are determined. Although cv.EBglmnet already returns the model fit under the hyperparameters chosen by CV, users can call this function to obtain results under other selection criteria, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).
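As a rough sketch of selection by AIC or BIC for the Gaussian family, the helper below scores candidate fits from their residual variance and number of nonzero effects. The function name gaussian_ic, the candidate values, and the assumption that the residual variance approximates RSS/n are all illustrative and not part of the package.

```r
# Gaussian information criteria from a residual variance estimate.
# Assumption: resid_var approximates RSS / n, so the maximized
# log-likelihood is -n/2 * (log(2 * pi * resid_var) + 1).
gaussian_ic <- function(n, resid_var, k) {
  loglik <- -n / 2 * (log(2 * pi * resid_var) + 1)
  c(AIC = -2 * loglik + 2 * k,
    BIC = -2 * loglik + k * log(n))
}

# Hypothetical results for three candidate hyperparameter values:
# residual variance and number of nonzero effects from each fit
candidates <- data.frame(lambda    = c(0.1, 0.5, 1.0),
                         resid_var = c(0.50, 0.52, 0.80),
                         k         = c(6, 3, 1))

# One row per candidate, columns AIC and BIC
ic <- t(mapply(gaussian_ic, n = 50,
               resid_var = candidates$resid_var, k = candidates$k))

# Candidate with the smallest BIC
best <- candidates$lambda[which.min(ic[, "BIC"])]
```

With a real fit, the inputs would come from the returned object, for example the residual variance and the number of rows of the fit matrix.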

Epis

Boolean parameter for including two-way interactions. By default, Epis = FALSE. When Epis = TRUE, EBglmnet takes all pairwise interaction effects into consideration. EBglmnet does not create a giant matrix for all p(p+1)/2 effects. Instead, it dynamically allocates memory for the nonzero effects identified in the model and reads the corresponding variables from the original input matrix x.
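The count of p(p+1)/2 candidate effects (p main effects plus p(p-1)/2 pairwise interactions) can be checked in base R. The sketch below also forms a single interaction column on demand. Encoding an interaction as the element-wise product of two columns is an assumption made for illustration, not a statement about the package internals.

```r
p <- 7                         # number of main effects (columns of x)
n_pairs <- choose(p, 2)        # p(p-1)/2 pairwise interactions
n_total <- p + n_pairs         # p(p+1)/2 candidate effects in total

# Index pairs (locus1, locus2) with locus1 < locus2, one column per interaction
pairs <- combn(p, 2)

# Build one interaction column only when it is needed, e.g. for the pair (2, 5).
# Assumption: the interaction is the element-wise product of the two columns.
set.seed(1)
x <- matrix(rnorm(10 * p), nrow = 10, ncol = p)
x_2_5 <- x[, 2] * x[, 5]
```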

group

Boolean parameter for group EBlasso (currently only available for the "lassoNEG" prior). This parameter is only valid when Epis = TRUE, and is set to FALSE by default. When Epis = TRUE and group = TRUE, the hyperparameter controlling the degree of shrinkage is further scaled so that the scale hyperparameter for interaction terms differs from that of the main effects by a factor of √{p(p-1)/2}. When p is large, e.g., several thousand genetic markers, the total number of effects can easily exceed 10 million, and group EBlasso helps to reduce the interference of spurious correlation and noise accumulation.
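To give a feel for the magnitudes involved, the following sketch tabulates the factor √{p(p-1)/2} and the total number of effects for a few values of p. The helper name group_factor is hypothetical, and the sketch does not show how the factor is applied to the hyperparameter.

```r
# Factor separating the interaction-term scale from the main-effect scale
group_factor <- function(p) sqrt(p * (p - 1) / 2)

p <- c(7, 100, 5000)
scales <- data.frame(p             = p,
                     total_effects = p * (p + 1) / 2,
                     factor        = group_factor(p))
print(scales)
```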

verbose

parameter that controls the level of message output from EBglmnet. It takes values from 0 to 5; larger values display more messages. Small values are recommended to avoid excessive output. The default, verbose = 0, gives the minimum message output.

Details

EBglmnet implements three sets of hierarchical prior distributions for the regression parameters β:

lasso prior:

β_j ~ N(0, σ_j^2),

σ_j^2 ~ exp(λ), j = 1, …, p.

lasso-NEG prior:

β_j ~ N(0, σ_j^2),

σ_j^2 ~ exp(λ),

λ ~ gamma(a, b), j = 1, …, p.

elastic net prior:

β_j ~ N[0, (λ_1 + σ̃_j^{-2})^{-2}],

σ̃_j^2 ~ generalized-gamma(λ_1, λ_2), j = 1, …, p.

The prior distributions are peaked at zero with flat tails: they assign a high prior probability mass near zero while still allowing heavy probability in the two tails. This reflects the prior belief that a sparse solution exists: most of the variables have no effect on the response variable, and only some of them have nonzero effects contributing to the outcome y.
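The peaked-at-zero, heavy-tailed shape can be seen by simulating from the lasso hierarchy above and comparing the draws with a normal distribution of the same variance. This is a small sketch that assumes exp(λ) is parameterized by its rate; the value of λ and the thresholds are arbitrary.

```r
set.seed(1)
n_draws <- 1e5
lambda  <- 1

# Hierarchical draw: a variance from an exponential, then a normal given that variance.
# Assumption: exp(lambda) is parameterized by its rate.
sigma2 <- rexp(n_draws, rate = lambda)
beta   <- rnorm(n_draws, mean = 0, sd = sqrt(sigma2))

# Reference: a plain normal with the same overall standard deviation
ref <- rnorm(n_draws, mean = 0, sd = sd(beta))

# Share of draws close to zero and share far out in the tails
near_zero <- c(prior = mean(abs(beta) < 0.1), normal = mean(abs(ref) < 0.1))
in_tails  <- c(prior = mean(abs(beta) > 3),   normal = mean(abs(ref) > 3))
print(rbind(near_zero, in_tails))
```

Under these assumptions the hierarchical draws put more mass both near zero and in the tails than the normal reference does.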

The three priors all contain hyperparameters that control how heavy the tail probability is, and different values yield different numbers of nonzero effects retained in the model. Appropriate selection of their values is required to obtain optimal results, and CV is the most commonly used method. See cv.EBglmnet for details on determining the optimal hyperparameters for each prior under the different GLM families.

lassoNEG prior
The "lassoNEG" prior has two hyperparameters (a, b), with a ≥ -1 and b > 0. Although a is allowed to take values greater than -1.5, choosing values in (-1.5, -1) is discouraged unless the signal-to-noise ratio in the explanatory variables is very small.

lasso prior
The "lasso" prior has one hyperparameter λ, with λ ≥ 0. λ is similar to the shrinkage parameter in lasso, except that λ is allowed to be zero even when p >> n, and EBlasso can still provide a sparse solution thanks to the implicit constraint that σ^2 ≥ 0.

elastic net prior
Similar to the elastic net in the glmnet package, EBglmnet re-expresses the two hyperparameters λ_1 and λ_2 of the "elastic net" prior in terms of two other parameters, α (0 ≤ α ≤ 1) and λ (λ > 0). Users therefore specify hyperparameters = c(α, λ).

Value

fit

the model fit using the hyperparameters provided. EBglmnet selects the variables having nonzero regression coefficients and estimates their posterior distributions. With the posterior mean and variance, a t-test is performed and the p-value is calculated. The result in fit is a matrix whose rows correspond to the variables having nonzero effects and whose columns hold the following values:

column1-2: (locus1, locus2), denoting column numbers in the input matrix x. When locus1 equals locus2, the effect is one of the p main effects; otherwise, it is the interaction effect between x[,locus1] and x[,locus2]. When Epis = FALSE, which is the default setting, locus1 always equals locus2. If Epis = TRUE, fit always lists the main effects first, followed by the epistatic effects.

column3: beta. It is the posterior mean of the nonzero regression coefficients.

column4: posterior variance. It is the diagonal element of the posterior covariance matrix among the nonzero regression coefficients.

column5: t-value calculated using columns 3-4.

column6: p-value from t-test.
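A minimal sketch of how the columns relate to one another, using a made-up fit matrix. The column names, the signed form of the t-value, and the degrees of freedom (taken here as n minus the number of nonzero effects) are all assumptions for illustration. The package may use different conventions.

```r
# Hypothetical fit matrix with the six columns described above;
# the column names are illustrative, not taken from the package
fit <- cbind(locus1 = c(2, 4, 2),
             locus2 = c(2, 4, 5),
             beta   = c(0.80, -0.30, 0.45),
             var    = c(0.04, 0.01, 0.09),
             t      = NA,
             p      = NA)

# Rows with locus1 == locus2 are main effects; the rest are interactions
is_main <- fit[, "locus1"] == fit[, "locus2"]

# t-value from the posterior mean and variance (columns 3-4)
fit[, "t"] <- fit[, "beta"] / sqrt(fit[, "var"])

# Two-sided p-value; n and the degrees of freedom are assumptions
n  <- 50
df <- n - nrow(fit)
fit[, "p"] <- 2 * pt(-abs(fit[, "t"]), df = df)
print(fit)
```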

WaldScore

the Wald score for the posterior distribution, computed as β^T Σ^{-1} β. See (Huang A, 2014b) for using the Wald score to identify the set of significant effects.
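The quadratic form β^T Σ^{-1} β can be evaluated in base R without explicitly inverting Σ. The values of beta and Sigma below are made up for illustration; in practice they would be the posterior mean and posterior covariance of the nonzero coefficients.

```r
# Hypothetical posterior mean and covariance of three nonzero coefficients
beta  <- c(0.80, -0.30, 0.45)
Sigma <- matrix(c(0.04, 0.00, 0.01,
                  0.00, 0.01, 0.00,
                  0.01, 0.00, 0.09), nrow = 3, byrow = TRUE)

# Wald score: solve(Sigma, beta) gives Sigma^{-1} beta without forming the inverse
wald <- drop(crossprod(beta, solve(Sigma, beta)))
print(wald)
```

When Σ is diagonal, this reduces to the sum of the squared t-values of the individual coefficients.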

Intercept

the intercept in the linear regression model. This parameter is not shrunk.

residual variance

the residual variance if the Gaussian family is assumed in the GLM

logLikelihood

the log-likelihood if the binomial family is assumed in the GLM

hyperparameters

the hyperparameters used to fit the model

family

the GLM family specified in this function call

prior

the prior used in this function call

call

the call that produced this object

nobs

number of observations

Author(s)

Anhui Huang and Dianting Liu
Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL

References

Cai, X., Huang, A., and Xu, S. (2011). Fast empirical Bayesian LASSO for multiple quantitative trait locus mapping. BMC Bioinformatics 12, 211.

Huang, A., Xu, S., and Cai, X. (2013). Empirical Bayesian LASSO-logistic regression for multiple binary trait locus mapping. BMC Genetics 14, 5.

Huang, A., Xu, S., and Cai, X. (2014a). Empirical Bayesian elastic net for multiple quantitative trait locus mapping. Heredity, doi:10.1038/hdy.2014.79.

Examples

rm(list = ls())
library(EBglmnet)

# Use the R built-in data set state.x77
y <- state.x77[, "Life Exp"]
xNames <- c("Population", "Income", "Illiteracy", "Murder", "HS Grad", "Frost", "Area")
x <- state.x77[, xNames]

# Gaussian model
# lassoNEG prior (the default); hyperparameters = c(a, b)
out <- EBglmnet(x, y, hyperparameters = c(0.5, 0.5))
out$fit
# lasso prior; hyperparameters = lambda
out <- EBglmnet(x, y, prior = "lasso", hyperparameters = 0.5)
out$fit
# elastic net prior; hyperparameters = c(alpha, lambda)
out <- EBglmnet(x, y, prior = "elastic net", hyperparameters = c(0.5, 0.5))
out$fit
# residual variance
out$res
# intercept
out$Intercept

# Binomial model
# create a binary response variable
yy <- y > mean(y)
out <- EBglmnet(x, yy, family = "binomial", hyperparameters = c(0.5, 0.5))
out$fit
# with epistatic effects
out <- EBglmnet(x, yy, family = "binomial", hyperparameters = c(0.5, 0.5), Epis = TRUE)
out$fit

EBglmnet documentation built on May 2, 2019, 2:46 a.m.