bglm: Bayesian Generalized Linear Models (GLMs)


Description

This function sets up Bayesian hierarchical GLMs and fits the model using the EM-IWLS algorithm. Four types of priors on each coefficient can be used: Student-t, double-exponential, spike-and-slab mixture Student-t, and spike-and-slab mixture double-exponential. The Bayesian hierarchical GLMs include various models as special cases, e.g., classical GLMs, ridge regression, and the Bayesian lasso. The function can be used for analyzing general data as well as large-scale, highly correlated variables (for example, detecting disease-associated factors and predicting phenotypes).

Usage

bglm(formula, family = gaussian, data, offset, weights, subset, na.action, 
    start = NULL, etastart, mustart, control = glm.control(epsilon = 1e-04, maxit = 50), 
    prior = Student(), group = NULL, method.coef,
    Warning = FALSE, verbose = FALSE)  

Arguments

formula, family, data, offset, weights, subset, na.action, start, etastart, mustart, control

These arguments are the same as in glm.

family

All standard families defined in glm can be used; the Negative Binomial family (NegBin or "NegBin") is also supported.

prior

Prior distributions for the coefficients. Four types of priors can be used. Student-t: Student(mean, scale, df, autoscale) (default: mean=0, scale=0.5, df=1, autoscale=TRUE); double-exponential: De(mean, scale, autoscale) (default: mean=0, scale=0.5, autoscale=TRUE); mixture double-exponential: mde(mean, s0, s1, b); and mixture Student-t: mt(mean, s0, s1, df, b), with s0 < s1 (default: mean=0, s0=0.04, s1=0.5, df=1, b=1). The mean, scale, df and b can be vectors. For example, scale = c(a1,a2,...,ak); if k is smaller than the total number of predictors J, it is internally expanded to c(a1,a2,...,ak, rep(ak, J-k)). If autoscale=TRUE, scale is modified internally (see Details). See the illustration following this argument list.

group

a numeric vector, an integer, or a list defining the groups of predictors. Only used for the mde or mt priors. If group = NULL, all the predictors form a single group. If group = K, the predictors are evenly divided into groups each with K predictors. If group is a numeric vector, it defines groups as follows: Group 1: (group[1]+1):group[2], Group 2: (group[2]+1):group[3], Group 3: (group[3]+1):group[4], ..... If group is a list of variable names, group[[k]] includes the variables in the k-th group.

method.coef

jointly updating all coefficients or updating coefficients group by group. The default is joint updating: if method.coef = NULL or method.coef is missing, all coefficients are updated jointly. If method.coef = K, K coefficients are updated at a time. method.coef can also be a numeric vector or a list of variable names (defined as for group) specifying the blocks. If the number of coefficients is large, group-by-group updating can be much faster than joint updating (see the illustration following this argument list).

Warning

logical. If TRUE, show warnings about non-convergence and non-identifiability.

verbose

logical. If TRUE, print the number of iterations and the computational time.
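
As a quick illustration of the forms described above (variable names and dimensions are hypothetical), prior and method.coef can be specified along these lines:

# prior constructors as documented above; the values shown are the defaults
prior = Student(mean = 0, scale = 0.5, df = 1, autoscale = TRUE)  # Student-t
prior = De(mean = 0, scale = 0.5, autoscale = TRUE)               # double-exponential
prior = mde(mean = 0, s0 = 0.04, s1 = 0.5, b = 1)                 # mixture double-exponential
prior = mt(mean = 0, s0 = 0.04, s1 = 0.5, df = 1, b = 1)          # mixture Student-t

# scale may be a vector; a short vector is expanded by repeating its last value
prior = De(mean = 0, scale = c(0.1, 0.1, 0.5))

# joint updating (the default) versus updating 25 coefficients at a time;
# y, x and fam are assumed to be defined as in the Examples below
f.joint = bglm(y ~ ., data = x, family = fam, prior = prior)
f.block = bglm(y ~ ., data = x, family = fam, prior = prior, method.coef = 25)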

Details

This function sets up Bayesian hierarchical GLMs and fits the model using the EM-IWLS algorithm. It is an alteration of the standard function glm for classical GLMs, and includes all the glm arguments as well as several new arguments for Bayesian modeling.

For the priors Student and De, if autoscale=TRUE, scale is modified internally: for a predictor with only one unique value, nothing is changed; for a predictor x with exactly two unique values, the user-specified scale is divided by the range of x; for a predictor x with more than two unique values, the user-specified scale is divided by sd(x). For gaussian models, scale is further multiplied by sd(y).
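
The following is only a schematic sketch of this autoscale rule, not the package's internal code:

# schematic of the autoscale adjustment described above (not the internal BhGLM code)
autoscale.sketch = function(scale, x, y = NULL, is.gaussian = FALSE) {
  u = length(unique(x))
  if (u == 2) scale = scale / (max(x) - min(x))          # two unique values: divide by the range of x
  if (u > 2) scale = scale / sd(x)                       # more than two unique values: divide by sd(x)
  if (is.gaussian && !is.null(y)) scale = scale * sd(y)  # gaussian models: further multiply by sd(y)
  scale
}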

The argument group is only used for the prior mde or mt, allowing for group-specific inclusion probabilities and thus incorporating group information (e.g. biological pathways). With the mixture prior, the predictors can be grouped or ungrouped. For ungrouped predictors, the prior is double-exponential or Student-t with scale s1.
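
For example, with 100 predictors the grouping could be supplied in any of the following forms (names and sizes are hypothetical):

group = NULL                   # all predictors in a single group
group = 10                     # groups of 10 predictors each
group = c(0, 30, 60, 100)      # three groups: predictors 1-30, 31-60 and 61-100
group = list(pathway1 = c("gene1", "gene2", "gene3"),  # groups given by variable names,
             pathway2 = c("gene4", "gene5"))           # e.g. biological pathways
f = bglm(y ~ ., data = x, family = fam, prior = mde(), group = group)  # y, x, fam as in the Examples below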

The prior mde(mean, s0, s1, b) represents a mixture of De(mean, s0) and De(mean, s1) with group-specific inclusion probabilities following beta(1,b). The tuning parameter b can be a vector of group-specific values.
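
When b is supplied as a vector, each group gets its own value; with beta(1, b), a larger b implies a smaller expected inclusion probability. A hypothetical specification for three groups:

prior = mde(mean = 0, s0 = 0.04, s1 = 0.5, b = c(1, 2, 5))  # group-specific b for three groups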

Value

This function returns an object of class "glm", including all outputs from the function glm, as well as results for the additional parameters in the hierarchical model.
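
Because the returned object is of class "glm", the usual accessor functions apply; the additional hierarchical-model components can be inspected by name. A minimal illustration, assuming a fit f as produced in the Examples below:

summary(f)  # coefficient estimates and standard errors, as for a glm fit
coef(f)     # estimated coefficients
names(f)    # the standard glm components plus the additional hierarchical-model results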

Author(s)

Nengjun Yi, nyi@uab.edu

References

Yi, N. and Banerjee, S. (2009). Hierarchical generalized linear models for multiple quantitative trait locus mapping. Genetics 181, 1101-1113.

Yi, N., Kaklamani, V. G. and Pasche, B. (2011). Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk. Ann Hum Genet 75, 90-104.

Yi, N. and Ma, S. (2012). Hierarchical Shrinkage Priors and Model Fitting Algorithms for High-dimensional Generalized Linear Models. Statistical Applications in Genetics and Molecular Biology 11 (6), 1544-6115.

Gelman, A., Jakulin, A., Pittau, M. G. and Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2, 1360-1383.

Rockova, V. and George, E. I. (2014). EMVS: The EM Approach to Bayesian Variable Selection. JASA 109, 828-846.

Tang, Z., Shen, Y., Zhang, X. and Yi, N. (2017). The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection. Genetics 205, 77-88.

Tang, Z., Shen, Y., Li, Y., Zhang, X., Wen, J., Qian, C., Zhuang, W., Shi, X. and Yi, N. (2018). Group Spike-and-Slab Lasso Generalized Linear Models for Disease Prediction and Associated Genes Detection by Incorporating Pathway Information. Bioinformatics 34(6), 901-910.

See Also

glm, glm.nb

Examples

library(BhGLM)

N = 1000
K = 100
x = sim.x(n=N, m=K, corr=0.6) # simulate correlated continuous variables  
h = rep(0.1, 4) # assign four non-zero main effects to have the assumed heritability 
nz = as.integer(seq(5, K, by=K/length(h))); nz
yy = sim.y(x=x[, nz], mu=0, herit=h, p.neg=0.5, sigma=1.6, theta=2) # simulate responses
yy$coefs


# y = yy$y.normal; fam = gaussian; y = scale(y)
# y = yy$y.ordinal; fam = binomial
y = yy$y.nb; fam = NegBin

# jointly fit all variables (can be slow if m is large)
par(mfrow = c(2, 2), cex.axis = 1, mar = c(3, 4, 4, 4))
gap = 10

ps = 0.05
f1 = bglm(y ~ ., data = x, family = fam, prior = De(0, ps))   
plot.bh(f1, vars.rm = 1, threshold = 0.01, gap = gap, main = "de")  

f2 = bglm(y ~ ., data = x, family = fam, prior = Student(0, ps/1.4))   
plot.bh(f2, vars.rm = 1, threshold = 0.01, gap = gap, main = "t")  

ss = c(0.04, 0.5) 
f3 = bglm(y ~ ., data = x, family = fam, prior = mde(0, ss[1], ss[2]))   
plot.bh(f3, vars.rm = 1, threshold = 0.01, gap = gap, main = "mde")  

# group-wise update (can be much faster if m is large)
ps = 0.05
f1 = bglm(y ~ ., data = x, family = fam, prior = De(0, ps), method.coef = 50) # update 50 coefficients at a time
plot.bh(f1, vars.rm = 1, threshold = 0.01, gap = gap)  
