bglm | R Documentation |
This function is to set up Bayesian hierarchical GLMs, and to fit the model using the EM-IWLS algorithm. Four types of priors on each coefficient can be used: Student-t, double-exponential, spike-and-slab mixture Student-t, and spike-and-slab mixture double-exponential. The Bayesian hierarchical GLMs include various models as special cases, e.g., classical GLMs, ridge regression, and Bayesian lasso. It can be used for analyzing general data and large-scale and highly-correlated variables (for example, detecting disease-associated factors and predicting phenotypes).
bglm(formula, family = gaussian, data, offset, weights, subset, na.action,
start = NULL, etastart, mustart, control = glm.control(epsilon = 1e-04, maxit = 50),
prior = Student(), group = NULL, method.coef,
Warning = FALSE, verbose = FALSE)
formula , family , data , offset , weights , subset , na.action , start , etastart , mustart , control |
These arguments are the same as in |
family |
can be all the standard families defined in |
prior |
Prior distribustions for the coefficents. Four types of priors can be used; Student-t: |
group |
a numeric vector, or an integer, or a list definining the groups of predictors. Only used for |
method.coef |
jointly updating all coefficients or updating coefficients group by group. The default is jointly updating.
If |
Warning |
logical. If |
verbose |
logical. If |
This function sets up Bayesian hierarchical GLMs and fits the model using the EM-IWLS algorithm. It is an alteration of the standard function glm
for classical GLMs, and includes all the glm
arguments and also some new arguments for Bayesian modeling.
For the priors Student
and De
, if autoscale=TRUE
, scale
is modified internally: for a predictor with only one value nothing is changed; for a predictor x with exactly two unique values, we divide the user-specified scale scale
by the range of x; for a predictor x with more than two unique values, we divide the user-specified scale by sd(x). For gaussian models, scale
is further multiplied by sd(y).
The argument group
is only used for the prior mde
or mt
, allowing for group-specific inclusion probabilities and thus incorporating group information (e.g. biological pathways). With the mixture prior, the predictors can be grouped or ungrouped. For ungrouped predictors, the prior is double-exponential or Student-t with scale s1
.
The prior mde(mean, s0, s1, b)
represents a mixture of De(mean, s0)
and De(mean, s1)
with group-specific inclusion probabilities following beta(1,b). The tuning parameter b
can be a vector of group-specific values.
This function returns an object of class "glm", including all outputs from the function glm
, and also results for the additional parameters in the hierarchical models.
Nengjun Yi, nyi@uab.edu
Yi, N. and Banerjee, S. (2009). Hierarchical generalized linear models for multiple quantitative trait locus mapping. Genetics 181, 1101-1113.
Yi, N., Kaklamani, V. G. and Pasche, B. (2011). Bayesian analysis of genetic interactions in case-control studies, with application to adiponectin genes and colorectal cancer risk. Ann Hum Genet 75, 90-104.
Yi, N. and Ma, S. (2012). Hierarchical Shrinkage Priors and Model Fitting Algorithms for High-dimensional Generalized Linear Models. Statistical Applications in Genetics and Molecular Biology 11 (6), 1544-6115.
Gelman, A., Jakulin, A., Pittau, M. G. and Su, Y. S. (2008). A weakly informative default prior distribution for logistic and other regression models. Annals of Applied Statistics 2, 1360-1383.
Rockova, V. and George, E. I. (2014) EMVS: The EM Approach to Bayesian Variable Selection. JASA 109: 828-846.
Zaixiang Tang, Yueping Shen, Xinyan Zhang, Nengjun Yi (2017) The Spike-and-Slab Lasso Generalized Linear Models for Prediction and Associated Genes Detection. Genetics 205, 77–88.
Zaixiang Tang, Yueping Shen, Yan Li, Xinyan Zhang, Jia Wen, Chen'ao Qian, Wenzhuo Zhuang, Xinghua Shi, and Nengjun Yi (2018) Group Spike-and-Slab Lasso Generalized Linear Models for Disease Prediction and Associated Genes Detection by Incorporating Pathway Information. Bioinformatics 34(6): 901-910.
glm
, glm.nb
library(BhGLM)
N = 1000
K = 100
x = sim.x(n=N, m=K, corr=0.6) # simulate correlated continuous variables
h = rep(0.1, 4) # assign four non-zero main effects to have the assumed heritabilty
nz = as.integer(seq(5, K, by=K/length(h))); nz
yy = sim.y(x=x[, nz], mu=0, herit=h, p.neg=0.5, sigma=1.6, theta=2) # simulate responses
yy$coefs
# y = yy$y.normal; fam = gaussian; y = scale(y)
# y = yy$y.ordinal; fam = binomial
y = yy$y.nb; fam = NegBin
# jointly fit all variables (can be slow if m is large)
par(mfrow = c(2, 2), cex.axis = 1, mar = c(3, 4, 4, 4))
gap = 10
ps = 0.05
f1 = bglm(y ~ ., data = x, family = fam, prior = De(0, ps))
plot.bh(f1, vars.rm = 1, threshold = 0.01, gap = gap, main = "de")
f2 = bglm(y ~ ., data = x, family = fam, prior = Student(0, ps/1.4))
plot.bh(f2, vars.rm = 1, threshold = 0.01, gap = gap, main = "t")
ss = c(0.04, 0.5)
f3 = bglm(y ~ ., data = x, family = fam, prior = mde(0, ss[1], ss[2]))
plot.bh(f3, vars.rm = 1, threshold = 0.01, gap = gap, main = "mde")
# group-wise update (can be much faster if m is large)
ps = 0.05
f1 = bglm(y ~ ., data = x, family = fam, prior = De(0, ps), method.coef = 50) # update 50 coefficients at a time
plot.bh(f1, vars.rm = 1, threshold = 0.01, gap = gap)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.