The BGLR (‘Bayesian Generalized Linear Regression’) function fits various types of parametric and semi-parametric Bayesian regressions to continuos (censored or not), binary and ordinal outcomes.
1 2 3 4 |
y |
(numeric, n) the data-vector (NAs allowed). |
response_type |
(string) admits values |
a,b |
(numeric, n) only requiered for censored outcomes, |
ETA |
(list) This is a two-level list used to specify the regression function (or linear predictor). By default the linear predictor (the conditional expectation function in case of Gaussian outcomes) includes only an intercept. Regression on covariates and other types of random effects are specified in this two-level list. For instance: ETA=list(list(X=W, model="FIXED"), list(X=Z,model="BL"), list(K=G,model="RKHS")), specifies that the linear predictor should include: an intercept (included by default) plus a linear regression on W with regression coefficients treated as fixed effects (i.e., flat prior), plus regression on Z, with regression coefficients modeled as in the Bayesian Lasso of Park and Casella (2008) plus and a random effect with co-variance structure G. For linear regressions the following options are implemented: FIXED (Flat prior), BRR (Gaussian prior), BayesA (scaled-t prior), BL (Double-Exponential prior),
BayesB (two component mixture prior with a point of mass at zero and a sclaed-t slab), BayesC (two component mixture prior with a point of
mass at zero and a Gaussian slab). In linear regressions X can be the incidence matrix for effects or a formula (e.g. |
weights |
(numeric, n) a vector of weights, may be |
nIter,burnIn, thin |
(integer) the number of iterations, burn-in and thinning. |
saveAt |
(string) this may include a path and a pre-fix that will be added to the name of the files that are saved as the program runs. |
S0, df0 |
(numeric) The scale parameter for the scaled inverse-chi squared prior assigned to the residual variance, only used with Gaussian outcomes.
In the parameterization of the scaled-inverse chi square in BGLR the expected values is |
R2 |
(numeric, |
verbose |
(logical) if TRUE the iteration history is printed, default TRUE. |
rmExistingFiles |
(logical) if TRUE removes existing output files from previous runs, default TRUE. |
groups |
(factor) a vector of the same length of y that associates observations with groups, each group will have an associated variance component for the error term. |
BGLR implements a Gibbs sampler for a Bayesian regresion model. The linear predictor (or regression function) includes an intercept (introduced by default) plus a number of user-specified regression components (X) and random effects (u), that is:
The components of the linear predictor are specified in the argument ETA (see above). The user can specify as many linear terms as desired, and for each component the user can choose the prior density to be assigned. The distribution of the response is modeled as a function of the linear predictor.
For Gaussian outcomes, the linear predictor is the conditional expectation, and censoring is allowed. For censored data points the actual response value (y_i) is missing, and the entries of the vectors a and b (see above) give the lower an upper vound for y_i. The following table shows the configuration of the triplet (y, a, b) for un-censored, right-censored, left-censored and interval censored.
a | y | b | |
Un-censored | NULL | y_i | NULL |
Right censored | a_i | NA | ∞ |
Left censored | -∞ | NA | b_i |
Interval censored | a_i | NA | b_i |
Internally, censoring is dealt with as a missing data problem.
Ordinal outcomes are modelled using the probit link, implemented via data augmentation. In this case the linear predictor becomes the mean of the underlying liability variable which is normal with mean equal to the linear predictor and variance equal to one. In case of only two classes (binary outcome) the threshold is set equal to zero, for more than two classess thresholds are estimated from the data. Further details about this approach can be found in Albert and Chib (1993).
A list with estimated posterior means, estimated posterior standard deviations, and the parameters used to fit the model. See the vignettes in the package (or http://genomics.cimmyt.org/BGLR-extdoc.pdf) for further details.
Gustavo de los Campos, Paulino Perez Rodriguez,
Albert J,. S. Chib. 1993. Bayesian Analysis of Binary and Polychotomus Response Data. JASA, 88: 669-679.
de los Campos G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting Quantitative Traits with Regression Models for Dense Molecular Markers and Pedigree. Genetics 182: 375-385.
de los Campos, G., D. Gianola, G. J. M., Rosa, K. A., Weigel, and J. Crossa. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genetics Research, 92:295-308.
Park T. and G. Casella. 2008. The Bayesian LASSO. Journal of the American Statistical Association 103: 681-686.
Spiegelhalter, D.J., N.G. Best, B.P. Carlin and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64 (4): 583-639.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
Questions? Problems? Suggestions? Tweet to @rdrrHQ or email at ian@mutexlabs.com.
All documentation is copyright its authors; we didn't write any of that.