Bayesian Generalized Linear Regression

Description

The BGLR (‘Bayesian Generalized Linear Regression’) function fits various types of parametric and semi-parametric Bayesian regressions to continuos (censored or not), binary and ordinal outcomes.

Usage

1
2
3
4
  BGLR(y, response_type = "gaussian", a=NULL, b=NULL,ETA = NULL, nIter = 1500,
       burnIn = 500, thin = 5, saveAt = "", S0 = NULL, 
       df0 =5, R2 = 0.5, weights = NULL,
       verbose = TRUE, rmExistingFiles = TRUE, groups=NULL)

Arguments

y

(numeric, n) the data-vector (NAs allowed).

response_type

(string) admits values "gaussian" or "ordinal". The Gaussian outcome may be censored or not (see below). If response_type="gaussian", y should be coercible to numeric. If response_type="ordinal", y should be coercible to character, and the order of the outcomes is determined based on the alphanumeric order (0<1<2..<a<b..). For ordinal traits the probit link is used.

a,b

(numeric, n) only requiered for censored outcomes, a and b are vectors specifying lower and upper bounds for censored observations, respectively. The default value, for non-censored and ordinal outcomes, is NULL (see details).

ETA

(list) This is a two-level list used to specify the regression function (or linear predictor). By default the linear predictor (the conditional expectation function in case of Gaussian outcomes) includes only an intercept. Regression on covariates and other types of random effects are specified in this two-level list. For instance:

ETA=list(list(X=W, model="FIXED"), list(X=Z,model="BL"), list(K=G,model="RKHS")),

specifies that the linear predictor should include: an intercept (included by default) plus a linear regression on W with regression coefficients treated as fixed effects (i.e., flat prior), plus regression on Z, with regression coefficients modeled as in the Bayesian Lasso of Park and Casella (2008) plus and a random effect with co-variance structure G.

For linear regressions the following options are implemented: FIXED (Flat prior), BRR (Gaussian prior), BayesA (scaled-t prior), BL (Double-Exponential prior), BayesB (two component mixture prior with a point of mass at zero and a sclaed-t slab), BayesC (two component mixture prior with a point of mass at zero and a Gaussian slab). In linear regressions X can be the incidence matrix for effects or a formula (e.g. ~factor(sex) + age) in which case the incidence matrix is created internally using the model.matrix function of R. For Gaussian processes (RKHS) a co-variance matrix (K) must be provided. Further details about the models implemented in BGLR see the vignettes in the package or http://genomics.cimmyt.org/BGLR-extdoc.pdf.

weights

(numeric, n) a vector of weights, may be NULL. If weights is not NULL, the residual variance of each data-point is set to be proportional to the inverse of the squared-weight. Only used with Gaussian outcomes.

nIter,burnIn, thin

(integer) the number of iterations, burn-in and thinning.

saveAt

(string) this may include a path and a pre-fix that will be added to the name of the files that are saved as the program runs.

S0, df0

(numeric) The scale parameter for the scaled inverse-chi squared prior assigned to the residual variance, only used with Gaussian outcomes. In the parameterization of the scaled-inverse chi square in BGLR the expected values is S0/(df0-2). The default value for the df parameter is 5. If the scale is not specified a value is calculated so that the prior mode of the residual variance equals var(y)*R2 (see below). For further details see the vignettes in the package or http://genomics.cimmyt.org/BGLR-extdoc.pdf.

R2

(numeric, 0<R2<1) The proportion of variance that one expects, a priori, to be explained by the regression. Only used if the hyper-parameters are not specified; if that is the case, internaly, hyper-paramters are set so that the prior modes are consistent with the variance partition specified by R2 and the prior distribution is relatively flat at the mode. For further details see the vignettes in the package or http://genomics.cimmyt.org/BGLR-extdoc.pdf.

verbose

(logical) if TRUE the iteration history is printed, default TRUE.

rmExistingFiles

(logical) if TRUE removes existing output files from previous runs, default TRUE.

groups

(factor) a vector of the same length of y that associates observations with groups, each group will have an associated variance component for the error term.

Details

BGLR implements a Gibbs sampler for a Bayesian regresion model. The linear predictor (or regression function) includes an intercept (introduced by default) plus a number of user-specified regression components (X) and random effects (u), that is:

ETA.png

The components of the linear predictor are specified in the argument ETA (see above). The user can specify as many linear terms as desired, and for each component the user can choose the prior density to be assigned. The distribution of the response is modeled as a function of the linear predictor.

For Gaussian outcomes, the linear predictor is the conditional expectation, and censoring is allowed. For censored data points the actual response value (y_i) is missing, and the entries of the vectors a and b (see above) give the lower an upper vound for y_i. The following table shows the configuration of the triplet (y, a, b) for un-censored, right-censored, left-censored and interval censored.

a y b
Un-censored NULL y_i NULL
Right censored a_i NA
Left censored -∞ NA b_i
Interval censored a_i NA b_i

Internally, censoring is dealt with as a missing data problem.

Ordinal outcomes are modelled using the probit link, implemented via data augmentation. In this case the linear predictor becomes the mean of the underlying liability variable which is normal with mean equal to the linear predictor and variance equal to one. In case of only two classes (binary outcome) the threshold is set equal to zero, for more than two classess thresholds are estimated from the data. Further details about this approach can be found in Albert and Chib (1993).

Value

A list with estimated posterior means, estimated posterior standard deviations, and the parameters used to fit the model. See the vignettes in the package (or http://genomics.cimmyt.org/BGLR-extdoc.pdf) for further details.

Author(s)

Gustavo de los Campos, Paulino Perez Rodriguez,

References

Albert J,. S. Chib. 1993. Bayesian Analysis of Binary and Polychotomus Response Data. JASA, 88: 669-679.

de los Campos G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting Quantitative Traits with Regression Models for Dense Molecular Markers and Pedigree. Genetics 182: 375-385.

de los Campos, G., D. Gianola, G. J. M., Rosa, K. A., Weigel, and J. Crossa. 2010. Semi-parametric genomic-enabled prediction of genetic values using reproducing kernel Hilbert spaces methods. Genetics Research, 92:295-308.

Park T. and G. Casella. 2008. The Bayesian LASSO. Journal of the American Statistical Association 103: 681-686.

Spiegelhalter, D.J., N.G. Best, B.P. Carlin and A. van der Linde. 2002. Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B (Statistical Methodology) 64 (4): 583-639.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
## Not run: 
#Demos
library(BGLR)

#BayesA
demo(BA)

#BayesB
demo(BB)

#Bayesian LASSO
demo(BL)

#Bayesian Ridge Regression
demo(BRR)

#BayesCpi
demo(BayesCpi)

#RKHS
demo(RKHS)

#Binary traits
demo(Bernoulli)

#Ordinal traits
demo(ordinal)

#Censored traits
demo(censored)


## End(Not run)