Bayesian generalized linear models.
Description
Bayesian functions for generalized linear modeling with independent normal, t, or Cauchy prior distribution for the coefficients.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32  bayesglm (formula, family = gaussian, data,
weights, subset, na.action,
start = NULL, etastart, mustart,
offset, control = list(...),
model = TRUE, method = "glm.fit",
x = FALSE, y = TRUE, contrasts = NULL,
drop.unused.levels = TRUE,
prior.mean = 0,
prior.scale = NULL,
prior.df = 1,
prior.mean.for.intercept = 0,
prior.scale.for.intercept = NULL,
prior.df.for.intercept = 1,
min.prior.scale=1e12,
scaled = TRUE, keep.order=TRUE,
drop.baseline=TRUE,
maxit=100,
print.unnormalized.log.posterior=FALSE,
Warning=TRUE,...)
bayesglm.fit (x, y, weights = rep(1, nobs),
start = NULL, etastart = NULL,
mustart = NULL, offset = rep(0, nobs), family = gaussian(),
control = list(), intercept = TRUE,
prior.mean = 0,
prior.scale = NULL,
prior.df = 1,
prior.mean.for.intercept = 0,
prior.scale.for.intercept = NULL,
prior.df.for.intercept = 1,
min.prior.scale=1e12, scaled = TRUE,
print.unnormalized.log.posterior=FALSE, Warning=TRUE)

Arguments
formula 
a symbolic description of the model to be fit. The details of model specification are given below. 
family 
a description of the error distribution and link
function to be used in the model. This can be a character string
naming a family function, a family function or the result of a call
to a family function. (See 
data 
an optional data frame, list or environment (or object
coercible by 
weights 
an optional vector of weights to be used in the fitting
process. Should be 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
na.action 
a function which indicates what should happen
when the data contain 
start 
starting values for the parameters in the linear predictor. 
etastart 
starting values for the linear predictor. 
mustart 
starting values for the vector of means. 
offset 
this can be used to specify an a priori
known component to be included in the linear predictor
during fitting. This should be 
control 
a list of parameters for controlling the fitting
process. See the documentation for 
model 
a logical value indicating whether model frame should be included as a component of the returned value. 
method 
the method to be used in fitting the model.
The default method 
x, y 
For For 
contrasts 
an optional list. See the 
drop.unused.levels 
default TRUE, if FALSE, it interpolates the intermediate values if the data have integer levels. 
intercept 
logical. Should an intercept be included in the null model? 
prior.mean 
prior mean for the coefficients: default is 0. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector. 
prior.scale 
prior scale for the coefficients: default is NULL; if is NULL, for a logit model, prior.scale is 2.5; for a probit model, prior scale is 2.5*1.6. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector. 
prior.df 
prior degrees of freedom for the coefficients. For t distribution: default is 1 (Cauchy). Set to Inf to get normal prior distributions. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector. 
prior.mean.for.intercept 
prior mean for the intercept: default is 0. See ‘Details’. 
prior.scale.for.intercept 
prior scale for the intercept: default is NULL; for a logit model, prior scale for intercept is 10; for probit model, prior scale for intercept is rescaled as 10*1.6. 
prior.df.for.intercept 
prior degrees of freedom for the intercept: default is 1. 
min.prior.scale 
Minimum prior scale for the coefficients: default is 1e12. 
scaled 
scaled=TRUE, the scales for the prior distributions of the coefficients are determined as follows: For a predictor with only one value, we just use prior.scale. For a predictor with two values, we use prior.scale/range(x). For a predictor with more than two values, we use prior.scale/(2*sd(x)). If the response is Gaussian, prior.scale is also multiplied by 2 * sd(y). Default is TRUE 
keep.order 
a logical value indicating whether the terms should
keep their positions. If 
drop.baseline 
Drop the base level of categorical x's, default is TRUE. 
maxit 
integer giving the maximal number of IWLS iterations, default is 100. This can also be controlled by 
print.unnormalized.log.posterior 
display the unnormalized log posterior likelihood for bayesglm, default=FALSE 
Warning 
default is TRUE, which will show the error messages of not convergence and separation. 
... 
further arguments passed to or from other methods. 
Details
The program is a simple alteration of glm()
that uses an approximate EM
algorithm to update the betas at each step using an augmented regression
to represent the prior information.
We use Studentt prior distributions for the coefficients. The prior distribution for the constant term is set so it applies to the value when all predictors are set to their mean values.
If scaled=TRUE, the scales for the prior distributions of the coefficients are determined as follows: For a predictor with only one value, we just use prior.scale. For a predictor with two values, we use prior.scale/range(x). For a predictor with more than two values, we use prior.scale/(2*sd(x)).
We include all the glm()
arguments but we haven't tested that all the
options (e.g., offsets
, contrasts
,
deviance
for the null model) all work.
The new arguments here are: prior.mean
, prior.scale
,
prior.scale.for.intercept
, prior.df
, prior.df.for.intercept
and
scaled
.
Value
See glm
for details.
prior.mean 
prior means for the coefficients and the intercept. 
prior.scale 
prior scales for the coefficients 
prior.df 
prior dfs for the coefficients. 
prior.scale.for.intercept 
prior scale for the intercept 
prior.df.for.intercept 
prior df for the intercept 
Author(s)
Andrew Gelman gelman@stat.columbia.edu; YuSung Su suyusung@tsinghua.edu.cn; Daniel Lee bearlee@alum.mit.edu; Aleks Jakulin Jakulin@stat.columbia.edu
References
Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau and YuSung Su. (2009). “A Weakly Informative Default Prior Distribution For Logistic And Other Regression Models.” The Annals of Applied Statistics 2 (4): 1360–1383. http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf
See Also
glm
,
bayespolr
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97  n < 100
x1 < rnorm (n)
x2 < rbinom (n, 1, .5)
b0 < 1
b1 < 1.5
b2 < 2
y < rbinom (n, 1, invlogit(b0+b1*x1+b2*x2))
M1 < glm (y ~ x1 + x2, family=binomial(link="logit"))
display (M1)
M2 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=Inf, prior.df=Inf)
display (M2) # just a test: this should be identical to classical logit
M3 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"))
# default Cauchy prior with scale 2.5
display (M3)
M4 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.df=1)
# Same as M3, explicitly specifying Cauchy prior with scale 2.5
display (M4)
M5 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.df=7) # t_7 prior with scale 2.5
display (M5)
M6 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.df=Inf) # normal prior with scale 2.5
display (M6)
# Create separation: set y=1 whenever x2=1
# Now it should blow up without the prior!
y < ifelse (x2==1, 1, y)
M1 < glm (y ~ x1 + x2, family=binomial(link="logit"))
display (M1)
M2 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=Inf, prior.scale.for.intercept=Inf) # Same as M1
display (M2)
M3 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"))
display (M3)
M4 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.scale.for.intercept=10) # Same as M3
display (M4)
M5 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.df=7)
display (M5)
M6 < bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
prior.scale=2.5, prior.df=Inf)
display (M6)
# bayesglm with gaussian family (bayes lm)
sigma < 5
y2 < rnorm (n, b0+b1*x1+b2*x2, sigma)
M7 < bayesglm (y2 ~ x1 + x2, prior.scale=Inf, prior.df=Inf)
display (M7)
# bayesglm with categorical variables
z1 < trunc(runif(n, 4, 9))
levels(factor(z1))
z2 < trunc(runif(n, 15, 19))
levels(factor(z2))
## drop the base level (R default)
M8 < bayesglm (y ~ x1 + factor(z1) + factor(z2),
family=binomial(link="logit"), prior.scale=2.5, prior.df=Inf)
display (M8)
## keep all levels with the intercept, keep the variable order
M9 < bayesglm (y ~ x1 + x1:x2 + factor(z1) + x2 + factor(z2),
family=binomial(link="logit"),
prior.mean=rep(0,12),
prior.scale=rep(2.5,12),
prior.df=rep(Inf,12),
prior.mean.for.intercept=0,
prior.scale.for.intercept=10,
prior.df.for.intercept=1,
drop.baseline=FALSE, keep.order=TRUE)
display (M9)
## keep all levels without the intercept
M10 < bayesglm (y ~ x1 + factor(z1) + x1:x2 + factor(z2)1,
family=binomial(link="logit"),
prior.mean=rep(0,11),
prior.scale=rep(2.5,11),
prior.df=rep(Inf,11),
drop.baseline=FALSE)
display (M10)
