Bayesian generalized linear models.

Share:

Description

Bayesian functions for generalized linear modeling with independent normal, t, or Cauchy prior distribution for the coefficients.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
bayesglm (formula, family = gaussian, data,
    weights, subset, na.action,
    start = NULL, etastart, mustart,
    offset, control = list(...),
    model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, contrasts = NULL,
    drop.unused.levels = TRUE,
    prior.mean = 0,
    prior.scale = NULL,
    prior.df = 1,
    prior.mean.for.intercept = 0,
    prior.scale.for.intercept = NULL,
    prior.df.for.intercept = 1,
    min.prior.scale=1e-12,
    scaled = TRUE, keep.order=TRUE,
    drop.baseline=TRUE,
    maxit=100, 
    print.unnormalized.log.posterior=FALSE,
    Warning=TRUE,...)

bayesglm.fit (x, y, weights = rep(1, nobs),
    start = NULL, etastart = NULL,
    mustart = NULL, offset = rep(0, nobs), family = gaussian(),
    control = list(), intercept = TRUE,
    prior.mean = 0,
    prior.scale = NULL,
    prior.df = 1,
    prior.mean.for.intercept = 0,
    prior.scale.for.intercept = NULL,
    prior.df.for.intercept = 1,
    min.prior.scale=1e-12, scaled = TRUE,
    print.unnormalized.log.posterior=FALSE, Warning=TRUE)

Arguments

formula

a symbolic description of the model to be fit. The details of model specification are given below.

family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.

weights

an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The “factory-fresh” default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

start

starting values for the parameters in the linear predictor.

etastart

starting values for the linear predictor.

mustart

starting values for the vector of means.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length either one or equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if both are specified their sum is used. See model.offset.

control

a list of parameters for controlling the fitting process. See the documentation for glm.control for details.

model

a logical value indicating whether model frame should be included as a component of the returned value.

method

the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS). The only current alternative is "model.frame" which returns the model frame and does no fitting.

x, y

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

drop.unused.levels

default TRUE, if FALSE, it interpolates the intermediate values if the data have integer levels.

intercept

logical. Should an intercept be included in the null model?

prior.mean

prior mean for the coefficients: default is 0. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector.

prior.scale

prior scale for the coefficients: default is NULL; if is NULL, for a logit model, prior.scale is 2.5; for a probit model, prior scale is 2.5*1.6. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector.

prior.df

prior degrees of freedom for the coefficients. For t distribution: default is 1 (Cauchy). Set to Inf to get normal prior distributions. Can be a vector of length equal to the number of predictors (not counting the intercept, if any). If it is a scalar, it is expanded to the length of this vector.

prior.mean.for.intercept

prior mean for the intercept: default is 0. See ‘Details’.

prior.scale.for.intercept

prior scale for the intercept: default is NULL; for a logit model, prior scale for intercept is 10; for probit model, prior scale for intercept is rescaled as 10*1.6.

prior.df.for.intercept

prior degrees of freedom for the intercept: default is 1.

min.prior.scale

Minimum prior scale for the coefficients: default is 1e-12.

scaled

scaled=TRUE, the scales for the prior distributions of the coefficients are determined as follows: For a predictor with only one value, we just use prior.scale. For a predictor with two values, we use prior.scale/range(x). For a predictor with more than two values, we use prior.scale/(2*sd(x)). If the response is Gaussian, prior.scale is also multiplied by 2 * sd(y). Default is TRUE

keep.order

a logical value indicating whether the terms should keep their positions. If FALSE the terms are reordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on. Effects of a given order are kept in the order specified. Default is TRUE.

drop.baseline

Drop the base level of categorical x's, default is TRUE.

maxit

integer giving the maximal number of IWLS iterations, default is 100. This can also be controlled by control.

print.unnormalized.log.posterior

display the unnormalized log posterior likelihood for bayesglm, default=FALSE

Warning

default is TRUE, which will show the error messages of not convergence and separation.

...

further arguments passed to or from other methods.

Details

The program is a simple alteration of glm() that uses an approximate EM algorithm to update the betas at each step using an augmented regression to represent the prior information.

We use Student-t prior distributions for the coefficients. The prior distribution for the constant term is set so it applies to the value when all predictors are set to their mean values.

If scaled=TRUE, the scales for the prior distributions of the coefficients are determined as follows: For a predictor with only one value, we just use prior.scale. For a predictor with two values, we use prior.scale/range(x). For a predictor with more than two values, we use prior.scale/(2*sd(x)).

We include all the glm() arguments but we haven't tested that all the options (e.g., offsets, contrasts, deviance for the null model) all work.

The new arguments here are: prior.mean, prior.scale, prior.scale.for.intercept, prior.df, prior.df.for.interceptand scaled.

Value

See glm for details.

prior.mean

prior means for the coefficients and the intercept.

prior.scale

prior scales for the coefficients

prior.df

prior dfs for the coefficients.

prior.scale.for.intercept

prior scale for the intercept

prior.df.for.intercept

prior df for the intercept

Author(s)

Andrew Gelman gelman@stat.columbia.edu; Yu-Sung Su suyusung@tsinghua.edu.cn; Daniel Lee bearlee@alum.mit.edu; Aleks Jakulin Jakulin@stat.columbia.edu

References

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau and Yu-Sung Su. (2009). “A Weakly Informative Default Prior Distribution For Logistic And Other Regression Models.” The Annals of Applied Statistics 2 (4): 1360–1383. http://www.stat.columbia.edu/~gelman/research/published/priors11.pdf

See Also

glm, bayespolr

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
  n <- 100
  x1 <- rnorm (n)
  x2 <- rbinom (n, 1, .5)
  b0 <- 1
  b1 <- 1.5
  b2 <- 2
  y <- rbinom (n, 1, invlogit(b0+b1*x1+b2*x2))

  M1 <- glm (y ~ x1 + x2, family=binomial(link="logit"))
  display (M1)

  M2 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=Inf, prior.df=Inf)
  display (M2)  # just a test:  this should be identical to classical logit

  M3 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"))
    # default Cauchy prior with scale 2.5
  display (M3)

  M4 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.df=1)
    # Same as M3, explicitly specifying Cauchy prior with scale 2.5
  display (M4)

  M5 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.df=7)   # t_7 prior with scale 2.5
  display (M5)

  M6 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.df=Inf)  # normal prior with scale 2.5
  display (M6)

# Create separation:  set y=1 whenever x2=1
# Now it should blow up without the prior!

  y <- ifelse (x2==1, 1, y)

  M1 <- glm (y ~ x1 + x2, family=binomial(link="logit"))
  display (M1)

  M2 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=Inf, prior.scale.for.intercept=Inf) # Same as M1
  display (M2)

  M3 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"))
  display (M3)

  M4 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.scale.for.intercept=10)  # Same as M3
  display (M4)

  M5 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.df=7)
  display (M5)

  M6 <- bayesglm (y ~ x1 + x2, family=binomial(link="logit"),
    prior.scale=2.5, prior.df=Inf)
  display (M6)

  # bayesglm with gaussian family (bayes lm)
  sigma <- 5
  y2 <- rnorm (n, b0+b1*x1+b2*x2, sigma)
  M7 <- bayesglm (y2 ~ x1 + x2, prior.scale=Inf, prior.df=Inf)
  display (M7)


  # bayesglm with categorical variables
  z1 <- trunc(runif(n, 4, 9))
  levels(factor(z1))
  z2 <- trunc(runif(n, 15, 19))
  levels(factor(z2))

  ## drop the base level (R default)
  M8 <- bayesglm (y ~ x1 + factor(z1) + factor(z2),
    family=binomial(link="logit"), prior.scale=2.5, prior.df=Inf)
  display (M8)

  ## keep all levels with the intercept, keep the variable order
  M9 <- bayesglm (y ~ x1 + x1:x2 + factor(z1) + x2 + factor(z2),
    family=binomial(link="logit"),
    prior.mean=rep(0,12),
    prior.scale=rep(2.5,12),
    prior.df=rep(Inf,12),
    prior.mean.for.intercept=0,
    prior.scale.for.intercept=10,
    prior.df.for.intercept=1,
    drop.baseline=FALSE, keep.order=TRUE)
  display (M9)

  ## keep all levels without the intercept
  M10 <- bayesglm (y ~ x1 + factor(z1) + x1:x2 + factor(z2)-1,
    family=binomial(link="logit"),
    prior.mean=rep(0,11),
    prior.scale=rep(2.5,11),
    prior.df=rep(Inf,11),
    drop.baseline=FALSE)
  display (M10)