glmb: Fitting Bayesian Generalized Linear Models

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

glmb is used to fit Bayesian generalized linear models, specified by giving a symbolic descriptions of the linear predictor, the error distribution, and the prior distribution.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
glmb(
  formula,
  family = binomial,
  pfamily = dNormal(mu, Sigma, dispersion = 1),
  n = 1000,
  data,
  weights,
  subset,
  offset,
  na.action,
  Gridtype = 1,
  start = NULL,
  etastart,
  mustart,
  control = list(...),
  model = TRUE,
  method = "glm.fit",
  x = FALSE,
  y = TRUE,
  contrasts = NULL,
  ...
)

## S3 method for class 'glmb'
print(x, digits = max(3, getOption("digits") - 3), ...)

Arguments

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under ‘Details’.

family

a description of the error distribution and link function to be used in the model. For glm this can be a character string naming a family function, a family function or the result of a call to a family function. For glm.fit only the third option is supported. (See family for details of family functions.)

pfamily

a description of the prior distribution and associated constants to be used in the model. This should be a pfamily function (see pfamily for details of pfamily functions).

n

number of draws to generate. If length(n) > 1, the length is taken to be the number required.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which glm is called.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See documentation for model.offset at model.extract.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is stats{na.omit}. Another possible value is NULL, no action. Value stats{na.exclude} can be useful.

Gridtype

an optional argument specifying the method used to determine the number of tangent points used to construct the enveloping function.

start

starting values for the parameters in the linear predictor.

etastart

starting values for the linear predictor.

mustart

starting values for the vector of means.

control

a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.

model

a logical value indicating whether model frame should be included as a component of the returned value.

method

the method to be used in fitting the model. The default method "glm.fit" uses iteratively reweighted least squares (IWLS): the alternative "model.frame" returns the model frame and does no fitting.

User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as glm.fit. If specified as a character string it is looked up from within the stats namespace.

x

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

y

For glm: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value.

For glm.fit: x is a design matrix of dimension n * p, and y is a vector of observations of length n.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

...

For glm: arguments to be used to form the default control argument if it is not supplied directly.

For weights: further arguments passed to or from other methods.

digits

the number of significant digits to use when printing.

Details

The function glmb is a Bayesian version of the classical glm function. Setup of the models mirrors those for the glm function but add additional required arguments mu and Sigma, representing a multivariate normal prior. In addition, the dispersion parameter must currently be provided for the gaussian, Gamma, quasipoisson, and quasibinomial families (future implementations may incoporate priors for these into the glmb function). The function generates random iid samples from the posterior density associated with the model (assuming a fixed dispersion parameter).

Current implemented models include the gaussian family (identity link function), the poisson and quasipoisson families (log link function), the gamma family (log link function), as well as the binomial and quasibinomial families (logit, probit, and cloglog link functions).

The function returns the output from a call to the function glm as well as the simulated Bayesian coefficients and associated outputs. In addition, the function returns model diagnostic information related to the DIC, a Bayesian Information Criterion similar to the AIC for classical models.

For the gaussian family, iid samples from the posterior density are genererated using standard simulation procedures for multivariate normal densities. For all other families, the samples are generated using accept-reject procedures by leveraging the likelihood subgradient approach of Nygren and Nygren (2006). This approach relies on tight enveloping functions that bound the posterior density from above. The Gridtype parameter is used to select the method used for determining the size of a Grid used to build the enveloping function. See EnvelopeBuild for details. Depending on the selection, the time to build the envelope and the acceptance rate during the simulation process may vary. The returned value iters contains the number of candidates generated before acceptance for each draw.

Value

glmb returns an object of class "glmb". The function summary (i.e., summary.glmb) can be used to obtain or print a summary of the results. The generic accessor functions coefficients, fitted.values, residuals, and extractAIC can be used to extract various useful features of the value returned by codeglmb.

An object of class "glmb" is a list containing at least the following components:

glm

an object of class "glm" containing the output from a call to the function glm

coefficients

a matrix of dimension n by length(mu) with one sample in each row

coef.means

a vector of length(mu) with the estimated posterior mean coefficients

coef.mode

a vector of length(mu) with the estimated posterior mode coefficients

dispersion

Either a constant provided as part of the call, or a vector of length n with one sample in each row.

Prior

A list with the priors specified for the model in question. Items in list may vary based on the type of prior

fitted.values

a matrix of dimension n by length(y) with one sample for the mean fitted values in each row

family

the family object used.

linear.predictors

an n by length(y) matrix with one sample for the linear fit on the link scale in each row

deviance

an n by 1 matrix with one sample for the deviance in each row

pD

An Estimate for the effective number of parameters

Dbar

Expected value for minus twice the log-likelihood function

Dthetabar

Value of minus twice the log-likelihood function evaluated at the mean value for the coefficients

DIC

Estimated Deviance Information criterion

prior.weights

a vector of weights specified or implied by the model

y

a vector with the dependent variable

x

a matrix with the implied design matrix for the model

model

if requested (the default),the model frame

call

the matched call

formula

the formula supplie

terms

the terms object used

data

the data argument

famfunc

Family functions used during estimation process

iters

an n by 1 matrix giving the number of candidates generated before acceptance for each sample.

contrasts

(where relevant) the contrasts used.

xlevels

(where relevant) a record of the levels of the factors used in fitting

digits

the number of significant digits to use when printing.

In addition, non-empty fits will have (yet to be implemented) components qr, R and effects relating to the final weighted linear fit for the posterior mode. Objects of class "glmb" are normall of class c("glmb","glm","lm"), that is inherit from classes glm and lm and well-designed methods from those classed will be applied when appropriate.

If a binomial glmb model was specified by giving a two-column response, the weights returned by prior.weights are the total number of cases (factored by the supplied case weights) and the component of y of the result is the proportion of successes.

Author(s)

The R implementation of glmb has been written by Kjell Nygren and was built to be a Bayesian version of the glm function and hence tries to mirror the features of the glm function to the greatest extent possible. For details on the author(s) for the glm function see the documentation for glm.

References

Dobson, A. J. (1990) An Introduction to Generalized Linear Models. London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) Generalized linear models. Chapter 6 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole. McCullagh P. and Nelder, J. A. (1989) Generalized Linear Models. London: Chapman and Hall.

Nygren, K.N. and Nygren, L.M (2006) Likelihood Subgradient Densities. Journal of the American Statistical Association. vol.101, no.475, pp 1144-1156. doi: 10.1198/016214506000000357.

Raiffa, Howard and Schlaifer, R (1961) Applied Statistical Decision Theory. Boston: Clinton Press, Inc.

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.

See Also

lm and glm for classical modeling functions.

pfamily for documentation of pfamily functions used to specify priors.

Prior_Setup, Prior_Check for functions used to initialize and to check priors,

summary.glmb, predict.glmb, residuals.glmb, simulate.glmb, extractAIC.glmb, dummy.coef.glmb and methods(class="glmb") for glmb and the methods and generic functions for classes glm and lm from which class glmb inherits.

Other glmbayes modeling functions: lmb(), rglmb(), rlmb()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
set.seed(333)
## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
mysd<-1
mu<-matrix(0,5)
mu[1]=log(mean(counts))
V0<-((mysd)^2)*diag(5)
glmb.D93<-glmb(counts ~ outcome + treatment, family = poisson(),pfamily=dNormal(mu=mu,Sigma=V0))
summary(glmb.D93)

# Menarche Binomial Data Example 
data(menarche2)
Age2=menarche2$Age-13

#Priors Are Derived in Vignette
nvars=2
## Logit Prior and Model
mu1=as.matrix(c(0,1.098612),ncol=1)
V1<-1*diag(nvars)
V1[1,1]=0.18687882
V1[2,2]=0.10576217
V1[1,2]=-0.03389182
V1[2,1]=-0.03389182
glmb.out1<-glmb(cbind(Menarche, Total-Menarche) ~ Age2,
family=binomial(logit),pfamily=dNormal(mu=mu1,Sigma=V1), data=menarche2)
summary(glmb.out1)

## Probit Prior and Model
mu2=as.matrix(c(0,0.6407758),ncol=1)
V2<-1*diag(nvars)
V2[1,1]=0.07158369
V2[2,2]=0.03453205
V2[1,2]=-0.01512075
V2[2,1]=-0.01512075
glmb.out2<-glmb(cbind(Menarche, Total-Menarche) ~ Age2,
family=binomial(probit),pfamily=dNormal(mu=mu2,Sigma=V2), data=menarche2)
summary(glmb.out2)

## clog-log Prior and Model
mu2=as.matrix(c(-0.3665129,0.6002727),ncol=1)
V2<-1*diag(nvars)
V2[1,1]=0.11491322
V2[2,2]=0.03365986
V2[1,2]=-0.03502783
V2[2,1]=-0.03502783
glmb.out3<-glmb(cbind(Menarche, Total-Menarche) ~ Age2,
                family=binomial(cloglog),pfamily=dNormal(mu=mu2,Sigma=V2), data=menarche2)
summary(glmb.out3)

DIC_Out=rbind(extractAIC(glmb.out1),extractAIC(glmb.out2),extractAIC(glmb.out3))
rownames(DIC_Out)=c("logit","probit","clog-log")
colnames(DIC_Out)=c("pD","DIC")
DIC_Out

knygren/glmbayes documentation built on Sept. 4, 2020, 4:39 p.m.