bic.glm: Bayesian Model Averaging for generalized linear models.

bic.glmR Documentation

Bayesian Model Averaging for generalized linear models.

Description

Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.

Usage

bic.glm(x, ...)

## S3 method for class 'matrix'
bic.glm(x, y, glm.family, wt = rep(1, nrow(x)),
    strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, 
    maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, 
    factor.type = TRUE, factor.prior.adjust = FALSE, 
    occam.window = TRUE, call = NULL, ...)

## S3 method for class 'data.frame'
bic.glm(x, y, glm.family, wt = rep(1, nrow(x)),
    strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, 
    maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, 
    factor.type = TRUE, factor.prior.adjust = FALSE, 
    occam.window = TRUE, call = NULL, ...)

## S3 method for class 'formula'
bic.glm(f, data, glm.family, wt = rep(1, nrow(data)),
    strict = FALSE, prior.param = c(rep(0.5, ncol(x))), OR = 20, 
    maxCol = 30, OR.fix = 2, nbest = 150, dispersion = NULL, 
    factor.type = TRUE, factor.prior.adjust = FALSE, 
    occam.window = TRUE, na.action = na.omit, ...)

Arguments

x

a matrix or data.frame of independent variables.

y

a vector of values for the dependent variable.

f

a formula

data

a data frame containing the variables in the model.

glm.family

a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See 'family' for details of family functions.)

wt

an optional vector of weights to be used.

strict

a logical indicating whether models with more likely submodels are eliminated. FALSE returns all models whose posterior model probability is within a factor of 1/OR of that of the best model.

prior.param

a vector of values specifying the prior weights for each variable.

OR

a number specifying the maximum ratio for excluding models in Occam's window

maxCol

a number specifying the maximum number of columns in design matrix (including intercept) to be kept.

OR.fix

width of the window which keeps models after the leaps approximation is done. Because the leaps and bounds gives only an approximation to BIC, there is a need to increase the window at this first "cut" so as to assure that no good models are deleted. The level of this cut is at 1/(OR^OR.fix); the default value for OR.fix is 2.

nbest

a number specifying the number of models of each size returned to bic.glm by the modified leaps algorithm.

dispersion

a logical value specifying whether dispersion should be estimated or not. Default is TRUE unless glm.family is poisson or binomial

factor.type

a logical value specifying how variables of class "factor" are handled. A factor variable with d levels is turned into (d-1) dummy variables using a treatment contrast. If factor.type = TRUE, models will contain either all or none of these dummy variables. If factor.type = FALSE, models are free to select the dummy variables independently. In this case, factor.prior.adjust determines the prior on these variables.

factor.prior.adjust

a logical value specifying whether the prior distribution on dummy variables for factors should be adjusted when factor.type=FALSE. When factor.prior.adjust=FALSE, all dummy variables for variable i have prior equal to prior.param[i]. Note that this makes the prior probability of the union of these variables much higher than prior.param[i]. Setting factor.prior.adjust=T corrects for this so that the union of the dummies equals prior.param[i] (and hence the deletion of the factor has a prior of 1-prior.param[i]). This adjustment changes the individual priors on each dummy variable to ' 1-(1-pp[i])^(1/(k+1)).

occam.window

a logical value specifying if Occam's window should be used. If set to FALSE then all models selected by the modified leaps algorithm are returned.

call

used internally

na.action

a function which indicates what should happen when data contain NAs. Possible values are na.omit, na.exclude, na.fail, na.pass or NULL.

...

unused

Details

Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.

Value

bic.glm returns an object of class bic.glm

The function summary is used to print a summary of the results. The function plot is used to plot posterior distributions for the coefficients. The function imageplot generates an image of the models which were averaged over.

An object of class bic.glm is a list containing at least the following components:

postprob

the posterior probabilities of the models selected

deviance

the estimated model deviances

label

labels identifying the models selected

bic

values of BIC for the models

size

the number of independent variables in each of the models

which

a logical matrix with one row per model and one column per variable indicating whether that variable is in the model

probne0

the posterior probability that each variable is non-zero (in percent)

postmean

the posterior mean of each coefficient (from model averaging)

postsd

the posterior standard deviation of each coefficient (from model averaging)

condpostmean

the posterior mean of each coefficient conditional on the variable being included in the model

condpostsd

the posterior standard deviation of each coefficient conditional on the variable being included in the model

mle

matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model

se

matrix with one row per model and one column per variable giving the standard error of each coefficient for each model

reduced

a logical indicating whether any variables were dropped before model averaging

dropped

a vector containing the names of those variables dropped before model averaging

call

the matched call that created the bma.lm object

Note

If more than maxcol variables are supplied, then bic.glm does stepwise elimination of variables until maxcol variables are reached. bic.glm handles factor variables according to the factor.type parameter. If this is true then factor variables are kept in the model or dropped in entirety. If false, then each dummy variable can be kept or dropped independently. If bic.glm is used with a formula that includes interactions between factor variables, then bic.glm will create a new factor variable to represent that interaction, and this factor variable will be kept or dropped in entirety if factor.type is true. This can create interpretation problems if any of the corresponding main effects are dropped. Many thanks to Sanford Weisberg for making source code for leaps available.

Author(s)

Chris Volinsky volinsky@research.att.com, Adrian Raftery raftery@stat.washington.edu, and Ian Painter ian.painter@gmail.com

References

Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.

An earlier version, issued as Working Paper 94-12, Center for Studies in Demography and Ecology, University of Washington (1994) is available as a technical report from the Department of Statistics, University of Washington.

See Also

summary.bic.glm, print.bic.glm, plot.bic.glm

Examples


## Not run: 
### logistic regression
library("MASS")
data(birthwt)
y<- birthwt$lo
x<- data.frame(birthwt[,-1])
x$race<- as.factor(x$race)
x$ht<- (x$ht>=1)+0
x<- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl<- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)

glm.out.FT <- bic.glm(x, y, strict = FALSE, OR = 20, 
                      glm.family="binomial", factor.type=TRUE)
summary(glm.out.FT)
imageplot.bma(glm.out.FT)

glm.out.FF <- bic.glm(x, y, strict = FALSE, OR = 20, 
                      glm.family="binomial", factor.type=FALSE)
summary(glm.out.FF)
imageplot.bma(glm.out.FF)

glm.out.TT <- bic.glm(x, y, strict = TRUE, OR = 20, 
                      glm.family="binomial", factor.type=TRUE)
summary(glm.out.TT)
imageplot.bma(glm.out.TT)

glm.out.TF <- bic.glm(x, y, strict = TRUE, OR = 20, 
                      glm.family="binomial", factor.type=FALSE)
summary(glm.out.TF)
imageplot.bma(glm.out.TF)

## End(Not run)

## Not run: 
### Gamma family 
library(survival)
data(veteran)
surv.t<- veteran$time
x<- veteran[,-c(3,4)]
x$celltype<- factor(as.character(x$celltype))
sel<- veteran$status == 0
x<- x[!sel,]
surv.t<- surv.t[!sel]

glm.out.va <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"),
    factor.type=FALSE)
summary(glm.out.va)
imageplot.bma(glm.out.va)
plot(glm.out.va)

## End(Not run)

### Poisson family
### Yates (teeth) data. 

x<- rbind(
    c(0, 0, 0),
    c(0, 1, 0),
    c(1, 0, 0),
    c(1, 1, 1))

y<-c(4, 16, 1, 21)
n<-c(1,1,1,1)

models<- rbind(
    c(1, 1, 0),
    c(1, 1, 1))

glm.out.yates <- bic.glm( x, y, n, glm.family = poisson(), 
                         factor.type=FALSE) 
summary(glm.out.yates)

## Not run: 
### Gaussian
library(MASS)
data(UScrime)
f <- formula(log(y) ~  log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+
                       log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+
                       log(GDP)+log(Ineq)+log(Prob)+log(Time))
glm.out.crime <- bic.glm(f, data = UScrime, glm.family = gaussian()) 
summary(glm.out.crime)
# note the problems with the estimation of the posterior standard 
# deviation (compare with bicreg example)

## End(Not run)

hanase/BMA documentation built on July 23, 2022, 3:05 a.m.