Introduction

The glmbayes package produces iid samples for Bayesian Linear Models and Bayesian Genereralized Linear Models by providing Bayesian versions of the lmb and glm functions for classical models. Extensive efforts have been made to make the new functions, lmb and glmb, as similar as possible to their classical cousins. To that end, the functions add a single required argument (pfamily) and one optional argument (n) to those typically used by the classical functions. The Bayesian functions also generally inherit and/or provide implemented methods for nearly all of the methods available for the lm and glm functions so that the same type of outputs as from the classical functions can be readily produced.

This vignette is designed to provide users a detailed overview of the package and the various functionalities available. In the rest of this introduction, we briefly touch on the basics of loading the package and accessings various help resources and demos (which we recommend as a next step for users after they read this vignette). The next section then gives an overview of all the core functions that most users are likely to use. This includes the two aforementioned functions, functions used to specify and check prior distributions, and methods available for the two functions. Most users can likely stop after that section and can then turn to either the help pages or the next set of vignettes that go into a more detailed comparisons of how the methods are used for these Bayesian functions vs. for their classical cousins.

For advanced users who are interested in implementing Block-Gibbs sampling procedures or in doing simulation for other more complex models, we recommend continuing on to the third section in this vignette that covers the two functions rlmb and rglmb and are designed for implementation of precisely that type of sampling. The third section discusses how those functions can be called, references some examples/demos where they are used, and describes how to use the output from the functions to access similar output to that produced by the lmb and glmb functions.

For users who are interested in the details of how the simulation is performed there is the final section that contains a discussion of the various simulation functions that are used to sample from the posterior distributions. This includes a discussion of three core function that simuate for the currently implemented prior families and a couple of functions related to what we call the Central Normal distribution (the normal distribution between a lower and an upper bound). It also includes a discussion of a set of functions that are used in order to utilize the enveloping functions needed to get iid samples for some of the posterior distributions. That material is rather technical and we refer the user to the Nygren and Nygren (2006) paper that forms the foundation for the simulation procedure.

Loading the package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The package curently requires the MASS package so it also needs to be installed and gets loaded as part of the loading of the glmbayes package (see below).

library(glmbayes)

Accessing package information, general documentation, and demos

To access information for the package outside of the information contained in these vignettes, we suggest users review three sets of help pages. A call to help(glmbayes) yields a page with a detailed discussion of the package with an example at the bottom. Calling help(package="glmbayes") brings the user to a list of Help pages for specific functions as well as links to a general description file, code demos, and these vignettes. Part of the objective of this vignette is to provide more organized layout of the functions than the general list of help pages provided on the documentation page.

A call to demo(package="glmbayes"), finally, brings up a list of available demos for the package. A useful start for users might be to go through the specific demos as they are set up to illustrate the functionality of the package.

Using the lmb and glmb functions and their methods

As noted above, the lmb and glmb functions are the two core functions in this package. More detailed side-by-side comparisons of the use of methods for these functions and their classical cousins are provided in the next two vignettes. Here we provide a brief overview of supporting functions available to facilitate the use of the core functions. In the first subsection, we show how the core functions are called (and how this compares to calls for the classical functions). We then give a brief discussion of how priors are specified and some related functions designed to assist with that task as well as with the task of viewiw prior specifications after the modeling is completed.

The lmb and glmb functions

The code snippets below illustrates how model specification for these models can be done by simply providind an additional argument, a pfamily, that provides information on the type of prior distribution that has been selected and the parameter values that are desired. A separate vignette is provided to give guidance to users on how to select appropriate prior parameters (and some standardization steps that may be useful as part of that).

[NEED TO MODIFY Prior_Setup PRINT FUNCTION AND HOW IT HANDLES CASE WHERE X IS MISSING] [MAY ALSO WANT TO HAVE IT INITIALIZE SHAPE AND RATE PARAMETERS]

Calling lm vs. calling lmb

Here is the call for the lmb function and how it compares to the call to the classical lm function. The recommended prior here is the dNormal_Gamma pfamily which requires four parameters (2 for he Normal component, and 2 for the Gamma component).

# Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
##Classical model-call
lm.D9 <- lm(weight ~ group)

# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info=Prior_Setup(weight~group)
mu1=p_info$mu    
Sigma1=p_info$Sigma

#lmb.D9 <- lmb(weight ~ group,pfamily=dNormal_Gamma(mu1,Sigma1,shape=4,rate=0.1))

Calling glm vs. calling glmb

The call for the glmb function is quite similar and is here compared to the corresponding call for the glm function. For the binomial and Poisson families, the recommended prior here is the dNormal pfamily which requires 2 parameters (a prior mean vector and a prior Variance-Covariance matrix).

## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
##Classical model-call
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())

# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info2=Prior_Setup(counts ~ outcome + treatment)
mu2=p_info2$mu    
Sigma2=p_info2$Sigma

glmb.D93<-glmb(counts ~ outcome + treatment, family = poisson(),pfamily=dNormal(mu=mu2,Sigma=Sigma2))

Specifying Priors

Prior Family Functions

The pfamily arguments described above are actually functions that provide needed information to the lmb and glmb functions. Since they are are functions, they can also be called directly, as seen below, where they return lists with items related to the prior (see documentation of these pfamily functions for details). These functions are also discussed in more detail in a latter chapter.

While the current package only provides 3 implemented pfamiles, this is actually not a severe limitations as users easily can write their own prior families and pass those to the lmb and glmb functions. One of our exaxmple includes an illustration of this and this approach is discussed more extensively in one of our other vignettes. Below, we simply illustrate how the functions can be called directly.

NOTE: IMPLEMENT THE EXAMPLE DISCUSSED ABOVE AND WRITE A SEPARATE VIGNETTE TO ACCOMPANY THE EXAMPLE.

Calling the dNormal prior family

This prior is typically used as a prior for the Poisson and Binomial families when calling either the glmb or rglm functions. Calling it directly provides a list with, among other items, the prior specification. the documentation for the pfamilies provides additional information on what the returned list contains and what required and optional arguments are.

dNormal(mu=mu2,Sigma=Sigma2,dispersion=NULL)

*Calling the dNormal_Gamma prior family

This prior is typically used as a prior when calling either the lmb or rglmb functions or when using the gaussian() family in the glm and rglmb functions. Calling it directly

dNormal_Gamma(mu=mu1,Sigma=Sigma1,shape=4,rate=0.1)

*Calling the dGamma prior family

This prior is typically used when giving a prior for the dispersion parameter in the lmb, rlmb, glmb, and rglmb functions (in the latter case for the gaussian() and Gamma() families). Typicall this would be done as part of Block-Gibbs sampling where the regression coefficients are updated in a separate block.

b=lm.D9$coefficients
dGamma(shape=4,rate=0.1,beta=b)

Other Prior Related Functions

Three additional functions relate to the prior specification and are useful when either specifying the prior or when extracting the information about the prior used by an existing model object.

*The Prior_Setup function

This function can be used to set up the prior arguments so they have the correct dimensions and so that the various dimensions have names that correspond to those in the model. In addition to initialized prior structrues, these functions also return the model frame and the model matrix as a reference (the latter when it was requested from the underlying model). It is worth noting that this function is intended as a tool for initializig the prior objects and not for setting them to the final values used in the estimation. How to populate the arguments with sensible values is discussed in a separate vignette.

NOTE: ADD A PRINT METHOD FOR THIS FUNCTION SO THAT THE OUTPUT CAN BE CONTROLLED

Prior_Setup(weight ~ group)

*The Prior_Check function

The Prior Check function provides a utility for checking whether the data appears to be consistent with the prior. Essentially, consistency here means that the maximum likelihood estimate are within the credible intervals implied by the prior. If the data appears inconsistent with the data, it is possible that the data was generated from a different process than that implied by the model or that the prior was not well thought out. Specifying a good prior is usually important and we dedicate a separate vignette to this topic.

glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())


Prior_Check(counts ~ outcome + treatment, family = poisson(),dNormal(mu2,Sigma2))

*The pfamily function

This function can be used to extract a pfamily object from an existing object and is useful post-modeling in order to verify the prior used in the modeling process. It is also used by some of the methods during post-processing to extract information about the prior.

pfamily(glmb.D93)

Method functions

The lm and glm functions come with an extensive set of method functions that can be used to conduct supplemental analysis (typically during the post-modeling phase). Because the lmb and glmb functions are structured to return many of the same items as the classical functions and because they inherit methods from glm and lm, many of the methods available for the classical functions are directly available for the glmb and lmb functions. As some of the output does tend to be different (in particular, coefficients are returned as random draws as opposed to a single maximum likelihood estimate), we have implemented specific methods for the "glmb" and "lmb" classes of objects. The current set of methods for all four classes of objects are displayed below.

Two separate vignettes illustrate how calls to these methods can be used to replicate the output from the classical models. It is worth noting that a smaller subset of the classical methods are not yet implemented (although plans are in place to replicate some of those as well). Those are discussed in the separate vignette's as well. Here we provide lists of the methods available for classical lm and glm functions (many of which are inherited by lmb and glmb) as well as methods that have been developed specifically for glmb and rglmb.

[NOTE: AS NEEDED, ADD ADDITIONAL METHODS FOR lmb. THE LIST OF METHODS FOR glmb SHOULD BE FAIRLY COMPLETE]

Methods for class="lm"

methods(class="lm")

Methods for class="glm"

methods(class="glm")

Methods for class="glmb"

While most of the methods for the glm class (when it has its own method) or the lm class (when it does not) work properly for objects of class glmb, some methods do not. For that reason, we have implemented the below methods specifically for the glmb class of objects. Future enhancements to this package may enhance these methods. There are also additional methods for lm and/or glm that do not currently have functioning methods for the glmb class. This includes the add1, drop1, and dropterm functions (which should be implementable fairly easily) and the various influence.measure methods (influence, rstandard, rstudent, dfbeta, dfbetas, cooks.distance, and hatvalues) which may require more work. Plans do exist to add the former and perhaps the latter (which likely would look at influence on posterior modes and not on posterior means).

methods(class="glmb")

NOTE: CHECK WHICH METHODS FOR lmb that are inherited and functions vs. those that may require additional development.

methods(class="lmb")

The rlmb and rglmb functions and their summary functions

Because the functions lmb and glmb are set up to mirror their corresponding classical cousins, they include a great deal of overhead for pre and post processing that are unsuitable if the functions need to be called repetitively (say during Block-Gibbs sampling). To faciliate Block-Gibbs sampling and other simulation, we provide more minimalistic interfaces to the simulation procedures in the form of the corresponding functions rlmb and rglmb.

These functions are called internally by lmb and glmb in much the same way that lm.fit and glm.fit are called by the lm and glm functions so in a way they are Bayesian versions of those functions. We choose the r prefix for these functions to symbolize that they also represent random draws from probability distributions in much the same way that rnorm, rgamma, and other functions in the stats package do.

Calling the rlmb and rglmb functions

Instead of passing a formula to these functions, the user instead passes the dependent and independent variables as a dependent variable vector y and as an independent variable design matrix x respectively. A useful approach that can help the user specify the design matrix involves sequential calls to model.frame and model.matrix as seen below.

NOTE 1: SHORTEN THIS CODE NOTE 2: The SUMMARY FUNCTION IS CURRENTLY NOT WORKING FOR rlmb

Calling the rlmb function

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
## Classical call
lm.D9 <- lm(weight ~ group,x=TRUE,y=TRUE)
lm_summary=summary(lm.D9)
## Bayesian Call
n<-10000
y<-lm.D9$y
#x<-as.matrix(lm.D9$x)

### Set old regression and dispersion coefficients

b_old=lm.D9$coefficients
v_old=lm_summary$sigma^2

#### Set up for rglmb_dispersion

n0=0.1
shape=n0/2
rate=shape*v_old

rate/(shape-1)
rate/shape

#a1<-shape+n1/2
#b1<-rate+sum(SS)/2

#out<-1/rgamma(n,shape=a1,rate=b1)
## v0=sum(SS)/n1 ~ (2*rate+sum(SS))/(2*shape+n1)=[(2*rate+sum(SS))/2]/((2*shape+n1)/2) 
n1=length(y)
SS=v_old*n1

n1 # This is equal to 20


n0=2 # Prior observations
v_prior=v_old  # Prior point estimate for variance (the mean of (1/dispersion=1/v_prior))
wt0=(n0/n1)  

## set shape=0.01*(n1/2)
## set rate= 0.01*SS/2

shape=wt0*(n1/2)   ###  Shape is prior observations /2
rate=shape*v_prior  ### rate is essentiall prior SS - V in rmultireg should be this
rate/shape ## Should match v_prior (currently also v_old)

mu<-c(0,0)
mu=b_old  ### For testing purposes, set prior=b_old
P<-0.1*diag(2)
# Bayesian Prior Setup and Call-Prior Should Generally be Customized
p_info=Prior_Setup(weight~group)
x=p_info$x
outtemp4=rlmb(n=1000,y=y,x=x,pfamily=dNormal_Gamma(mu=mu,Sigma=solve(P),shape=shape,rate=rate))

#summary(outtemp4)

Calling the rglmb function

data(menarche2)
Age2=menarche2$Age-13

y=menarche2$Menarche/menarche2$Total
wt=menarche2$Total

## Use model.frame and model.matrix to derive x
#mf=model.frame(formula)
#x=model.matrix(formula,mf)

x<-matrix(as.numeric(1.0),nrow=length(Age2),ncol=2)
x[,2]=Age2

## Modify Prior_Setup so it can take a model matrix as well as an model object
mu<-matrix(as.numeric(0.0),nrow=2,ncol=1)
V1<-1*diag(as.numeric(2.0))
mu[2,1]=(log(0.9/0.1)-log(0.5/0.5))/3

# 2 standard deviations for prior estimate at age 13 between 0.1 and 0.9
## Specifies uncertainty around the point estimates

V1[1,1]<-((log(0.9/0.1)-log(0.5/0.5))/2)^2 
V1[2,2]=(3*mu[2,1]/2)^2  # Allows slope to be up to 3 times as large as point estimate 

out<-rglmb(n = 1000, y=y, x=x, pfamily=dNormal(mu=mu,Sigma=V1), weights = wt, 
           family = binomial(logit)) 

summary(out)

Methods for The rlmb, rglmb, and their summary functions

To allow for the output from the rglmb and rlmb functions to benefit from the wide range of methods available for their lmb and glmb cousins, we implement a few additional methods for these functions that enable most of the methods to also work for these objects post modeling.

methods(class="rglmb")
methods(class="summary.rglmb")
methods(class="rlmb")

Advanced Topics

In this section, we cover additional functions included in the package that most users likely would not call directly but which provide visibility to the simulation methods utilized as part of the package. These functions can be broadly broken down into four distinct categories:.

Simulation Functions

One of the key items returned by the pfamily functions is an assigned simulation function to be called during the estimation process (one for each pfamily). These functions can also be called directly (though we don't recommend it). The Function documentation for each of these underlying functions contains information on the type of algorihm used to generate the sample. We refer the reader to the documentation for each of these functions but note the following relationship:

Normal_ct Functions

For estimation of Bayesian generalized linear models with Normal Priors but non-gaussian families, accept-reject procedures are used where candidates are generated from mixtures of restricted multivariate normal distributions. Two functions for univariate restricted normal distributions (rnorm_ct and pnorm_ct) play a key role during this process and can be viewed as generalizations of the rnorm and pnorm functions.

Envelope Related Functions

For estimation of Bayesian generalized linear models with Normal Priors but with non-gaussian families, there are a number of functions that are used in order to generate the samples and that all in one-way-or another are related to the Enveloping functions used. The below is a list of these functions (we refer the reader to the respective function documentation for details).

Utility Functions

Finally, the package contains a small set of functions that are used by various method functions. While these are currently exported, we may in the future choose not to export them as they are unlikely to be useful for users of the package as standalone functions.



knygren/glmbayes documentation built on Sept. 4, 2020, 4:39 p.m.