Estimate Dispersion for Poisson and Binomial GLM's and GLMM's
Description
Functions to compute an estimate of chat for binomial or Poisson GLM's and GLMM's using different estimators of overdispersion.
Usage
1 2 3 4 5 6 7 8 9 10 
Arguments
mod 
an object of class 
method 
this argument defines the estimator used. The default

... 
additional arguments passed to the function. 
Details
Poisson and binomial GLM's do not have a parameter for the variance and
it is usually held fixed to 1 (i.e., mean = variance). However, one must
check whether this assumption is appropriate by estimating the
overdispersion parameter (chat). Though one can obtain an estimate of
chat by dividing the residual deviance by the residual degrees of
freedom (i.e., method = "deviance"
), McCullagh and Nelder (1989) and
Venables and Ripley (2002) recommend using Pearson's chisquare divided
by the residual degrees of freedom (method = "pearson"
). An
estimator based on Farrington (1996) is also implemented by the function
using the argument method = "farrington"
. Recent work by
Fletcher (2012) suggests that an alternative estimator performs better
than the abovementioned methods in the presence of sparse data and is
now implemented with method = "fletcher"
. For GLMM's, only the
Pearson chisquare estimator of overdispersion is currently implemented.
Note that values of chat > 1 indicate overdispersion (variance > mean), but that values much higher than 1 (i.e., > 4) probably indicate lackoffit. In cases of moderate overdispersion, one usually multiplies the variancecovariance matrix of the estimates by chat. As a result, the SE's of the estimates are inflated (chat is also known as a variance inflation factor).
In model selection, chat should be estimated from the global model of the candidate model set and the same value of chat applied to the entire model set. Specifically, a global model is the most complex model which can be simplified to obtain all the other (nested) models of the set. When no single global model exists in the set of models considered, such as when sample size does not allow a complex model, one can estimate chat from 'subglobal' models. Here, 'subglobal' models denote models from which only a subset of the models of the candidate set can be derived. In such cases, one can use the smallest value of chat for model selection (Burnham and Anderson 2002).
Note that chat counts as an additional parameter estimated and should
be added to K. All functions in package AICcmodavg
automatically add 1 when the c.hat
argument > 1 and apply the
same value of chat for the entire model set. When c.hat > 1
,
functions compute quasilikelihood information criteria (either QAICc or
QAIC, depending on the value of the second.ord
argument) by
scaling the loglikelihood of the model by c.hat
. The value of
c.hat
can influence the ranking of the models: as chat
increases, QAIC or QAICc will favor models with fewer parameters. As an
additional check against this potential problem, one can create several
model selection tables by incrementing values of chat to assess the
model selection uncertainty. If ranking changes little up to the chat
value observed, one can be confident in making inference.
In cases of underdispersion (chat < 1), it is recommended to keep the
value of c.hat
to 1. However, note that values of chat << 1 can
also indicate lackoffit and that an alternative model (and distribution)
should be investigated.
Note that c_hat
only supports the estimation of chat for
binomial models with trials > 1 (i.e., success/trial or cbind(success,
failure) syntax) or with Poisson GLM's or GLMM's.
Value
c_hat
returns an object of class c_hat
with the estimated
chat value and an attribute for the type of estimator used.
Author(s)
Marc J. Mazerolle
References
Anderson, D. R. (2008) Modelbased Inference in the Life Sciences: a primer on evidence. Springer: New York.
Burnham, K. P., Anderson, D. R. (2002) Model Selection and Multimodel Inference: a practical informationtheoretic approach. Second edition. Springer: New York.
Burnham, K. P., Anderson, D. R. (2004) Multimodel inference: understanding AIC and BIC in model selection. Sociological Methods and Research 33, 261–304.
Farrington, C. P. (1996) On assessing goodness of fit of generalized linear models to sparse data. Journal of the Royal Statistical Society B 58, 349–360.
Fletcher, D. J. (2012) Estimating overdispersion when fitting a generalized linear model to sparse data. Biometrika 99, 230–237.
Mazerolle, M. J. (2006) Improving data analysis in herpetology: using Akaike's Information Criterion (AIC) to assess the strength of biological hypotheses. AmphibiaReptilia 27, 169–180.
McCullagh, P., Nelder, J. A. (1989) Generalized Linear Models. Second edition. Chapman and Hall: New York.
Venables, W. N., Ripley, B. D. (2002) Modern Applied Statistics with S. Second edition. Springer: New York.
See Also
AICc
, confset
, evidence
,
modavg
, importance
,
modavgPred
, mb.gof.test
, Nmix.gof.test
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34  #binomial glm example
set.seed(seed = 10)
resp < rbinom(n = 60, size = 1, prob = 0.5)
set.seed(seed = 10)
treat < as.factor(sample(c(rep(x = "m", times = 30), rep(x = "f",
times = 30))))
age < as.factor(c(rep("young", 20), rep("med", 20), rep("old", 20)))
#each invidual has its own response (n = 1)
mod1 < glm(resp ~ treat + age, family = binomial)
## Not run:
c_hat(mod1) #gives an error because model not appropriate for
##computation of chat
## End(Not run)
##computing table to summarize successes
table(resp, treat, age)
dat2 < as.data.frame(table(resp, treat, age)) #not quite what we need
data2 < data.frame(success = c(9, 4, 2, 3, 5, 2),
sex = c("f", "m", "f", "m", "f", "m"),
age = c("med", "med", "old", "old", "young",
"young"), total = c(13, 7, 10, 10, 7, 13))
data2$prop < data2$success/data2$total
data2$fail < data2$total  data2$success
##run model with success/total syntax using weights argument
mod2 < glm(prop ~ sex + age, family = binomial, weights = total,
data = data2)
c_hat(mod2)
##run model with other syntax cbind(success, fail)
mod3 < glm(cbind(success, fail) ~ sex + age, family = binomial,
data = data2)
c_hat(mod3)
