NPML estimation or Gaussian quadrature for overdispersed GLM's and variance component models
Description
Fits a random effect model using Gaussian quadrature (Hinde, 1982) or nonparametric maximum likelihood (Aitkin, 1996a).
The function alldist
is designed to account for overdispersion, while allvc
fits variance component models.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45  alldist(formula,
random = ~1,
family = gaussian(),
data,
k = 4,
random.distribution = "np",
tol = 0.5,
offset,
weights,
pluginz,
na.action,
EMmaxit = 500,
EMdev.change = 0.001,
lambda = 0,
damp = TRUE,
damp.power = 1,
spike.protect = 0,
sdev,
shape,
plot.opt = 3,
verbose = TRUE,
...)
allvc(formula,
random = ~1,
family = gaussian(),
data,
k = 4,
random.distribution = "np",
tol = 0.5,
offset,
weights,
pluginz,
na.action,
EMmaxit = 500,
EMdev.change = 0.001,
lambda=0,
damp = TRUE,
damp.power = 1,
spike.protect=0,
sdev,
shape,
plot.opt = 3,
verbose = TRUE,
...)

Arguments
formula 
a formula defining the response and the fixed effects (e.g. 
random 
a formula defining the random model. In the case of 
family 
conditional distribution of responses: "gaussian",
"poisson", "binomial", "Gamma", or "inverse.gaussian" can be set. If
"gaussian", "Gamma", or "inverse.gaussian", then equal component
dispersion parameters are assumed, except if the optional parameter

data 
the data frame (mandatory, even if it is attached to the workspace!). 
k 
the number of mass points/integration points (supported are up to 600 mass points). 
random.distribution 
the mixing distribution, Gaussian Quadrature ( 
tol 
the tol scalar (usually, 0< 
offset 
an optional offset to be included in the model. 
weights 
optional prior weights for the data. 
pluginz 
optional numerical vector of length 
na.action 
a function indicating what should happen when 
EMmaxit 
maximum number of EM iterations. 
EMdev.change 
stops EM algorithm when deviance change falls below this value. 
lambda 
only applicable for Gaussian, Gamma, and Inverse Gaussian mixtures. If set, standard
deviations/ shape parameters are calculated smoothly across components via
an AitchisonAitken kernel ( 
damp 
switches EM damping on or off. 
damp.power 
steers degree of damping applied on dispersion parameter according
to formula 
spike.protect 
for Gaussian, Gamma, and Inverse Gaussian mixtures with unequal or
smooth component standard deviations: protects algorithm to converge into likelihood spikes
by stopping the EM algorithm if one of the component standard deviations
(shape parameters, resp.), divided by the fitted mass points, falls below
(exceeds, resp.) a certain threshold, which is 
sdev 
optional; specifies standard deviation for normally distributed response. If unspecified, it will be estimated from the data. 
shape 
optional; specifies shape parameter for Gammaand IGdistributed response.
For Gamma, setting 
plot.opt 
if equal to zero, then no graphical output is given.
For 
verbose 
if set to 
... 
generic options for the 
Details
The nonparametric maximum likelihood (NPML) approach was introduced in Aitkin (1996) as a tool to fit overdispersed generalized linear models. The idea is to approximate the unknown and unspecified distribution of the random effect by a discrete mixture of exponential family densities, leading to a simple expression of the marginal likelihood which can then be maximized using a standard EM algorithm.
Aitkin (1999) extended this method to generalized linear models with shared
random effects arising through variance component or repeated measures
structure. Applications are twostage sample designs, when firstly the
primary sampling units (the upperlevel units, e.g. classes) and then the
secondary sampling units (lowerlevel units, e.g. students) are selected, or
longitudinal data. Models of this type have also been referred to as multilevel
models (Goldstein, 2003). allvc
is restricted to 2level models.
The number of components k
of the finite mixture has to be specified beforehand.
When option 'gq'
is set, then GaussHermite masses and mass points are used,
assuming implicitly a normally distributed random effect.
When option 'np'
is chosen, the EM algorithm uses the GaussHermite masses
and mass points as starting points. The position of the starting points can
be concentrated or extended by setting tol
smaller or larger than one,
respectively.
Fitting random coefficient models (Aitkin, Francis & Hinde, 2009, pp. 496, p. 514) is
possible by specifying the random term explicitly. Note that the setting
random= ~ x
gives a model with a random slope and a random
intercept, and
that only one random coefficient can be specified. The option
random.distribution
is restricted to np
in this case,
i.e. Gaussian Quadrature is not supported for random coefficient
models (see also Aitkin, Francis & Hinde (2005), page 475 bottom).
As for glm
, there are three different ways of specifying a
binomial model, namley through
a twocolumn matrix before the '~' symbol, specifying the counts of successes and nonsuccesses.
a vector of proportions of successes before the '~' symbol, and the associated number of trials provided in the
weights
argument.a twolevel factor before the '~' symbol (only for Bernoulliresponse).
The weights have to be understood as frequency weights, i.e. setting all weights in
alldist
equal to 2 will duplicate each data point and hence double the
disparity and deviance.
The Inverse Gaussian (IG) response distribution is parametrized as usual through the
mean and a scaling parameter. We refer to the latter, which is the
inverse of the dispersion parameter in exponential family formulation,
as shape
. The canonical "1/mu^2" link is supported, but it is
quite tenuous since the linear predictor is likely to become negative
after adding the random effect. The log
link behaves more
reliably for this distribution.
For k
>= 54, mass points with negligible mass
(i.e. < 1e50) are omitted. The maximum number of 'effective' mass points
is then 198.
Value
The function alldist produces an object of class glmmNPML
(if random.distributon
is set to ‘np’) or glmmGQ
(‘gq’). Both objects contain the following 29 components:
coefficients 
a named vector of coefficients (including the mass points).
In case of Gaussian quadrature, the coefficient given at 
residuals 
the difference between the true response and the empirical Bayes predictions. 
fitted.values 
the empirical Bayes predictions (Aitkin, 1996b) on the scale of the responses. 
family 
the ‘family’ object used. 
linear.predictors 
the extended linear predictors eta_ik. 
disparity 
the disparity ( 
deviance 
the deviance of the fitted mixture regression model. 
null.deviance 
the deviance for the null model (just containing an intercept), comparable with

df.residual 
the residual degrees of freedom of the fitted model (including the random part). 
df.null 
the residual degrees of freedom for the null model. 
y 
the (extended) response vector. 
call 
the matched call. 
formula 
the formula supplied. 
random 
the random term of the model formula. 
data 
the data argument. 
model 
the (extended) design matrix. 
weights 
the case weights initially supplied. 
offset 
the offset initially supplied. 
mass.points 
the fitted mass points. 
masses 
the mixture probabilities corresponding to the mass points. 
sdev 
a list of the two elements 
shape 
a list of the two elements 
rsdev 
estimated random effect standard deviation. 
post.prob 
a matrix of posteriori probabilities. 
post.int 
a vector of ‘posteriori intercepts’ (as in Sofroniou et al. (2006)). 
ebp 
the empirical Bayes Predictions on the scale of the linear predictor. For compatibility with older versions. 
EMiter 
gives the number of iterations of the EM algorithm. 
EMconverged 
logical value indicating if the EM algorithm converged. 
lastglm 
the fitted 
Misc 
contains additional information relevant for the summary and plot functions, in particular the disparity trend and the EM trajectories. 
If a binomial model is specified by giving a twocolumn response,
the weights returned by weights
are the total numbers of cases
(factored by the supplied case weights) and the component y
of the result is the proportion of successes.
As a byproduct, alldist
produces a plot showing the disparity
in dependence of the iteration number. Further, a plot with the EM trajectories
is given. The xaxis corresponds to the iteration number, and the yaxis
to the value of the mass points at a particular iteration.
This plot is not produced for GQ.
Note
In contrast to the GLIM 4 version, this R implementation
uses for Gaussian (as well Gamma and IG) mixtures by default a damping procedure in the
first cycles of the EM algorithm (Einbeck & Hinde, 2006), which stabilizes
the algorithm and makes it less sensitive to the optimal
choice of tol
. If tol
is very small
(i.e. less than 0.1), it can be useful to set damp.power
to values
larger than 1 in order to accelerate convergence.
Do not use damp.power=0
, as this would mean permanent damping during EM.
Using the option pluginz
, one can to some extent circumvent the
necessity to specify tol
by giving the starting points explicitly.
However, when using pluginz
for normal, Gamma or IG distributed response,
damping will be strictly necessary to ensure that the imposed starting points
don't get blurred immediately due to initial fluctuations, implying that
tol
still plays a role in this case.
Author(s)
Originally translated from the GLIM 4 functions alldist
and
allvc
(Aitkin & Francis, 1995) to R by Ross Darnell (2002). Modified,
extended, and prepared for publication by Jochen Einbeck & John Hinde (2006).
References
Aitkin, M. and Francis, B. (1995). Fitting overdispersed generalized linear models by nonparametric maximum likelihood. GLIM Newsletter 25, 3745.
Aitkin, M. (1996a). A general maximum likelihood analysis of overdispersion in generalized linear models. Statistics and Computing 6, 251262.
Aitkin, M. (1996b). Empirical Bayes shrinkage using posterior random effect means from nonparametric maximum likelihood estimation in general random effect models. Statistical Modelling: Proceedings of the 11th IWSM 1996, 8794.
Aitkin, M. (1999). A general maximum likelihood analysis of variance components in generalized linear models. Biometrics 55, 117128.
Aitkin, M., Francis, B. and Hinde, J. (2009). Statistical Modelling in R. Oxford Statistical Science Series, Oxford, UK.
Einbeck, J. & Hinde, J. (2006). A note on NPML estimation for exponential family regression models with unspecified dispersion parameter. Austrian Journal of Statistics 35, 233243.
Goldstein, H. (2003). Multilevel Statistical Models (3rd edition). Arnold, London, UK.
Hinde, J. (1982). Compound Poisson regression models. Lecture Notes in Statistics 14, 109121.
Sofroniou, N., Einbeck, J., and Hinde, J. (2006). Analyzing Irish suicide rates with mixture models. Proceedings of the 21st International Workshop on Statistical Modelling in Galway, Ireland, 2006.
See Also
glm
, summary.glmmNPML
,
predict.glmmNPML
family.glmmNPML
,
plot.glmmNPML
.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70  # The first three examples (galaxy data, toxoplasmosis data , fabric faults)
# are based on GLIM examples in Aitkin et al. (2005), and the forth example using
# the HospitalStayData (Rosner, 2000) is taken from Einbeck & Hinde (2006).
# The fifth data example using the Oxford boys is again inspired by Aitkin et al. (2005).
# The sixth example on Irish suicide rates is taken from Sofroniou et al. (2006).
# The galaxy data
data(galaxies, package="MASS")
gal<as.data.frame(galaxies)
galaxy.np6 < alldist(galaxies/1000~1, random=~1, random.distribution="np",
data=gal, k=6)
galaxy.np8u < alldist(galaxies/1000~1, random=~1, random.distribution="np",
data=gal, k=8, lambda=0.99)
round(galaxy.np8u$sdev$sdevk, digits=3)
# [1] 0.912 0.435 0.220 0.675 1.214 0.264 0.413 0.297
# The toxoplasmosis data
data(rainfall, package="forward")
rainfall$x<rainfall$Rain/1000
rainfall$x2< rainfall$x^2; rainfall$x3< rainfall$x^3
toxo.np3< alldist(cbind(Cases,TotalCases) ~ x+x2+x3, random=~1,
random.distribution="np", family=binomial(link=logit), data=rainfall, k=3)
toxo.np3x< alldist(cbind(Cases,TotalCases) ~ x, random=~x,
random.distribution="np", family=binomial(link=logit), data=rainfall, k=3)
# is the same as
toxo.np3x< alldist(Cases/Total ~ x, random = ~x, weights=Total,
family=binomial(link=logit), data=rainfall, k=3)
# or
toxo.np3x<update(toxo.np3, .~.x2x3, random = ~x)
# The fabric faults data
data(fabric)
coefficients(alldist(y ~ x, random=~1, family=poisson(link=log),
random.distribution="gq", data= fabric, k=3, verbose=FALSE))
# (Intercept) x z
# 3.3088663 0.8488060 0.3574909
# The Pennsylvanian hospital stay data
data(hosp)
fitnp3< alldist(duration~age+temp1, data=hosp, k=3, family=Gamma(link=log),
tol=0.5)
fitnp3$shape$shape
# [1] 50.76636
fitnp3< alldist(duration~age+temp1, data=hosp, k=3, family=Gamma(link=log),
tol=0.5, lambda=0.9)
fitnp3$shape$shapek
# [1] 49.03101 42.79522 126.64077
# The Oxford boys data
data(Oxboys, package="nlme")
Oxboys$boy < gl(26,9)
allvc(height~age, random=~1boy, data=Oxboys, random.distribution='gq', k=20)
allvc(height~age, random=~1boy, data=Oxboys,random.distribution='np',k=8)
# with random coefficients:
allvc(height~age,random=~ageboy, data=Oxboys, random.distribution='np', k=8)
# Irish suicide data
data(irlsuicide)
# Crude rate model:
crude< allvc(death~sex* age, random=~1ID, offset=log(pop),
k=3, data=irlsuicide, family=poisson)
crude$disparity
# [1] 654.021
# Relative risk model:
relrisk< allvc(death~1, random=~1ID, offset=log(expected),
k=3, data=irlsuicide, family=poisson)
relrisk$disparity
# [1] 656.4955
