Description Usage Arguments Details Value Note Author(s) References See Also Examples
Fits a generalized linear model with a linklinear model for the dispersion as well as for the mean.
1 2 3 4 5 6  dglm(formula=formula(data), dformula = ~ 1, family = stats::gaussian, dlink = "log",
data = sys.parent(), subset = NULL, weights = NULL, contrasts = NULL,
method = "ml", mustart = NULL, betastart = NULL, etastart = NULL, phistart = NULL,
control = dglm.control(...), ykeep = TRUE, xkeep = FALSE, zkeep = FALSE, ...)
dglm.constant(y,family,weights=1)

formula 
a symbolic description of the model to be fit.
The details of model specification are found in 
dformula 
a formula expression of the form

family 
a description of the error distribution and link function to
be used in the model.
See 
dlink 
link function for modelling the dispersion.
Any link function accepted by the 
data 
an optional data frame containing the variables in the model.
See 
subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
weights 
an optional vector of weights to be used in the fitting process. 
contrasts 
an optional list. See the 
method 
the method used to estimate the dispersion parameters;
the default is 
mustart 
numeric vector giving starting values for the fitted values
or expected responses.
Must be of the same length as the response,
or of length 1 if a constant starting vector is desired.
Ignored if 
betastart 
numeric vector giving starting values for the regression coefficients in the linklinear model for the mean. 
etastart 
numeric vector giving starting values for the linear predictor for the mean model. 
phistart 
numeric vector giving starting values for the dispersion parameters. 
control 
a list of iteration and algorithmic constants.
See 
ykeep 
logical flag: if 
xkeep 
logical flag: if 
zkeep 
logical flag: if 
... 
further arguments passed to or from other methods. 
y 
numeric response vector 
Write m_i = E(y_i) for the expectation of the
ith response.
Then Var(y_i) = s_iV(m_i) where V
is the variance function and s_i is the dispersion of the
ith response
(often denoted as the Greek character ‘phi’).
We assume the link linear models
g(m_i) = x_i^T b and
h(s_i) = z_i^T a,
where x_i and z_i are vectors of covariates,
and b and a are vectors of regression
cofficients affecting the mean and dispersion respectively.
The argument dlink
specifies h.
See family
for how to specify g.
The optional arguments mustart
, betastart
and phistart
specify starting values for m_i, b
and s_i respectively.
The parameters b are estimated as for an ordinary glm. The parameters a are estimated by way of a dual glm in which the deviance components of the ordinary glm appear as responses. The estimation procedure alternates between one iteration for the mean submodel and one iteration for the dispersion submodel until overall convergence.
The output from dglm
, out
say, consists of two glm
objects
(that for the dispersion submodel is out$dispersion.fit
) with a few more
components for the outer iteration and overall likelihood.
The summary
and anova
functions have special methods for dglm
objects.
Any generic function which has methods for glm
s or lm
s will work on
out
, giving information about the mean submodel.
Information about the dispersion submodel can be obtained by using
out$dispersion.fit
as argument rather than out itself.
In particular drop1(out,scale=1)
gives correct score statistics for
removing terms from the mean submodel,
while drop1(out$dispersion.fit,scale=2)
gives correct score
statistics for removing terms from the dispersion submodel.
The dispersion submodel is treated as a gamma family unless the original
reponses are gamma, in which case the dispersion submodel is digamma.
(Note that the digamma and trigamma functions are required to fit a digamma
family.) This is exact if the original glm family is gaussian
,
Gamma
or inverse.gaussian
. In other cases it can be
justified by the saddlepoint approximation to the density of the responses.
The results will therefore be close to exact ML or REML when the dispersions
are small compared to the means. In all cases the dispersion submodel as prior
weights 1, and has its own dispersion parameter which is 2.
an object of class dglm
is returned,
which inherits from glm
and lm
.
See dglm.object
for details.
The anova method is questionable when applied to an dglm
object with
method="reml"
(stick to method="ml"
).
Gordon Smyth, ported to R\ by Peter Dunn (pdunn2@usc.edu.au)
Smyth, G. K. (1989). Generalized linear models with varying dispersion. J. R. Statist. Soc. B, 51, 47–60.
Smyth, G. K., and Verbyla, A. P. (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics, 10, 696709.
Verbyla, A. P., and Smyth, G. K. (1998). Double generalized linear models: approximate residual maximum likelihood and diagnostics. Research Report, Department of Statistics, University of Adelaide.
dglm.object
, dglm.control
,
Digamma family
, Polygamma
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27  # Continuing the example from glm, but this time try
# fitting a Gamma double generalized linear model also.
library(statmod)
clotting < data.frame(
u = c(5,10,15,20,30,40,60,80,100),
lot1 = c(118,58,42,35,27,25,21,19,18),
lot2 = c(69,35,26,21,18,16,13,12,12))
# The same example as in glm: the dispersion is modelled as constant
# However, dglm uses ml not reml, so the results are slightly different:
out < dglm(lot1 ~ log(u), ~1, data=clotting, family=Gamma)
summary(out)
# Try a double glm
out2 < dglm(lot1 ~ log(u), ~u, data=clotting, family=Gamma)
summary(out2)
anova(out2)
# Summarize the mean model as for a glm
summary.glm(out2)
# Summarize the dispersion model as for a glm
summary(out2$dispersion.fit)
# Examine goodness of fit of dispersion model by plotting residuals
plot(fitted(out2$dispersion.fit),residuals(out2$dispersion.fit))

Loading required package: statmod
Call: dglm(formula = lot1 ~ log(u), dformula = ~1, family = Gamma,
data = clotting)
Mean Coefficients:
Estimate Std. Error t value Pr(>t)
(Intercept) 0.01655438 0.0009275491 17.84744 4.279230e07
log(u) 0.01534311 0.0004149596 36.97496 2.751191e09
(Dispersion Parameters for Gamma family estimated as below )
Scaled Null Deviance: 1890.363 on 8 degrees of freedom
Scaled Residual Deviance: 9.002787 on 7 degrees of freedom
Dispersion Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 6.288103 0.4712586 13.34321 1.297468e40
(Dispersion parameter for Digamma family taken to be 2 )
Scaled Null Deviance: 8.90448 on 8 degrees of freedom
Scaled Residual Deviance: 8.90448 on 8 degrees of freedom
Minus Twice the LogLikelihood: 31.98992
Number of Alternating Iterations: 4
Call: dglm(formula = lot1 ~ log(u), dformula = ~u, family = Gamma,
data = clotting)
Mean Coefficients:
Estimate Std. Error t value Pr(>t)
(Intercept) 0.01784797 0.0010062108 17.73780 4.464149e07
log(u) 0.01596262 0.0002301215 69.36604 3.402379e11
(Dispersion Parameters for Gamma family estimated as below )
Scaled Null Deviance: 2313.573 on 8 degrees of freedom
Scaled Residual Deviance: 9.003391 on 7 degrees of freedom
Dispersion Coefficients:
Estimate Std. Error z value Pr(>z)
(Intercept) 4.59256962 0.76357166 6.014589 1.803438e09
u 0.06966577 0.01502817 4.635680 3.557663e06
(Dispersion parameter for Digamma family taken to be 2 )
Scaled Null Deviance: 16.75853 on 8 degrees of freedom
Scaled Residual Deviance: 4.414477 on 7 degrees of freedom
Minus Twice the LogLikelihood: 22.17126
Number of Alternating Iterations: 5
Analysis of Deviance Table
Gamma double generalized linear model
Response: lot1
DF Seq.Chisq Seq.P Adj.Chisq Adj.P
Mean model 1 48.686 0.0000000 47.403 0.0000000
Dispersion model 1 9.819 0.0017275 9.819 0.0017275
Call:
dglm(formula = lot1 ~ log(u), dformula = ~u, family = Gamma,
data = clotting)
Deviance Residuals:
Min 1Q Median 3Q Max
0.076478 0.010122 0.001926 0.048233 0.093677
Coefficients:
Estimate Std. Error t value Pr(>t)
(Intercept) 0.0178480 0.0010062 17.74 4.46e07 ***
log(u) 0.0159626 0.0002301 69.37 3.40e11 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Gamma family taken to be 1.307633)
Null deviance: 2313.5733 on 8 degrees of freedom
Residual deviance: 9.0034 on 7 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
Call:
dglm(formula = ~u, family = Digamma(link = "log"), data = clotting)
Deviance Residuals:
Min 1Q Median 3Q Max
2.18708 0.81757 0.07655 0.16104 1.16959
Coefficients:
Estimate Std. Error t value Pr(>t)
(Intercept) 4.59257 0.53790 8.538 6e05 ***
u 0.06967 0.01059 6.581 0.00031 ***

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for Digamma family taken to be 0.9925192)
Null deviance: 16.7585 on 7 degrees of freedom
Residual deviance: 4.4145 on 7 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.