lme4
Up to now/previously, "deviance" has been defined in lme4 as -2*(log likelihood). In the tweaked development version used here, I've redefined it as the sum of the (squared) deviance residuals.
Fit two basic models:
library(lme4) gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial) gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd), data = cbpp, family = binomial, nAGQ = 2)
Various summaries:
deviance(gm1) -2*logLik(gm1) dd <- update(gm1,devFunOnly=TRUE) dd(unlist(getME(gm1,c("theta","beta"))))
The "deviance function" actually returns $-2L$, not the deviance.
Create a new version with $\theta$ set to zero (to check match with glm()
likelihood, deviance, etc.):
## wrap deviance function: force $\theta=0$ dd2 <- function(p) { dd(c(0,p)) } (dev0 <- (opt0 <- optim(fn=dd2,par=getME(gm1,"beta")))$val) mm2 <- mm <- getCall(gm1) mm[[1]] <- quote(glFormula) ff <- eval(mm) ## auxiliary information opt0$par <- c(0,opt0$par) ## have to convert names of results to make lme4 happy names(opt0)[match(c("convergence","value"),names(opt0))] <- c("conv","fval") gmb <- with(ff,mkMerMod(environment(dd),opt0,reTrms,fr,mm2))
And with glm
:
gm0<- glm(cbind(incidence, size - incidence) ~ period, data = cbpp, family = binomial) all.equal(deviance(gm0),deviance(gmb)) ## deviances match (d0 <- c(-2*logLik(gm0))) all.equal(dev0,d0) ## 'deviance function' matches -2L ## deviance residuals match all.equal(residuals(gmb),residuals(gm0),tol=1e-3)
I'm sure this has been gone over a million times before, but let's review the relationships between the deviance and the log-likelihood as defined in base R (i.e. glm
):
glm.fit
, the deviance is defined as the sum of the deviance residuals (i.e. sum(dev.resids(y, mu, weights))
), which in turn are defined in binomial()
(or the other results from family()
) as the squared deviance residuals ... it is then stored in the $deviance
element of the list.## access dev. resids built into binomial(): devres2 <- with(cbpp, binomial()$dev.resid(y=incidence/size,mu=predict(gm0,type="response"), wt=size)) all.equal(devres2,unname(residuals(gm0,"deviance")^2))
(binomial()$dev.resid()
calls the internal C_binomial_dev_resids
function, which computes $2 w_i (y \log(y/\mu) + (1-y) \log((1-y)/(1-\mu)))$ \ldots) ... this is the same as the binomialDist::devResid
defined in glmFamily.cpp
...
In logLik.glm
, the log-likelihood is retrieved (weirdly, as has been pointed out before) from the $aic
component of the fitted object (by computing ${\cal L}=p-\mbox{AIC}/2$), which is in turn computed directly, e.g. for binomial it is
-2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y), round(m), mu, log = TRUE))
(that is, the $aic
function really computes -2*the log likelihood --
AIC.default
uses -2 * val$ll + k * val$df
) However, it gets worse
for families with a scale parameter, for which the $aic
component returns $-2{\cal L} + 2$.
The additional $2$ presumably accounts for the scale parameter,
e.g. for the gaussian
family we have
n <- 10 yy <- rnorm(n) mu <- rnorm(n) ss <- sum((yy-mu)^2) all.equal(gaussian()$aic(yy, 1, mu, 1, ss), -2 * sum(dnorm(yy, mu, sqrt(ss/n), log = TRUE)) + 2)
Note also the added two at the end of the definition of Gamma()$aic
,
which is
-2 * sum(dgamma(y, 1/disp, scale = mu * disp, log = TRUE) * wt) + 2
aic2 <- with(cbpp, binomial()$aic(y=incidence/size,n=1,mu=predict(gm0,type="response"), wt=size)) all.equal(aic2,c(-2*logLik(gm0)))
But glm.fit
takes the results of binomial()$aic()
(which is $-2L$) and converts it to
aic.model <- aic(y, n, mu, weights, dev) + 2 * rank
In summary, in base R:
family()$aic
computes $-2L$, which glm.fit
translates to an AIC by adding $2k$ and storing it in model$aic
logLik.default
retrieves model$aic
and converts it back to a log-likelihoodstats:::AIC.default
retrieves the log-likelihood and converts it back to an AIC (!)family()$dev.resid()
computes the squared deviance residualsstats:::residuals.glm
retrieves these values and takes the signed square rootIn lme4
:
logLik
computes the log-likelihood from scratch based on the weighted penalized residual sum of squares;residuals.merMod
(which calls residuals.lmResp
or residuals.glmResp
) calls the $devResid()
method, which calls the interval glm_devResid
function to return the squared deviance residuals, and finds the signed square root.Bottom line: for GLMMs, I have changed the deviance()
method to return the sum of squares of the deviance residuals, rather than $-2L$. Now let's see what breaks ...
As discussed above, even in GLMs there are questions about how to define deviance. For example is it just minus twice the log-likelihood or does it involve subtracting the deviance for the saturated model? With GLMMs there is the additional distinction between marginal and conditional deviance.
We represent the conditional log-likelihood as $\log p_{\theta, \beta}(y | u)$ and the unconditional as $\log p_{\theta, \beta}(y)$. The exact relation between the two is $p_{\theta, \beta}(y) = \int p_{\theta, \beta}(y | u) p(u)$, where $p(u)$ is an independent multivariate normal. For canonical links, the Laplace approximation to the marginal log-likelihood is $\log (p_{\theta, \beta}(y | u)) - \frac{1}{2}\|u\|^2 - |L_{\theta}|$ (nb that this is technically an approximate Laplace approximation in the non-canonical link case). Here are several ways to compute this approximate marginal log-likelihood:
logLik(gm1) # which calls ... -0.5 * lme4:::devCrit(gm1) # which calls ... -0.5 * gm1@devcomp$cmp["dev"] # which is computed from something like ... -0.5 * dd(gm1@optinfo$val) # which is computed from something like ... -0.5 * (gm1@resp$aic() + gm1@pp$sqrL(1) + gm1@pp$ldL2()) # which is wrapped into ... lme4ord:::laplace(gm1) # which by the family$aic property discussed above is also given by ... sum(with(cbpp, dbinom(incidence, size, fitted(gm1), log = TRUE))) - 0.5 * (gm1@pp$sqrL(1) + gm1@pp$ldL2())
The explanation of these results is that -0.5 * object@resp$aic()
gives the conditional log-likelihood, $\log p_{\theta, \beta}(y | u)$, which is why this term can also be obtained with dbinom
in the above example. The Laplace approximation correction terms for converting a conditional log-likelihood into a marginal log-likelihood are gm1@pp$sqrL(1)
and gm1@pp$ldL2())
.
For reasons discussed above, it may be confusing that another log-likelihood synonym is not,
-0.5 * (deviance(gm1) + gm1@pp$sqrL(1) + gm1@pp$ldL2())
The reason why is that deviance
returns the conditional model deviance minus the conditional saturated deviance.
In summary, on the deviance scale, we have:
| relative to saturated | conditional | marginal |
| --------------------- | ----------- | -------- |
| yes | deviance(object)
| ???? |
| no | object@resp$aic()
| -2 * logLik(object)
|
Considering this table helps to identify several inconsistencies with terminology. For example, the deviance reported in the print.merMod
method is actually reporting -2 * logLik(object)
as opposed to deviance(object)
:
lme4:::getLlikAIC(gm1)$AICtab["deviance"] deviance(gm1)
On one hand it makes sense to give a marginal deviance in the print
method because marginal AIC
is used there too:
lme4:::getLlikAIC(gm1)$AICtab["AIC"] AIC(gm1) lme4:::getLlikAIC(gm1)$AICtab["deviance"] + 2 * length(gm1@optinfo$val)
On the other hand, as discussed above, deviance usually is defined relative to the saturated model. So arguably what we need in the print
method is a marginal version of the deviance relative to the saturated model (see the ????'s in the table above).
Recall that gm1
and gm2
were fitted with Laplace approximation and nAGQ = 2
respectively. These two models should have rather similar log-likelihoods, but yet we have:
c(logLik(gm1)) c(logLik(gm2))
This seems to be fixable with:
c(logLik(gm2)) - 0.5 * (deviance(gm2) + sum(getME(gm2, "u")^2))
If this holds up with more examples, should we modify logLik.merMod
accordingly?
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.