Deviance and log-likelihood in lme4

Until now, "deviance" has been defined in lme4 as -2*(log-likelihood). In the tweaked development version used here, I've redefined it as the sum of the squared deviance residuals.

Fit two basic models:

library(lme4)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial, nAGQ = 2)

Various summaries:

deviance(gm1)
-2*logLik(gm1)
dd <- update(gm1,devFunOnly=TRUE)
dd(unlist(getME(gm1,c("theta","beta"))))

The "deviance function" actually returns $-2L$, not the deviance.
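
A quick check of this (re-evaluating the deviance function at the fitted parameters should reproduce $-2{\cal L}$ up to numerical tolerance):

all.equal(dd(unlist(getME(gm1, c("theta", "beta")))), c(-2 * logLik(gm1)))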

Create a new version with $\theta$ set to zero (to check match with glm() likelihood, deviance, etc.):

## wrap deviance function: force $\theta=0$
dd2 <- function(p) {
    dd(c(0,p))
}
(dev0 <- (opt0 <- optim(fn=dd2,par=getME(gm1,"beta")))$val)
mm2 <- mm <- getCall(gm1)
mm[[1]] <- quote(glFormula)
ff <- eval(mm)  ## auxiliary information
opt0$par <- c(0,opt0$par)
## have to convert names of results to make lme4 happy
names(opt0)[match(c("convergence","value"),names(opt0))] <- c("conv","fval")
gmb <- with(ff,mkMerMod(environment(dd),opt0,reTrms,fr,mm2))

And with glm:

gm0 <- glm(cbind(incidence, size - incidence) ~ period,
           data = cbpp, family = binomial)
all.equal(deviance(gm0),deviance(gmb))  ## deviances match
(d0 <- c(-2*logLik(gm0)))
all.equal(dev0,d0)   ## 'deviance function' matches -2L
## deviance residuals match
all.equal(residuals(gmb),residuals(gm0),tol=1e-3)

I'm sure this has been gone over a million times before, but let's review the relationships between the deviance and the log-likelihood as defined in base R (i.e. glm):

## access the deviance residuals built into binomial():
devres2 <- with(cbpp,
     binomial()$dev.resids(y=incidence/size,mu=predict(gm0,type="response"),
                           wt=size))
all.equal(devres2,unname(residuals(gm0,"deviance")^2))

(binomial()$dev.resids() calls the internal C_binomial_dev_resids function, which computes $2 w_i \left( y \log(y/\mu) + (1-y) \log\left((1-y)/(1-\mu)\right) \right)$; this is the same computation as binomialDist::devResid defined in glmFamily.cpp.)
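
As a sanity check, here is that formula hand-rolled in R (a sketch: xlogy() is a local helper, not part of base R, implementing the $0 \cdot \log 0 = 0$ convention needed for herds with zero or full incidence):

## helper: x * log(y), with the convention that the term vanishes when x == 0
xlogy <- function(x, y) ifelse(x == 0, 0, x * log(y))
devres_hand <- with(cbpp, {
    y  <- incidence/size
    mu <- unname(predict(gm0, type = "response"))
    2 * size * (xlogy(y, y/mu) + xlogy(1 - y, (1 - y)/(1 - mu)))
})
all.equal(devres_hand, unname(residuals(gm0, "deviance")^2))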

In logLik.glm, the log-likelihood is retrieved (weirdly, as has been pointed out before) from the $aic component of the fitted object, by computing ${\cal L}=p-\mbox{AIC}/2$ where $p$ is the number of estimated parameters. The $aic component is in turn computed directly by the family; e.g. for binomial it is

    -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * y),
        round(m), mu, log = TRUE))

(That is, the $aic function really computes $-2 \times$ the log-likelihood; AIC.default then uses -2 * val$ll + k * val$df.) However, it gets worse for families with a scale parameter, for which the $aic component returns $-2{\cal L} + 2$. The additional 2 presumably accounts for the scale parameter; e.g. for the gaussian family we have

n <- 10
yy <- rnorm(n)
mu <- rnorm(n)
ss <- sum((yy-mu)^2)
all.equal(gaussian()$aic(yy, 1, mu, 1, ss),
          -2 * sum(dnorm(yy, mu, sqrt(ss/n), log = TRUE)) + 2)

Note also the added 2 at the end of the definition of Gamma()$aic, which is

    -2 * sum(dgamma(y, 1/disp, scale = mu * disp, log = TRUE) * wt) + 2

Back to the binomial case, checking that binomial()$aic() does indeed return $-2{\cal L}$:

aic2 <- with(cbpp,
     binomial()$aic(y=incidence/size,n=1,mu=predict(gm0,type="response"),
                    wt=size))
all.equal(aic2,c(-2*logLik(gm0)))

But glm.fit takes the result of binomial()$aic() (which is $-2{\cal L}$) and converts it to

aic.model <- aic(y, n, mu, weights, dev) + 2 * rank
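
We can confirm this bookkeeping with the values computed above (gm0$rank is the number of estimated coefficients):

all.equal(AIC(gm0), aic2 + 2 * gm0$rank)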

In summary, in base R:

- the family's $aic component returns $-2{\cal L}$ (plus 2 for families with a scale parameter);
- glm.fit converts this to the model AIC by adding 2 * rank;
- logLik.glm back-computes the log-likelihood from the stored AIC as ${\cal L}=p-\mbox{AIC}/2$;
- deviance returns the sum of the squared deviance residuals, i.e. the deviance relative to the saturated model.

In lme4:

- the "deviance function" returned by update(., devFunOnly=TRUE) computes $-2{\cal L}$ (under the Laplace/AGQ approximation);
- up to now deviance() has also returned $-2{\cal L}$; in the tweaked development version used here it returns the sum of the squared deviance residuals.

Bottom line: for GLMMs, I have changed the deviance() method to return the sum of squares of the deviance residuals, rather than $-2L$. Now let's see what breaks ...
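
A first sanity check of the new definition (this should hold under the tweaked development version described above):

all.equal(deviance(gm1), sum(residuals(gm1, "deviance")^2))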

The definition of deviance and log-likelihood

As discussed above, even in GLMs there are questions about how to define the deviance: for example, is it just minus twice the log-likelihood, or does it involve subtracting the deviance of the saturated model? With GLMMs there is the additional distinction between marginal and conditional deviance.

We represent the conditional log-likelihood as $\log p_{\theta, \beta}(y | u)$ and the unconditional (marginal) one as $\log p_{\theta, \beta}(y)$. The exact relation between the two is $p_{\theta, \beta}(y) = \int p_{\theta, \beta}(y | u) \, p(u) \, du$, where $p(u)$ is a standard multivariate normal density. For canonical links, the Laplace approximation to the marginal log-likelihood is $\log p_{\theta, \beta}(y | u) - \frac{1}{2}\|u\|^2 - \log |L_{\theta}|$, evaluated at the conditional mode of $u$ (n.b. this is technically an approximate Laplace approximation in the non-canonical-link case). Here are several ways to compute this approximate marginal log-likelihood:

logLik(gm1) # which calls ...
-0.5 * lme4:::devCrit(gm1)    # which calls ...
-0.5 * gm1@devcomp$cmp["dev"] # which is computed from something like ...
-0.5 * dd(gm1@optinfo$val)    # which is computed from something like ...
-0.5 * (gm1@resp$aic() + gm1@pp$sqrL(1) + gm1@pp$ldL2()) # which is wrapped into ...
lme4ord:::laplace(gm1) # which by the family$aic property discussed above is also given by ...
sum(with(cbpp, dbinom(incidence, size, fitted(gm1), log = TRUE))) - 0.5 * (gm1@pp$sqrL(1) + gm1@pp$ldL2())

The explanation of these results is that -0.5 * object@resp$aic() gives the conditional log-likelihood, $\log p_{\theta, \beta}(y | u)$, which is why this term can also be obtained with dbinom in the above example. The Laplace-approximation correction terms that convert the conditional log-likelihood into a marginal log-likelihood are gm1@pp$sqrL(1) and gm1@pp$ldL2().
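
To isolate that claim (a quick check, using the conditional fitted probabilities):

all.equal(-0.5 * gm1@resp$aic(),
          sum(with(cbpp, dbinom(incidence, size, fitted(gm1), log = TRUE))))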

For reasons discussed above, it may be confusing that the following is not yet another synonym for the log-likelihood:

-0.5 * (deviance(gm1) + gm1@pp$sqrL(1) + gm1@pp$ldL2())

The reason is that deviance() returns the conditional model deviance relative to the conditional saturated model, i.e. $-2 \log p_{\theta, \beta}(y | u)$ minus the same quantity evaluated at the saturated fit, rather than $-2 \log p_{\theta, \beta}(y | u)$ itself.
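
This can be made explicit (a sketch, assuming the redefined deviance() and that the conditional saturated model sets $\mu = y$):

## log-likelihood of the conditional saturated model (mu = y)
llik_sat <- with(cbpp, sum(dbinom(incidence, size, incidence/size, log = TRUE)))
## deviance() is the conditional -2*logLik *relative to* the saturated model ...
all.equal(deviance(gm1), gm1@resp$aic() + 2 * llik_sat)
## ... so restoring the saturated term recovers the marginal log-likelihood
all.equal(c(logLik(gm1)),
          -0.5 * (deviance(gm1) - 2 * llik_sat + gm1@pp$sqrL(1) + gm1@pp$ldL2()))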

In summary, on the deviance scale, we have:

| relative to saturated | conditional       | marginal            |
| --------------------- | ----------------- | ------------------- |
| yes                   | deviance(object)  | ????                |
| no                    | object@resp$aic() | -2 * logLik(object) |

Considering this table helps to identify several terminological inconsistencies. For example, the deviance reported by the print.merMod method is actually -2 * logLik(object), not deviance(object):

lme4:::getLlikAIC(gm1)$AICtab["deviance"]
deviance(gm1)

On the one hand, it makes sense to report a marginal deviance in the print method, because the marginal AIC is used there too:

lme4:::getLlikAIC(gm1)$AICtab["AIC"]
AIC(gm1)
lme4:::getLlikAIC(gm1)$AICtab["deviance"] + 2 * length(gm1@optinfo$val)

On the other hand, as discussed above, the deviance is usually defined relative to the saturated model. So arguably what we need in the print method is a marginal version of the deviance relative to the saturated model (see the ???? in the table above).

Inconsistencies between Laplace and Gauss-Hermite (#161)

Recall that gm1 and gm2 were fitted with the Laplace approximation and with nAGQ = 2, respectively. These two models should have rather similar log-likelihoods, and yet we have:

c(logLik(gm1))
c(logLik(gm2))

This seems to be fixable with:

c(logLik(gm2)) - 0.5 * (deviance(gm2) + sum(getME(gm2, "u")^2))

If this holds up with more examples, should we modify logLik.merMod accordingly?


