In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

library(LM2GLMM)
library(spaMM)
library(lme4)
spaMM.options(nb_cores = 4L)
options(width = 120)
knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/MM_solving_pb/", fig.path = "./fig_knitr/MM_solving_pb/", fig.width = 5, fig.height = 5, fig.align = "center", error = FALSE)

```{css csssmall, echo=FALSE, cache=FALSE} .prettyprint.lang-small { background-color: transparent; line-height: 13px; font-size: 12px; }

## Mixed-effects models

* 4.0 [Introduction to LMM & GLMM](./MM_intro_course.html)
* 4.1 [Solving LM problems using LMM](./MM_solving_pb_course.html)
* 4.2 [A showcase of some useful applications](./MM_showcase_course.html)

<br>

<div align="right">
[Back to main menu](./Title.html#2)
</div>


## You will learn in this session `r .emo("goal")`

* how to handle temporal and spatial autocorrelation
* how to model heteroskedasticity


# Temporal autocorrelation

## The ```AirPassengers``` data `r .emo("alien")`

```r
AirPassengers

The `AirPassengers` data `r .emo("alien")`

plot(AirPassengers)

Reformating the dataset for the fit `r .emo("alien")`

(air <- data.frame(passengers = as.numeric(AirPassengers),
                   year = rep(1949:1960, each = 12),
                   month = factor(rep(1:12, 12))))

Looking at the average trend per year `r .emo("info")`

plot(with(air, tapply(passengers, year, mean)) ~ I(1949:1960),
     ylab = "Mean number of passengers", xlab = "Year", type = "b")

Looking at the average trend per month `r .emo("info")`

plot(with(air, tapply(passengers, month, mean)) ~ I(1:12),
     ylab = "Mean number of passengers", xlab = "Month", type = "h")

Simple LM fit `r .emo("info")`

(fit_air_LM <- lm(passengers ~ year + month, data = air))

The problem `r .emo("info")`

par(mfrow = c(2, 2))
plot(fit_air_LM)

The problem `r .emo("info")`

plot(residuals(fit_air_LM), type = "b")
abline(h = 0, lty = 2, col = "red")

The problem `r .emo("info")`

lmtest::dwtest(fit_air_LM)

There is strong serial autocorrelation in these data!

The problem `r .emo("info")`

acf(residuals(fit_air_LM))

Autoregressive fit using {spaMM} `r .emo("practice")`

air$time <- 1:nrow(air)
library(spaMM)
(fit_air_spaMM <- fitme(passengers ~ month + year + AR1(1|time), data = air))

Autoregressive fit using {glmmTMB} `r .emo("practice")`

library(glmmTMB)
air$group <- factor("A")
air$Year <- air$year - 1949 ## remove 1949 to help with convergence
air$Time <- numFactor(air$time)
(fit_air_glmmTMB <- glmmTMB(passengers ~ month + Year + ar1(Time + 0|group), data = air))

Autocorrelation estimate `r .emo("practice")`

You can extract the autocorrelation estimate:

spaMM::get_ranPars(fit_air_spaMM, which = "corrPars")[[1]][[1]]
get_cor(getME(fit_air_glmmTMB, "theta")[2])

Assumptions `r .emo("info")`

Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...

library(DHARMa)
plot(simulateResiduals(fit_air_spaMM))

Assumptions `r .emo("info")`

Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...

... but you can still check which model fits the data the best:

AIC(fit_air_LM)

AIC(fit_air_spaMM)

AIC(fit_air_spaMM, verbose = TRUE)

Note:

you need to compare here the AIC from the LM to the marginal AIC from the LMM.

Fitted values `r .emo("info")`

pred_spaMM <- predict(fit_air_spaMM)
plot(passengers ~ time, data = air, type = "l", lwd = 3, ylim = c(0, 700), ylab = "Passengers")
points(pred_spaMM ~ air$time, type = "l", col = "green")

Note: never extrapolate using such model! Obtaining a perfect fit is not unusual in the presence of autocorrelation.

Testing the effect of years `r .emo("practice")`

fit_air_spaMM_no_year <- fitme(passengers ~ month + AR1(1|time), data = air)
anova(fit_air_spaMM, fit_air_spaMM_no_year)

fit_air_TMB_no_year <- glmmTMB(passengers ~ month + ar1(Time + 0|group), data = air)
anova(fit_air_glmmTMB, fit_air_TMB_no_year)

More complex autocorrelation functions `r .emo("info")`

There are more complex autocorrelation functions out there.

The package {nlme} used to be the only option out there, but this package is problematic in many ways.

Instead, I would now dig further into {glmmTMB} and {spaMM} which allows for already quite a lot of possibilities and more autocorrelation will probably be added as these packages develop.

Spatial autocorrelation

Maximum normalised-difference vegetation index in north Cameroon `r .emo("alien")`

data("Loaloa")
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi)

Visualising the data `r .emo("practice")`

library(maps)
spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE,
            xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))

Visualising the data `r .emo("practice")`

pairs(ndvi)

Fitting the model `r .emo("practice")`

(fit_ndvi1 <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi, method = "REML",
                    control.dist = list(dist.method = "Earth"))) ## optional but more accurate than Euclidean

Plotting the autocorrelation

Here is how the correlation varies as a function of the distance between points:

distance <- seq(0, 500, length.out = 100)
matern_cor <- MaternCorr(distance, rho = fit_ndvi1$corrPars[[1]]$rho, nu  = fit_ndvi1$corrPars[[1]]$nu)
plot(matern_cor ~ distance, type = "l")

Note: if the fit considers instead euclidean distances, then the x-axis would be in degrees, not Km.

Fitted values `r .emo("practice")`

mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Predictions `r .emo("practice")`

filled.mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Prediction uncertainty `r .emo("practice")`

It is also possible to estimate how the uncertainty of the predictions varies in space:

x.for.pred <- seq(min(ndvi$longitude), max(ndvi$longitude), length.out = 100)
y.for.pred <- seq(min(ndvi$latitude), max(ndvi$latitude), length.out = 50)
data.for.pred <- expand.grid(longitude = x.for.pred, latitude = y.for.pred)
data.for.pred$predVar <- get_predVar(fit_ndvi1, newdata = data.for.pred)
m <- matrix(data.for.pred$predVar, ncol = length(y.for.pred))

Prediction uncertainty `r .emo("practice")`

spaMM.filled.contour(x = x.for.pred, y = y.for.pred, z = m, plot.axes = {
  points(ndvi[, c("longitude", "latitude")]); axis(1); axis(2)}, col = spaMM.colors(redshift = 3),
  plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Spatial fit in {glmmTMB} `r .emo("practice")`

You can also use {glmmTMB}, but it is considerable slower than {spaMM} with a Matern correlation function, and it does not seem quite ready yet (status = "experimental/untested").

Here is what seems to be the correct syntax (not yet properly documented):

ndvi$group <- factor("A")
ndvi$coord <- numFactor(ndvi$latitude, ndvi$longitude)
fit_ndvi1_TMB <- glmmTMB(maxNDVI ~ 1 + mat(0 + coord|group), data = ndvi, REML = TRUE)

Note:

see vignette("covstruct", package = "glmmTMB") for more details.

Heteroscedasticity

Heteroscedasticity `r .emo("info")`

We used random effects to model variance components from random variables.

We can thus use the same kind of tools to also model the variance components of the errors.

Do not mix-up random variance components from residual ones: remember that in the error, all covariance terms are null (by definition).

The `bodyweight` dataset `r .emo("alien")`

head(bodyweight)
str(bodyweight)

The `bodyweight` dataset `r .emo("alien")`

coplot(weight ~ Time | Rat * Diet, data = bodyweight, panel = panel.smooth)

Let's model growth in rats `r .emo("practice")`

(fit_rat <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                  method = "REML"))

Let's model growth in rats `r .emo("practice")`

coplot(residuals(fit_rat) ~ I(1:nrow(bodyweight)) | bodyweight$Diet, show.given = FALSE)

Let's model growth in rats `r .emo("practice")`

(fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                         method = "REML", resid.model = ~ Diet))

Let's test the overal effect of the diet `r .emo("practice")`

fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                        method = "ML", resid.model = ~ Diet)

fit_rat_hetero0 <- fitme(weight ~ Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                         method = "ML", resid.model = ~ Diet)

anova(fit_rat_hetero, fit_rat_hetero0) ## parametric bootstrap would be best here, but very slow
                                       ## and it would not affect the conclusion since the effect is very strong

Heteroscedasticity in LM `r .emo("practice")`

You can also model heteroskedasticity in LM as we just saw it:

set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))

fit_hetero_spaMM <- fitme(y ~ group, resid.model = ~ group, data = d, method = "REML")
phi_spaMM <- summary(fit_hetero_spaMM, verbose = FALSE)$phi[, "Estimate"] ## in log scale using contrast treatment
exp(c(phi_spaMM[[1]], sum(phi_spaMM))) ## residual variance in response scale without contrasts

Once again {glmmTMB} does that too:

fit_hetero_glmmTMB <- glmmTMB(y ~ group, dispformula = ~ group, data = d, REML = TRUE)
phi_glmmTMB <- fit_hetero_glmmTMB$fit$par ## in log scale using contrast treatment
exp(c(phi_glmmTMB[[1]], sum(phi_glmmTMB))) ## residual variance in response scale without contrasts

Note: {spaMM} even allows for including random effects influencing the residual variance! r .emo("slow")``r .emo("slow")``r .emo("slow")

What you need to remember `r .emo("goal")`

how to handle temporal and spatial autocorrelation
how to model heteroskedasticity

Mixed-effects models

[Back to main menu](./Title.html#2)

courtiol/LM2GLMM documentation built on Oct. 10, 2024, 7:22 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

courtiol/LM2GLMM Data science for biologists: Generalized linear modelling with R

In courtiol/LM2GLMM: Data science for biologists: Generalized linear modelling with R

The AirPassengers data r .emo("alien")

Reformating the dataset for the fit r .emo("alien")

Looking at the average trend per year r .emo("info")

Looking at the average trend per month r .emo("info")

Simple LM fit r .emo("info")

The problem r .emo("info")

The problem r .emo("info")

The problem r .emo("info")

The problem r .emo("info")

Autoregressive fit using {spaMM} r .emo("practice")

Autoregressive fit using {glmmTMB} r .emo("practice")

Autocorrelation estimate r .emo("practice")

Assumptions r .emo("info")

Assumptions r .emo("info")

Fitted values r .emo("info")

Testing the effect of years r .emo("practice")

More complex autocorrelation functions r .emo("info")

Spatial autocorrelation

Maximum normalised-difference vegetation index in north Cameroon r .emo("alien")

Visualising the data r .emo("practice")

Visualising the data r .emo("practice")

Fitting the model r .emo("practice")

Plotting the autocorrelation

Fitted values r .emo("practice")

Predictions r .emo("practice")

Prediction uncertainty r .emo("practice")

Prediction uncertainty r .emo("practice")

Spatial fit in {glmmTMB} r .emo("practice")

Heteroscedasticity

Heteroscedasticity r .emo("info")

The bodyweight dataset r .emo("alien")

The bodyweight dataset r .emo("alien")

Let's model growth in rats r .emo("practice")

Let's model growth in rats r .emo("practice")

Let's model growth in rats r .emo("practice")

Let's test the overal effect of the diet r .emo("practice")

Heteroscedasticity in LM r .emo("practice")

What you need to remember r .emo("goal")

Table of contents

Mixed-effects models

R Package Documentation

Browse R Packages

We want your feedback!

courtiol/LM2GLMM
Data science for biologists: Generalized linear modelling with R

The `AirPassengers` data `r .emo("alien")`

Reformating the dataset for the fit `r .emo("alien")`

Looking at the average trend per year `r .emo("info")`

Looking at the average trend per month `r .emo("info")`

Simple LM fit `r .emo("info")`

The problem `r .emo("info")`

The problem `r .emo("info")`

The problem `r .emo("info")`

The problem `r .emo("info")`

Autoregressive fit using {spaMM} `r .emo("practice")`

Autoregressive fit using {glmmTMB} `r .emo("practice")`

Autocorrelation estimate `r .emo("practice")`

Assumptions `r .emo("info")`

Assumptions `r .emo("info")`

Fitted values `r .emo("info")`

Testing the effect of years `r .emo("practice")`

More complex autocorrelation functions `r .emo("info")`

Maximum normalised-difference vegetation index in north Cameroon `r .emo("alien")`

Visualising the data `r .emo("practice")`

Visualising the data `r .emo("practice")`

Fitting the model `r .emo("practice")`

Fitted values `r .emo("practice")`

Predictions `r .emo("practice")`

Prediction uncertainty `r .emo("practice")`

Prediction uncertainty `r .emo("practice")`

Spatial fit in {glmmTMB} `r .emo("practice")`

Heteroscedasticity `r .emo("info")`

The `bodyweight` dataset `r .emo("alien")`

The `bodyweight` dataset `r .emo("alien")`

Let's model growth in rats `r .emo("practice")`

Let's model growth in rats `r .emo("practice")`

Let's model growth in rats `r .emo("practice")`

Let's test the overal effect of the diet `r .emo("practice")`

Heteroscedasticity in LM `r .emo("practice")`

What you need to remember `r .emo("goal")`