library(LM2GLMM)
library(spaMM)
library(lme4)
spaMM.options(nb_cores = 4L)
options(width = 120)
knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/MM_solving_pb/", fig.path = "./fig_knitr/MM_solving_pb/", fig.width = 5, fig.height = 5, fig.align = "center", error = FALSE)

```{css csssmall, echo=FALSE, cache=FALSE} .prettyprint.lang-small { background-color: transparent; line-height: 13px; font-size: 12px; }

## Mixed-effects models

* 4.0 [Introduction to LMM & GLMM](./MM_intro_course.html)
* 4.1 [Solving LM problems using LMM](./MM_solving_pb_course.html)
* 4.2 [A showcase of some useful applications](./MM_showcase_course.html)

<br>

<div align="right">
[Back to main menu](./Title.html#2)
</div>


## You will learn in this session `r .emo("goal")`

* how to handle temporal and spatial autocorrelation
* how to model heteroskedasticity


# Temporal autocorrelation

## The ```AirPassengers``` data `r .emo("alien")`

```r
AirPassengers

The AirPassengers data r .emo("alien")

plot(AirPassengers)

Reformating the dataset for the fit r .emo("alien")

(air <- data.frame(passengers = as.numeric(AirPassengers),
                   year = rep(1949:1960, each = 12),
                   month = factor(rep(1:12, 12))))

Looking at the average trend per year r .emo("info")

plot(with(air, tapply(passengers, year, mean)) ~ I(1949:1960),
     ylab = "Mean number of passengers", xlab = "Year", type = "b")

Looking at the average trend per month r .emo("info")

plot(with(air, tapply(passengers, month, mean)) ~ I(1:12),
     ylab = "Mean number of passengers", xlab = "Month", type = "h")

Simple LM fit r .emo("info")

(fit_air_LM <- lm(passengers ~ year + month, data = air))

The problem r .emo("info")

par(mfrow = c(2, 2))
plot(fit_air_LM)

The problem r .emo("info")

plot(residuals(fit_air_LM), type = "b")
abline(h = 0, lty = 2, col = "red")

The problem r .emo("info")

lmtest::dwtest(fit_air_LM)


There is strong serial autocorrelation in these data!

The problem r .emo("info")

acf(residuals(fit_air_LM))

Autoregressive fit using {spaMM} r .emo("practice")

air$time <- 1:nrow(air)
library(spaMM)
(fit_air_spaMM <- fitme(passengers ~ month + year + AR1(1|time), data = air))

Autoregressive fit using {glmmTMB} r .emo("practice")

library(glmmTMB)
air$group <- factor("A")
air$Year <- air$year - 1949 ## remove 1949 to help with convergence
air$Time <- numFactor(air$time)
(fit_air_glmmTMB <- glmmTMB(passengers ~ month + Year + ar1(Time + 0|group), data = air))

Autocorrelation estimate r .emo("practice")

You can extract the autocorrelation estimate:

spaMM::get_ranPars(fit_air_spaMM, which = "corrPars")[[1]][[1]]
get_cor(getME(fit_air_glmmTMB, "theta")[2])

Assumptions r .emo("info")

Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...

library(DHARMa)
plot(simulateResiduals(fit_air_spaMM))

Assumptions r .emo("info")

Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...

... but you can still check which model fits the data the best:

AIC(fit_air_LM)
AIC(fit_air_spaMM)
AIC(fit_air_spaMM, verbose = TRUE)

Note:

Fitted values r .emo("info")

pred_spaMM <- predict(fit_air_spaMM)
plot(passengers ~ time, data = air, type = "l", lwd = 3, ylim = c(0, 700), ylab = "Passengers")
points(pred_spaMM ~ air$time, type = "l", col = "green")

Note: never extrapolate using such model! Obtaining a perfect fit is not unusual in the presence of autocorrelation.

Testing the effect of years r .emo("practice")

fit_air_spaMM_no_year <- fitme(passengers ~ month + AR1(1|time), data = air)
anova(fit_air_spaMM, fit_air_spaMM_no_year)
fit_air_TMB_no_year <- glmmTMB(passengers ~ month + ar1(Time + 0|group), data = air)
anova(fit_air_glmmTMB, fit_air_TMB_no_year)

More complex autocorrelation functions r .emo("info")

There are more complex autocorrelation functions out there.

The package {nlme} used to be the only option out there, but this package is problematic in many ways.

Instead, I would now dig further into {glmmTMB} and {spaMM} which allows for already quite a lot of possibilities and more autocorrelation will probably be added as these packages develop.

Spatial autocorrelation

Maximum normalised-difference vegetation index in north Cameroon r .emo("alien")

data("Loaloa")
ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")]
head(ndvi)

Visualising the data r .emo("practice")

library(maps)
spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE,
            xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))

Visualising the data r .emo("practice")

pairs(ndvi)

Fitting the model r .emo("practice")

(fit_ndvi1 <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi, method = "REML",
                    control.dist = list(dist.method = "Earth"))) ## optional but more accurate than Euclidean

Plotting the autocorrelation

Here is how the correlation varies as a function of the distance between points:

distance <- seq(0, 500, length.out = 100)
matern_cor <- MaternCorr(distance, rho = fit_ndvi1$corrPars[[1]]$rho, nu  = fit_ndvi1$corrPars[[1]]$nu)
plot(matern_cor ~ distance, type = "l")

Note: if the fit considers instead euclidean distances, then the x-axis would be in degrees, not Km.

Fitted values r .emo("practice")

mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Predictions r .emo("practice")

filled.mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Prediction uncertainty r .emo("practice")

It is also possible to estimate how the uncertainty of the predictions varies in space:

x.for.pred <- seq(min(ndvi$longitude), max(ndvi$longitude), length.out = 100)
y.for.pred <- seq(min(ndvi$latitude), max(ndvi$latitude), length.out = 50)
data.for.pred <- expand.grid(longitude = x.for.pred, latitude = y.for.pred)
data.for.pred$predVar <- get_predVar(fit_ndvi1, newdata = data.for.pred)
m <- matrix(data.for.pred$predVar, ncol = length(y.for.pred))

Prediction uncertainty r .emo("practice")

spaMM.filled.contour(x = x.for.pred, y = y.for.pred, z = m, plot.axes = {
  points(ndvi[, c("longitude", "latitude")]); axis(1); axis(2)}, col = spaMM.colors(redshift = 3),
  plot.title = title(xlab = "Longitude", ylab = "Latitude"))

Spatial fit in {glmmTMB} r .emo("practice")

You can also use {glmmTMB}, but it is considerable slower than {spaMM} with a Matern correlation function, and it does not seem quite ready yet (status = "experimental/untested").

Here is what seems to be the correct syntax (not yet properly documented):

ndvi$group <- factor("A")
ndvi$coord <- numFactor(ndvi$latitude, ndvi$longitude)
fit_ndvi1_TMB <- glmmTMB(maxNDVI ~ 1 + mat(0 + coord|group), data = ndvi, REML = TRUE)

Note:

Heteroscedasticity

Heteroscedasticity r .emo("info")

We used random effects to model variance components from random variables.

We can thus use the same kind of tools to also model the variance components of the errors.

Do not mix-up random variance components from residual ones: remember that in the error, all covariance terms are null (by definition).

The bodyweight dataset r .emo("alien")

head(bodyweight)
str(bodyweight)

The bodyweight dataset r .emo("alien")

coplot(weight ~ Time | Rat * Diet, data = bodyweight, panel = panel.smooth)

Let's model growth in rats r .emo("practice")

(fit_rat <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                  method = "REML"))

Let's model growth in rats r .emo("practice")

coplot(residuals(fit_rat) ~ I(1:nrow(bodyweight)) | bodyweight$Diet, show.given = FALSE)

Let's model growth in rats r .emo("practice")

(fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                         method = "REML", resid.model = ~ Diet))

Let's test the overal effect of the diet r .emo("practice")

fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                        method = "ML", resid.model = ~ Diet)

fit_rat_hetero0 <- fitme(weight ~ Time + (Time|Rat) + AR1(1|Time), data = bodyweight,
                         method = "ML", resid.model = ~ Diet)

anova(fit_rat_hetero, fit_rat_hetero0) ## parametric bootstrap would be best here, but very slow
                                       ## and it would not affect the conclusion since the effect is very strong

Heteroscedasticity in LM r .emo("practice")

You can also model heteroskedasticity in LM as we just saw it:

set.seed(1L)
d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)),
                      rnorm(100, mean = 20, sd = sqrt(20))),
                group = factor(c(rep("A", 100), rep("B", 100))))

fit_hetero_spaMM <- fitme(y ~ group, resid.model = ~ group, data = d, method = "REML")
phi_spaMM <- summary(fit_hetero_spaMM, verbose = FALSE)$phi[, "Estimate"] ## in log scale using contrast treatment
exp(c(phi_spaMM[[1]], sum(phi_spaMM))) ## residual variance in response scale without contrasts

Once again {glmmTMB} does that too:

fit_hetero_glmmTMB <- glmmTMB(y ~ group, dispformula = ~ group, data = d, REML = TRUE)
phi_glmmTMB <- fit_hetero_glmmTMB$fit$par ## in log scale using contrast treatment
exp(c(phi_glmmTMB[[1]], sum(phi_glmmTMB))) ## residual variance in response scale without contrasts

Note: {spaMM} even allows for including random effects influencing the residual variance! r .emo("slow")``r .emo("slow")``r .emo("slow")

What you need to remember r .emo("goal")

Table of contents

Mixed-effects models


[Back to main menu](./Title.html#2)


courtiol/LM2GLMM documentation built on July 3, 2022, 7:42 a.m.