library(LM2GLMM) library(spaMM) library(lme4) spaMM.options(nb_cores = 4L) options(width = 120) knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/MM_solving_pb/", fig.path = "./fig_knitr/MM_solving_pb/", fig.width = 5, fig.height = 5, fig.align = "center", error = FALSE)
```{css csssmall, echo=FALSE, cache=FALSE} .prettyprint.lang-small { background-color: transparent; line-height: 13px; font-size: 12px; }
## Mixed-effects models * 4.0 [Introduction to LMM & GLMM](./MM_intro_course.html) * 4.1 [Solving LM problems using LMM](./MM_solving_pb_course.html) * 4.2 [A showcase of some useful applications](./MM_showcase_course.html) <br> <div align="right"> [Back to main menu](./Title.html#2) </div> ## You will learn in this session `r .emo("goal")` * how to handle temporal and spatial autocorrelation * how to model heteroskedasticity # Temporal autocorrelation ## The ```AirPassengers``` data `r .emo("alien")` ```r AirPassengers
AirPassengers
data r .emo("alien")
plot(AirPassengers)
r .emo("alien")
(air <- data.frame(passengers = as.numeric(AirPassengers), year = rep(1949:1960, each = 12), month = factor(rep(1:12, 12))))
r .emo("info")
plot(with(air, tapply(passengers, year, mean)) ~ I(1949:1960), ylab = "Mean number of passengers", xlab = "Year", type = "b")
r .emo("info")
plot(with(air, tapply(passengers, month, mean)) ~ I(1:12), ylab = "Mean number of passengers", xlab = "Month", type = "h")
r .emo("info")
(fit_air_LM <- lm(passengers ~ year + month, data = air))
r .emo("info")
par(mfrow = c(2, 2)) plot(fit_air_LM)
r .emo("info")
plot(residuals(fit_air_LM), type = "b") abline(h = 0, lty = 2, col = "red")
r .emo("info")
lmtest::dwtest(fit_air_LM)
There is strong serial autocorrelation in these data!
r .emo("info")
acf(residuals(fit_air_LM))
r .emo("practice")
air$time <- 1:nrow(air) library(spaMM) (fit_air_spaMM <- fitme(passengers ~ month + year + AR1(1|time), data = air))
r .emo("practice")
library(glmmTMB) air$group <- factor("A") air$Year <- air$year - 1949 ## remove 1949 to help with convergence air$Time <- numFactor(air$time) (fit_air_glmmTMB <- glmmTMB(passengers ~ month + Year + ar1(Time + 0|group), data = air))
r .emo("practice")
You can extract the autocorrelation estimate:
spaMM::get_ranPars(fit_air_spaMM, which = "corrPars")[[1]][[1]] get_cor(getME(fit_air_glmmTMB, "theta")[2])
r .emo("info")
Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...
library(DHARMa) plot(simulateResiduals(fit_air_spaMM))
r .emo("info")
Unfortunately, in the presence of correlation function(s) in the models, the approach followed by {DHARMa} is no longer valid and there is no easy workaround I am aware of...
... but you can still check which model fits the data the best:
AIC(fit_air_LM)
AIC(fit_air_spaMM)
AIC(fit_air_spaMM, verbose = TRUE)
Note:
r .emo("info")
pred_spaMM <- predict(fit_air_spaMM) plot(passengers ~ time, data = air, type = "l", lwd = 3, ylim = c(0, 700), ylab = "Passengers") points(pred_spaMM ~ air$time, type = "l", col = "green")
Note: never extrapolate using such model! Obtaining a perfect fit is not unusual in the presence of autocorrelation.
r .emo("practice")
fit_air_spaMM_no_year <- fitme(passengers ~ month + AR1(1|time), data = air) anova(fit_air_spaMM, fit_air_spaMM_no_year)
fit_air_TMB_no_year <- glmmTMB(passengers ~ month + ar1(Time + 0|group), data = air) anova(fit_air_glmmTMB, fit_air_TMB_no_year)
r .emo("info")
There are more complex autocorrelation functions out there.
The package {nlme} used to be the only option out there, but this package is problematic in many ways.
Instead, I would now dig further into {glmmTMB} and {spaMM} which allows for already quite a lot of possibilities and more autocorrelation will probably be added as these packages develop.
r .emo("alien")
data("Loaloa") ndvi <- Loaloa[, c("maxNDVI", "latitude", "longitude")] head(ndvi)
r .emo("practice")
library(maps) spaMMplot2D(x = ndvi$longitude, y = ndvi$latitude, z = ndvi$maxNDVI, add.map = TRUE, xlab = "Longitude", ylab = "Latitude", plot.title = title(main = "max NDVI"))
r .emo("practice")
pairs(ndvi)
r .emo("practice")
(fit_ndvi1 <- fitme(maxNDVI ~ 1 + Matern(1|longitude + latitude), data = ndvi, method = "REML", control.dist = list(dist.method = "Earth"))) ## optional but more accurate than Euclidean
Here is how the correlation varies as a function of the distance between points:
distance <- seq(0, 500, length.out = 100) matern_cor <- MaternCorr(distance, rho = fit_ndvi1$corrPars[[1]]$rho, nu = fit_ndvi1$corrPars[[1]]$nu) plot(matern_cor ~ distance, type = "l")
Note: if the fit considers instead euclidean distances, then the x-axis would be in degrees, not Km.
r .emo("practice")
mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))
r .emo("practice")
filled.mapMM(fit_ndvi1, add.map = TRUE, plot.title = title(xlab = "Longitude", ylab = "Latitude"))
r .emo("practice")
It is also possible to estimate how the uncertainty of the predictions varies in space:
x.for.pred <- seq(min(ndvi$longitude), max(ndvi$longitude), length.out = 100) y.for.pred <- seq(min(ndvi$latitude), max(ndvi$latitude), length.out = 50) data.for.pred <- expand.grid(longitude = x.for.pred, latitude = y.for.pred) data.for.pred$predVar <- get_predVar(fit_ndvi1, newdata = data.for.pred) m <- matrix(data.for.pred$predVar, ncol = length(y.for.pred))
r .emo("practice")
spaMM.filled.contour(x = x.for.pred, y = y.for.pred, z = m, plot.axes = { points(ndvi[, c("longitude", "latitude")]); axis(1); axis(2)}, col = spaMM.colors(redshift = 3), plot.title = title(xlab = "Longitude", ylab = "Latitude"))
r .emo("practice")
You can also use {glmmTMB}, but it is considerable slower than {spaMM} with a Matern correlation function, and it does not seem quite ready yet (status = "experimental/untested").
Here is what seems to be the correct syntax (not yet properly documented):
ndvi$group <- factor("A") ndvi$coord <- numFactor(ndvi$latitude, ndvi$longitude) fit_ndvi1_TMB <- glmmTMB(maxNDVI ~ 1 + mat(0 + coord|group), data = ndvi, REML = TRUE)
Note:
vignette("covstruct", package = "glmmTMB")
for more details.r .emo("info")
We used random effects to model variance components from random variables.
We can thus use the same kind of tools to also model the variance components of the errors.
Do not mix-up random variance components from residual ones: remember that in the error, all covariance terms are null (by definition).
bodyweight
dataset r .emo("alien")
head(bodyweight) str(bodyweight)
bodyweight
dataset r .emo("alien")
coplot(weight ~ Time | Rat * Diet, data = bodyweight, panel = panel.smooth)
r .emo("practice")
(fit_rat <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight, method = "REML"))
r .emo("practice")
coplot(residuals(fit_rat) ~ I(1:nrow(bodyweight)) | bodyweight$Diet, show.given = FALSE)
r .emo("practice")
(fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight, method = "REML", resid.model = ~ Diet))
r .emo("practice")
fit_rat_hetero <- fitme(weight ~ Diet * Time + (Time|Rat) + AR1(1|Time), data = bodyweight, method = "ML", resid.model = ~ Diet) fit_rat_hetero0 <- fitme(weight ~ Time + (Time|Rat) + AR1(1|Time), data = bodyweight, method = "ML", resid.model = ~ Diet) anova(fit_rat_hetero, fit_rat_hetero0) ## parametric bootstrap would be best here, but very slow ## and it would not affect the conclusion since the effect is very strong
r .emo("practice")
You can also model heteroskedasticity in LM as we just saw it:
set.seed(1L) d <- data.frame(y = c(rnorm(100, mean = 10, sd = sqrt(10)), rnorm(100, mean = 20, sd = sqrt(20))), group = factor(c(rep("A", 100), rep("B", 100)))) fit_hetero_spaMM <- fitme(y ~ group, resid.model = ~ group, data = d, method = "REML") phi_spaMM <- summary(fit_hetero_spaMM, verbose = FALSE)$phi[, "Estimate"] ## in log scale using contrast treatment exp(c(phi_spaMM[[1]], sum(phi_spaMM))) ## residual variance in response scale without contrasts
Once again {glmmTMB} does that too:
fit_hetero_glmmTMB <- glmmTMB(y ~ group, dispformula = ~ group, data = d, REML = TRUE) phi_glmmTMB <- fit_hetero_glmmTMB$fit$par ## in log scale using contrast treatment exp(c(phi_glmmTMB[[1]], sum(phi_glmmTMB))) ## residual variance in response scale without contrasts
Note: {spaMM} even allows for including random effects influencing the residual variance! r .emo("slow")``r .emo("slow")``r .emo("slow")
r .emo("goal")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.