library(LM2GLMM)
library(doSNOW)
spaMM::spaMM.options(nb_cores = parallel::detectCores() - 1, barstyle = FALSE)
options(width = 120)
knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/MM_showcase/", fig.path = "./fig_knitr/MM_showcase/",
                      fig.width = 5, fig.height = 4, fig.align = "center", error = TRUE)

Mixed-effects models


[Back to main menu](./Title.html#2)

You will learn in this session r .emo("goal")

Studying genetic variation using LMM

The animal model r .emo("info")

The goal of the animal model is to study the unobservable genetic potential of individuals. It can be used, for instance, to identify which bull to breed in order to obtain daughters with maximal milk production.

Nothing but a simple LMM:

$$ y_i = \mu + a_i + e_i$$

with:

- $y_i$ the phenotypic value of individual $i$,
- $\mu$ the overall mean (intercept),
- $a_i$ the breeding value of individual $i$ (a random effect),
- $e_i$ the residual (environmental) error.

The simplest animal model assumes that the matrix of genetic covariances between individuals is $\text{A}\times V_\text{A}$, where $\text{A}$ is the relatedness matrix and $V_\text{A}$ the additive genetic variance.
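
Spelled out in distributional form (a restatement of the assumptions above), this means:

$$\mathbf{a} = (a_1, \dots, a_n)^\top \sim \mathcal{N}(\mathbf{0},\ \text{A}\,V_\text{A}) \quad \text{and} \quad e_i \sim \mathcal{N}(0,\ V_\text{E})$$

so that $\text{cov}(a_i, a_j) = \text{A}_{ij}\,V_\text{A}$.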

The fit yields predictions of the breeding values ($a_i$s) and an estimate of their variance ($V_\text{A}$).

The relatedness matrix $\text{A}$ r .emo("info")

[Figure: relatedness]
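
To make the entries of $\text{A}$ concrete, here is a minimal sketch (using a small hypothetical pedigree, not the Gryphon data) of the relatedness coefficients produced by nadiv::makeA():

library(nadiv)
toy_ped <- data.frame(ID   = 1:5,
                      Dam  = c(NA, NA, 1, 1, 3),
                      Sire = c(NA, NA, 2, 2, 4))  ## 3 & 4 are full sibs; 5 is their (inbred) offspring
as(makeA(toy_ped), "matrix")  ## e.g. full sibs share 0.5; the diagonal element of the inbred
                              ## individual equals 1 + its inbreeding coefficient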

The Gryphon dataset r .emo("alien")

[Figure: Gryphon]

Building the matrix $\text{A}$ r .emo("practice")

tail(Gryphon$pedigree)  ## last rows of the pedigree
library(nadiv)
A <- as(makeA(Gryphon$pedigree), "matrix")  ## additive relatedness matrix computed from the pedigree
colnames(A) <- rownames(A) <- Gryphon$pedigree$ID
A[1305:1309, 1296:1309]  ## a small corner of the (large) matrix A

Fitting a simple animal model r .emo("practice")

Let us fit an animal model of the birth weight of gryphons:

library(spaMM)
(fit_bwt1 <- fitme(BWT ~ 1 + corrMatrix(1|ID), corrMatrix = A, data = Gryphon$data, method = "REML"))

Using the animal model to study variances r .emo("practice")

Here is the estimate for the variance of the breeding values (i.e. the additive genetic variance):

(Va <- get_ranPars(fit_bwt1 , which = "lambda")[[1]])

Here is the estimate of the residual (environmental) variance, i.e. the remaining phenotypic variance:

(Ve <- unique(get_residVar(fit_bwt1)))

So we can compute the so-called (narrow-sense) heritability:

(h2 <- Va / (Va + Ve))

Interpretation: r round(100*h2,1) % of the variation in birth weight is due to inheritance!
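
As a side note (a minimal sketch, not part of the original slides): since $h^2$ is a ratio of estimated variances, a rough confidence interval can be obtained by parametric bootstrap, using only simulate() and fitme(), and assuming simulate() returns one simulated response per row of the fitted data:

set.seed(123)
h2_boot <- replicate(20, {  ## increase the number of replicates for real use
  d <- fit_bwt1$data
  d$BWT_sim <- simulate(fit_bwt1)  ## new responses simulated from the fitted model
  fit_sim <- fitme(BWT_sim ~ 1 + corrMatrix(1|ID), corrMatrix = A, data = d, method = "REML")
  Va_sim <- get_ranPars(fit_sim, which = "lambda")[[1]]
  Ve_sim <- unique(get_residVar(fit_sim))
  Va_sim / (Va_sim + Ve_sim)
})
quantile(h2_boot, probs = c(0.025, 0.975))  ## rough 95% interval for the heritability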

Fitting a more complex animal model r .emo("practice")

More complex animal models attempt to decompose the phenotypic variance into more than two components.

Here for instance, we can add fixed effects and distinguish the contribution of years and of so-called maternal effects:

Gryphon$data$sex    <- factor(Gryphon$data$SEX)
Gryphon$data$year   <- factor(Gryphon$data$BYEAR)
Gryphon$data$mother <- factor(Gryphon$data$MOTHER)

fit_bwt2 <- fitme(BWT ~ sex + (1|year) + (1|mother) + corrMatrix(1|ID), corrMatrix = A,
                  data = Gryphon$data, method = "REML")

Va <- get_ranPars(fit_bwt2 , which = "lambda")[["ID.(Intercept)"]] ## additive genetic variance
Vm <- get_ranPars(fit_bwt2 , which = "lambda")[["mother.(Intercept)"]] ## variance of maternal effects
Vp <- sum(get_ranPars(fit_bwt2 , which = "lambda")) + unique(get_residVar(fit_bwt2)) ## variance not accounted 
                                                                                     ## for by fixed effects

round(c("heritability (without maternal effect)" = Va/Vp, "maternal effect" = Vm/Vp), 3)

Predicting the breeding values ($a_i$) r .emo("practice")

We can predict the breeding values using BLUPs (Best Linear Unbiased Predictions):

par(mar = c(4, 4, 0, 0))
curve(dnorm(x, sd = sqrt(Va)), from = -3, to = 3, las = 1, ylab = "pdf", xlab = "predicted breeding value")
BLUPs <- ranef(fit_bwt2)$"corrMatrix(1 | ID)"
points(dnorm(BLUPs, sd = sqrt(Va)) ~ BLUPs, col = "blue", type = "h", lwd = 0.1)
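
As an aside (a small addition, not in the original slides), ranking these BLUPs identifies the individuals with the highest predicted genetic potential, which is precisely how the animal model is used in breeding programmes:

head(sort(BLUPs, decreasing = TRUE), 5)  ## the 5 gryphons with the largest predicted breeding values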

Breeding vs phenotypic values r .emo("info")

par(mar = c(4, 4, 0, 0))
plot(BLUPs[paste(fit_bwt2$data$ID)] ~ scale(fit_bwt2$data$BWT), cex = 0.5,
     ylab = "Predicted breeding values", xlab = "Scaled phenotypic values")

Phylogenetic regressions

The ade4::carni70 dataset r .emo("alien")

We will use these data to study how the geographic range size (km) of a species is influenced by its body size (kg), while accounting for the phylogenetic relatedness between species.

data("carni70", package = "ade4")
carni70$tab

The phylogeny r .emo("practice")

To account for the phylogenetic relatedness between species, we start by reading the phylogenetic tree:

par(mar = c(1, 0, 0, 0))
tree <- ape::read.tree(text = carni70$tre)
plot(tree, cex = 0.3)

Turning a phylogeny into a correlation matrix r .emo("practice")

In order to use the phylogeny as an input to a mixed model, we then need to turn it into a correlation matrix:

(corrM <- ape::vcv(tree, model = "Brownian", corr = TRUE)) ## other models are possible
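
For illustration (a minimal sketch, not used in the fits below): under Pagel's lambda model, the Brownian correlation matrix is simply rescaled by multiplying its off-diagonal elements by $\lambda$, and the resulting matrix could be passed to fitme() in exactly the same way:

lambda <- 0.8  ## hypothetical value of Pagel's lambda
corrM_lambda <- lambda * corrM
diag(corrM_lambda) <- 1  ## within-species correlations stay at 1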

Fitting the phylogenetic regression model r .emo("practice")

We can now fit our phylogenetic regression model similarly to an animal model (but this time using ML to study fixed effects):

carni70$tab$sp <- factor(rownames(corrM))
(fit_carni <- fitme(range ~ size + corrMatrix(1|sp), corrMatrix = corrM, data = carni70$tab))

Testing the fixed effects r .emo("practice")

We can now test the effect of body size on the geographic range of the species, accounting for the phylogeny, as usual:

fit_carni_no_size <- fitme(range ~ 1 + corrMatrix(1|sp), corrMatrix = corrM, data = carni70$tab)
anova(fit_carni, fit_carni_no_size, boot.repl = 999,
      fit_env = list(corrM = corrM)) ## fit_env is used for parallel processing: it declares objects not in data

Note:

Comparison with the traditional {gls} fit r .emo("practice")

The package {nlme} is more flexible in this regard, but it presents its own drawbacks.

library(nlme)
rownames(carni70$tab) <- rownames(corrM)
(fit_carni2 <- gls(range ~ size, correlation = ape::corBrownian(1, phy = tree, form = ~ sp),
                   method = "ML", data = carni70$tab))

A key benefit of using a proper (G)LMM (as in {spaMM}) instead of a Generalized Least Squares fit (as in {nlme}) is that we could also consider other random effects and a non-Gaussian response (GLMM).
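
For example (a hedged sketch, not fitted in the original slides): since range sizes are strictly positive, the same phylogenetic model could be refitted as a Gamma GLMM with a log link simply by adding a family argument:

fit_carni_gamma <- fitme(range ~ size + corrMatrix(1|sp), corrMatrix = corrM,
                         family = Gamma(link = "log"), data = carni70$tab)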

Comparison with the traditional {gls} fit r .emo("practice")

fit_carni2_no_size <- gls(range ~ 1, correlation = ape::corBrownian(1, tree, form = ~ sp),
                          method = "ML", data = carni70$tab)
anova(fit_carni2, fit_carni2_no_size)


Notes:

Meta-analyses

The metafor::dat.bcg dataset r .emo("alien")

We will perform a meta-analysis of 13 studies on the effectiveness of the BCG vaccine against tuberculosis:

library(metafor)
dat.bcg$study <- factor(dat.bcg$trial)
dat.bcg$alloc <- factor(dat.bcg$alloc)
dat.bcg

Note: {metafor} is a very good package dedicated to performing meta-analyses.

Computing effect size(s) r .emo("info")

To compare studies, we need a single metric (and the uncertainty around it).

Here, we compute the log relative risk and its associated sampling variance:

dat.bcg$RR  <- with(dat.bcg, log((tpos/(tpos + tneg)) / (cpos/(cpos + cneg))))
dat.bcg$sampling.var <- with(dat.bcg, 1/tpos - 1/(tpos + tneg) + 1/cpos - 1/(cpos + cneg))
dat.bcg ## t = treatment = vaccinated / c = control = non-vaccinated
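
For reference, these are the standard formulas (the same quantities that metafor::escalc(measure = "RR") returns as yi and vi):

$$\log\text{RR} = \log\!\left(\frac{t_\text{pos}/(t_\text{pos}+t_\text{neg})}{c_\text{pos}/(c_\text{pos}+c_\text{neg})}\right) \qquad \widehat{\text{Var}}(\log\text{RR}) = \frac{1}{t_\text{pos}} - \frac{1}{t_\text{pos}+t_\text{neg}} + \frac{1}{c_\text{pos}} - \frac{1}{c_\text{pos}+c_\text{neg}}$$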

Fitting the meta-analysis with {metafor} r .emo("practice")

(fit_tub_meta <- rma(yi = RR ~ factor(alloc) + year + ablat, vi = sampling.var, data = dat.bcg, method = "REML"))

Fitting the meta-analysis with {spaMM} r .emo("practice")

(fit_tub_spaMM <- fitme(RR ~ alloc + year + ablat + (1|study), data = dat.bcg,
                        fixed = list(phi = dat.bcg$sampling.var), method = "REML")) ## phi must be fixed!!

Comparing outputs r .emo("practice")

Estimated amount of residual heterogeneity between studies:

lambda <- get_ranPars(fit_tub_spaMM, which = "lambda")[[1]]
round(c(tau2_metafor = fit_tub_meta$tau2, lambda_spaMM = lambda), 4)

Amount of heterogeneity accounted for by fixed effects:

fit_tub_spaMM_null <- fitme(RR ~ 1 + (1|study), data = dat.bcg,
                            fixed = list(phi = dat.bcg$sampling.var), method = "REML")
lambda0 <- get_ranPars(fit_tub_spaMM_null, which = "lambda")[[1]]
R2_spaMM <- 100 * (lambda0 - lambda) / lambda0
round(c(R2_metafor = fit_tub_meta$R2, R2_spaMM = R2_spaMM), 4)

Conclusion: if you know a general and flexible package like {spaMM}, with a little tinkering with the outputs, you should be able to reproduce what many specialised packages do.

Comparing outputs r .emo("nerd")

We can retrieve the usual metrics used in such studies:

get_meta_metrics <- function(fit, vi) {
  wi <- 1/vi  ## weights = inverse of sampling variance
  k <- length(ranef(fit)[[1]])  ## number of groups in random term (here number of studies)
  p <- length(fixef(fit))  ## number of fixed effect parameters
  W <- diag(wi, nrow = k, ncol = k)  ## matrix of weights
  X <- model.matrix(fit)  ## design matrix
  stXWX <- solve(t(X) %*% W %*% X)  ## weighted vcov of estimates
  P <- W - W %*% X %*% stXWX %*% crossprod(X, W)  ## weighted projection matrix (used for QE and vi.avg)
  vi.avg <- (k - p) / sum(diag(P)) # weighted mean within study sampling variance
  lambda <- get_ranPars(fit, which = "lambda")[[1]]
  I2 <- as.numeric(100 * lambda / (vi.avg + lambda)) # residual heterogeneity / unaccounted variability
  H2 <- as.numeric((vi.avg + lambda) / vi.avg) # unaccounted variability / sampling variability
  QE <- max(0, c(crossprod(response(fit), P) %*% response(fit))) # residual heterogeneity
  return(c(I2 = I2, H2 = H2, QE = QE)) ## see ?print.rma.uni for details on the metrics
}
rbind(metafor = c(fit_tub_meta$I2, fit_tub_meta$H2, fit_tub_meta$QE),
      spaMM = get_meta_metrics(fit_tub_spaMM, dat.bcg$sampling.var))

Easy & useful plots from {metafor} r .emo("practice")

plot(fit_tub_meta)

Testing fixed effects with {spaMM} r .emo("practice")

To summarise the global effect of vaccination, it would be helpful if the fixed effects were not significant... let us check:

fit_tub_spaMM_ML    <- fitme(RR ~ alloc + year + ablat + (1|study), data = dat.bcg,
                             fixed = list(phi = dat.bcg$sampling.var)) ## ML fit!
fit_tub_spaMM_ML_H0 <- fitme(RR ~ 1 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var))
anova(fit_tub_spaMM_ML, fit_tub_spaMM_ML_H0, boot.repl = 999, fit_env = list(dat.bcg = dat.bcg))

Note: we are far from asymptotic conditions due to the low number of studies, so the parametric bootstrap gives very different results, but it is clearly more reliable under these conditions.

Testing fixed effects with {metafor} r .emo("practice")

The default test is a Wald test and thus performs poorly, especially when the number of studies is low...

(fit_tub_meta_ML <- rma(yi = RR ~ alloc + year + ablat, vi = sampling.var, data = dat.bcg, method = "ML"))

Testing fixed effects with {metafor} r .emo("practice")

You can use F and t tests instead by setting test = "knha", which is already better:

rma(yi = RR ~ alloc + year + ablat, vi = sampling.var, data = dat.bcg, method = "ML", test = "knha")

Testing fixed effects with {metafor} r .emo("practice")

You can also use a permutation test, which now reveals that the fixed effects (called moderators in the meta-analysis literature) are not significant overall:

permutest(fit_tub_meta_ML)

Does tuberculosis vaccination work? r .emo("practice")

To quantify the overall effect of vaccination, we can thus drop all fixed effects and refit our model:

(fit_tub_spaMM_ML2    <- fitme(RR ~ 1 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var)))

Does tuberculosis vaccination work? r .emo("practice")

Let's now test the intercept estimate, which quantifies how the relative risk of getting tuberculosis changes for vaccinated people:

fit_tub_spaMM_ML2_H0 <- fitme(RR ~ 0 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var))
anova(fit_tub_spaMM_ML2, fit_tub_spaMM_ML2_H0, boot.repl = 999, fit_env = list(dat.bcg = dat.bcg))

Conclusion: yes, vaccination is effective!

Does tuberculosis vaccination work? r .emo("practice")

Here is the overall effect size with its 95% CI:

fixef(fit_tub_spaMM_ML2)
confint(fit_tub_spaMM_ML2, parm = "(Intercept)", 
        boot_args = list(nsim = 999, fit_env = list(dat.bcg = dat.bcg)))
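
Because the effect size is a log relative risk, exponentiating the estimate (and the CI bounds) puts it back on the relative-risk scale, where values below 1 indicate a protective effect of vaccination:

exp(fixef(fit_tub_spaMM_ML2))  ## overall relative risk for vaccinated vs. non-vaccinated people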

Does tuberculosis vaccination work? r .emo("practice")

We can also test the overall effect of vaccination using {metafor}:

fit_tub_meta_ML2 <- rma(yi = RR ~ 1, vi = sampling.var, data = dat.bcg, method = "ML")
permutest(fit_tub_meta_ML2)

Notes:

Does tuberculosis vaccination work? r .emo("practice")

The overall effect of vaccination is shown at the bottom of the forest plot:

par(mar = c(4, 0, 0, 0))
forest(fit_tub_meta_ML2)

So which R package should you use? r .emo("info")

Generic packages like {spaMM} offer options that specialised packages like {metafor} do not, such as modelling heteroskedasticity and autocorrelation, or computing parametric bootstraps.

But the reverse is also true:

Specialised packages like {metafor} offer easy ways to compute all the usual metrics you need and to draw the plots people expect in a publication covering a meta-analysis.

For this reason, I tend to use both generic and specialised packages and compare results.


Specialised R packages are typically more accessible and easier to use, but they tend to rely on outdated or suboptimal statistical methods because they are often conceived by people interested mainly in their particular application, rather than by people with broad statistical expertise.

Yet, {metafor} seems to be an exception in this respect: it appears to be about as good as R packages get.

Personally, for the rare meta-analyses and meta-regressions I have done, I used {spaMM}, which I know better, but I checked that {metafor} generated the same metrics and plots as the ones I created myself from the {spaMM} outputs.

Here is an example of such work, done with my colleague Viktoriia Radchuk.

What you need to remember r .emo("goal")

Table of contents

Mixed-effects models


[Back to main menu](./Title.html#2)

