library(LM2GLMM) library(doSNOW) spaMM::spaMM.options(nb_cores = parallel::detectCores() - 1, barstyle = FALSE) options(width = 120) knitr::opts_chunk$set(cache = TRUE, cache.path = "./cache_knitr/MM_showcase/", fig.path = "./fig_knitr/MM_showcase/", fig.width = 5, fig.height = 4, fig.align = "center", error = TRUE)
r .emo("goal")
r .emo("info")
The goal of the animal model is to study the unobservable genetic potential of individuals. It can be used, for instance, to know which bull to breed in order to yield daughters with maximal milk production.
Nothing but a simple LMM:
$$ y_i = \mu + a_i + e_i$$
with:
The simplest animal model considers the genetic covariance between two individuals as $\text{A}\times V_\text{A}$:
The fit yields an estimation of the breeding values ($a_i$s) and their variance ($V_\text{A}$).
r .emo("info")
r .emo("alien")
r .emo("practice")
tail(Gryphon$pedigree) library(nadiv) A <- as(makeA(Gryphon$pedigree), "matrix") colnames(A) <- rownames(A) <- Gryphon$pedigree$ID A[1305:1309, 1296:1309]
r .emo("practice")
Let us fit an animal model of the birth weight of gryphons:
library(spaMM) (fit_bwt1 <- fitme(BWT ~ 1 + corrMatrix(1|ID), corrMatrix = A, data = Gryphon$data, method = "REML"))
r .emo("practice")
Here is the estimate for the variance of the breeding values (i.e. the additive genetic variance):
(Va <- get_ranPars(fit_bwt1 , which = "lambda")[[1]])
Here is the estimate of the remaining phenotypic variance:
(Ve <- unique(get_residVar(fit_bwt1)))
So we can compute the so called heritability:
(h2 <- Va / (Va + Ve))
Interpretation: r round(100*h2,1)
% of the variation in birth weight is due to inheritance!
r .emo("practice")
More complex animal models attempt to decompose variance into more than 2 components.
Here for instance, we can add fixed effects and distinguish the contribution of years and of so-called maternal effects:
Gryphon$data$sex <- factor(Gryphon$data$SEX) Gryphon$data$year <- factor(Gryphon$data$BYEAR) Gryphon$data$mother <- factor(Gryphon$data$MOTHER) fit_bwt2 <- fitme(BWT ~ sex + (1|year) + (1|mother) + corrMatrix(1|ID), corrMatrix = A, data = Gryphon$data, method = "REML") Va <- get_ranPars(fit_bwt2 , which = "lambda")[["ID.(Intercept)"]] ## additive genetic variance Vm <- get_ranPars(fit_bwt2 , which = "lambda")[["mother.(Intercept)"]] ## variance of maternal effects Vp <- sum(get_ranPars(fit_bwt2 , which = "lambda")) + unique(get_residVar(fit_bwt2)) ## variance not accounted ## for by fixed effects round(c("heritability (without maternal effect)" = Va/Vp, "maternal effect" = Vm/Vp), 3)
r .emo("practice")
We can estimate the breeding values using BLUPs (Best Linear Unbiased Predictions):
par(mar = c(4, 4, 0, 0)) curve(dnorm(x, sd = sqrt(Va)), from = -3, to = 3, las = 1, ylab = "pdf", xlab = "predicted breeding value") BLUPs <- ranef(fit_bwt2)$"corrMatrix(1 | ID)" points(dnorm(BLUPs, sd = sqrt(Va)) ~ BLUPs, col = "blue", type = "h", lwd = 0.1)
r .emo("info")
par(mar = c(4, 4, 0, 0)) plot(BLUPs[paste(fit_bwt2$data$ID)] ~ scale(fit_bwt2$data$BWT), cex = 0.5, ylab = "Predicted breeding values", xlab = "Scaled phenotypic values")
ade4::carni70
dataset r .emo("alien")
We will use these data to study how the geographic range size (km) of a species is influenced by its body size (kg) while accounting for the genetic relatedness between species.
data("carni70", package = "ade4") carni70$tab
r .emo("practice")
To accounting for the genetic relatedness between species, we start by reading the phylogenetic tree:
par(mar = c(1, 0, 0, 0)) tree <- ape::read.tree(text = carni70$tre) plot(tree, cex = 0.3)
r .emo("practice")
In order to use the phylogeny as an input of a MM, we then need to turn it into a correlation matrix:
(corrM <- ape::vcv(tree, model = "Brownian", corr = TRUE)) ## other models are possible
r .emo("practice")
We can now fit our phylogenetic regression model similarly to an animal model (but this time using ML to study fixed effects):
carni70$tab$sp <- factor(rownames(corrM)) (fit_carni <- fitme(range ~ size + corrMatrix(1|sp), corrMatrix = corrM, data = carni70$tab))
r .emo("practice")
We can now test the effect of the body size on the geographic range of the species accounting for the phylogeny as usual:
fit_carni_no_size <- fitme(range ~ 1 + corrMatrix(1|sp), corrMatrix = corrM, data = carni70$tab) anova(fit_carni, fit_carni_no_size, boot.repl = 999, fit_env = list(corrM = corrM)) ## fit_env is used for parallel processing: it declares objects not in data
Note:
r .emo("practice")
The package {nlme} is more flexible in this regard, but it presents its own drawbacks.
library(nlme) rownames(carni70$tab) <- rownames(corrM) (fit_carni2 <- gls(range ~ size, correlation = ape::corBrownian(1, phy = tree, form = ~ sp), method = "ML", data = carni70$tab))
A key benefit of using a proper (G)LMM (as in {spaMM}) instead of a Generalized Least Squares fit (as in {nlme}) is that we could also consider other random effects and a non Gaussian response (GLMM).
r .emo("practice")
fit_carni2_no_size <- gls(range ~ 1, correlation = ape::corBrownian(1, tree, form = ~ sp), method = "ML", data = carni70$tab) anova(fit_carni2, fit_carni2_no_size)
Notes:
metafor::dat.bcg
dataset r .emo("alien")
We will perform a meta-analysis of 13 studies on the effectiveness of the BCG vaccine against tuberculosis:
library(metafor) dat.bcg$study <- factor(dat.bcg$trial) dat.bcg$alloc <- factor(dat.bcg$alloc) dat.bcg
Note: {metafor} is a very good package dedicated to perform meta-analyses.
r .emo("info")
To compare studies, we need a single metric (and the uncertainty around it).
Here, we compute the relative risk and its associated sampling variance:
dat.bcg$RR <- with(dat.bcg, log((tpos/(tpos + tneg)) / (cpos/(cpos + cneg)))) dat.bcg$sampling.var <- with(dat.bcg, 1/tpos - 1/tneg + 1/cpos - 1/cneg) dat.bcg ## t = treatment = vaccinated / c = control = non-vaccinated
r .emo("practice")
(fit_tub_meta <- rma(yi = RR ~ factor(alloc) + year + ablat, vi = sampling.var, data = dat.bcg, method = "REML"))
r .emo("practice")
(fit_tub_spaMM <- fitme(RR ~ alloc + year + ablat + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var), method = "REML")) ## phi must be fixed!!
r .emo("practice")
Estimated amount of residual heterogeneity between studies:
lambda <- get_ranPars(fit_tub_spaMM, which = "lambda")[[1]] round(c(tau2_metafor = fit_tub_meta$tau2, lambda_spaMM = lambda), 4)
Amount of heterogeneity accounted for by fixed effects:
fit_tub_spaMM_null <- fitme(RR ~ 1 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var), method = "REML") lambda0 <- get_ranPars(fit_tub_spaMM_null, which = "lambda")[[1]] R2_spaMM <- 100 * (lambda0 - lambda) / lambda0 round(c(R2_metafor = fit_tub_meta$R2, R2_spaMM = R2_spaMM), 4)
Conclusion: if you know a general and flexible package like {spaMM}, with a little tinkering of the outputs, you should be able to reproduce what many specialised packages do.
r .emo("nerd")
We can retrieve the usual metrics used in such studies:
get_meta_metrics <- function(fit, vi) { wi <- 1/vi ## weights = inverse of sampling variance k <- length(ranef(fit)[[1]]) ## number of groups in random term (here number of studies) p <- length(fixef(fit)) ## number of fixed effect parameters W <- diag(wi, nrow = k, ncol = k) ## matrix of weights X <- model.matrix(fit) ## design matrix stXWX <- solve(t(X) %*% W %*% X) ## weighted vcov of estimates P <- W - W %*% X %*% stXWX %*% crossprod(X, W) ## weighted vcov of ?? vi.avg <- (k - p) / sum(diag(P)) # weighted mean within study sampling variance lambda <- get_ranPars(fit, which = "lambda")[[1]] I2 <- as.numeric(100 * lambda / (vi.avg + lambda)) # residual heterogeneity / unaccounted variability H2 <- as.numeric((vi.avg + lambda) / vi.avg) # unaccounted variability / sampling variability QE <- max(0, c(crossprod(response(fit), P) %*% response(fit))) # residual heterogeneity return(c(I2 = I2, H2 = H2, QE = QE)) ## see ?print.rma.uni for details on the metrics } rbind(metafor = c(fit_tub_meta$I2, fit_tub_meta$H2, fit_tub_meta$QE), spaMM = get_meta_metrics(fit_tub_spaMM, dat.bcg$sampling.var))
r .emo("practice")
plot(fit_tub_meta)
r .emo("practice")
To summarise the global effect of vaccination, it would be helpful if the fixed effect where not significant... let us check:
fit_tub_spaMM_ML <- fitme(RR ~ alloc + year + ablat + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var)) ## ML fit! fit_tub_spaMM_ML_H0 <- fitme(RR ~ 1 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var))
anova(fit_tub_spaMM_ML, fit_tub_spaMM_ML_H0, boot.repl = 999, fit_env = list(dat.bcg = dat.bcg))
Note: we are far from asymptotic conditions due to the low number of studies, so parametric bootstrap gives very different results but is clearly more reliable in these conditions.
r .emo("practice")
The default test is a Wald test and thus performs poorly especially when the number of studies is low...
(fit_tub_meta_ML <- rma(yi = RR ~ alloc + year + ablat, vi = sampling.var, data = dat.bcg, method = "ML"))
r .emo("practice")
You can use the F / t tests instead by setting test="knha"
, which is already better:
rma(yi = RR ~ alloc + year + ablat, vi = sampling.var, data = dat.bcg, method = "ML", test = "knha")
r .emo("practice")
You can also use a permutation test which now reveals that fixed effects (called moderators) are not significant overall:
permutest(fit_tub_meta_ML)
r .emo("practice")
To quantify the overall effect of vaccination, we can thus drop all fixed effect and refit our model:
(fit_tub_spaMM_ML2 <- fitme(RR ~ 1 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var)))
r .emo("practice")
Let's now test the intercept estimates which quantifies how the relative risk of getting turberculosis increases for vaccinated people:
fit_tub_spaMM_ML2_H0 <- fitme(RR ~ 0 + (1|study), data = dat.bcg, fixed = list(phi = dat.bcg$sampling.var)) anova(fit_tub_spaMM_ML2, fit_tub_spaMM_ML2_H0, boot.repl = 999, fit_env = list(dat.bcg = dat.bcg))
Conclusion: yes, vaccination is effective!
r .emo("practice")
Here is the overall effect size with 95% CI:
fixef(fit_tub_spaMM_ML2) confint(fit_tub_spaMM_ML2, parm = "(Intercept)", boot_args = list(nsim = 999, fit_env = list(dat.bcg = dat.bcg)))
r .emo("practice")
We can also test the overall effect of vaccination using {metafor}:
fit_tub_meta_ML2 <- rma(yi = RR ~ 1, vi = sampling.var, data = dat.bcg, method = "ML") permutest(fit_tub_meta_ML2)
Notes:
test = "knha"
would thus be best to compute such interval in the absence of alternatives.r .emo("practice")
The overall effect of vaccination is shown at the bottom of the forest plot:
par(mar = c(4, 0, 0, 0)) forest(fit_tub_meta_ML2)
r .emo("info")
Generic packages like {spaMM} offer options that specialised packages like {metafor} do not offer, such as modelling heteroskedasticity & autocorrelation, computing parametric bootstrap....
But the reverse is also true:
Specialised packages like {metafor} offer easy ways to compute all the usual metrics you need and to draw the plots people expect in a publication covering a meta-analysis.
For this reason, I tend to use both generic and specialised packages and compare results.
Specialised R packages are typically more accessible and easier to use but they tend to rely on outdated or suboptimal statistical methods because they are conceived by people only interested in their particular application and not by people generally knowledgeable in statistics.
Yet, {metafor} seems to be an exception in this respect: it seems as good as R packages get.
Personally, for the rare meta-analyses and meta-regressions I did, I used {spaMM} which I know better, but I checked that {metafor} was generating the same metrics and plots I was creating myself using outputs from {spaMM}.
Here is an example of such a work I did with my colleague Viktoriia Radchuk.
r .emo("goal")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.