knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The dataset is taken from Mahler et al. (2013)[^1].
It is embedded in the package, so that we can load it and plot it using ape
base functions.
library(cauphy) data(lizards) attach(lizards) plot(phy, show.tip.label = FALSE, x.lim = 65) phydataplot(svl, phy, scaling = 2) nodelabels(node = c(142, 151, 157))
The phylogeny is a time calibrated tree, and the trait is the natural log-transformed species average snout-to-vent length (in centimeters) of lizard species.
The fitCauchy
function can fit a Cauchy process (CP) to the data, using the REML
by default.
cauchy_reml <- fitCauchy(phy, svl, model = "cauchy", method = "reml") cauchy_reml
The print
method shows the main information of the plot,
including the (restricted) likelihood and AIC values,
and the parameter estimates.
Estimated confidence intervals for the parameters can be obtained through the numerical computation of the Hessian.
confint(cauchy_reml)
We can also plot the profile likelihood around the inferred parameter value, in order to check that the numerical optimization went well, and that the likelihood computation did not encounter some numerical robustness issues.
prl <- profile(cauchy_reml) plot(prl)
Here, the likelihood is smooth, and has a clear maximum in the estimated dispersion value.
It is also possible to directly fit the maximum likelihood, instead of the REML, in this case with a fixed root.
cauchy_ml <- fitCauchy(phy, svl, model = "cauchy", method = "fixed.root") cauchy_ml
In this case, both the root value and the dispersion are estimated using ML, and their confidence interval can also be approximated using the Hessian.
confint(cauchy_ml)
Note that in this case with many tips, the difference between the ML and REML estimates of the dispersion are small.
Finally, we can also plot the profile log-likelihood in both estimated parameters (the other parameter being fixed to its optimal value). Here again, the likelihood surface seems to be smooth and have a clear maximum.
prl <- profile(cauchy_ml) plot(prl)
We can use either ML or REML for model selection, comparing the AIC value of the Cauchy fit with those of Gaussian models.
Here, we use phylolm
to fit Gaussian models, which uses an ML estimate (with fixed root).
We only compare the CP with the BM, OU and EB without measurement error for simplicity,
but results for other Gaussian models are similar.
library(phylolm) bm_ml <- phylolm(svl ~ 1, phy = phy, model = "BM") ou_ml <- phylolm(svl ~ 1, phy = phy, model = "OUfixedRoot") eb_ml <- phylolm(svl ~ 1, phy = phy, model = "EB") data.frame(model = c("BM", "OU", "EB", "CP"), AIC = c(AIC(bm_ml), AIC(ou_ml), AIC(eb_ml), AIC(cauchy_ml)))
Here, the Cauchy Process is clearly selected by the AIC score.
For ancestral trait reconstruction, we compute the posterior trait density conditionally on the tip values, given a fitted object.
To do ancestral state reconstruction, we use the REML fitted object. The ML fitted object could also be used, and would give similar results, except for the root, which can be reconstructed in the REML case, but is fixed in the ML case.
anc_reml <- ancestral(cauchy_reml, n_values = 200)
We reconstruct the posterior density of each internal node on a grid of values. A default grid with $100$ is used if un-specified, but it can be tailored to specific needs by raising the number of points on the grid, or directly specifying a custom grid of values. By default, the reconstruction is carried on all internal nodes, but a subset of nodes of interest can be specified.
The posterior density can be plotted for specific nodes of interests. Here, node $151$ exhibits a bi-modal behavior, between "large" and "small" values of the trait at this internal node, representing two possible evolutionary scenarios.
plot(anc_reml, node = c(142, 151), type = "l")
Using the HDInterval
package, it is possible to compute
Highest (Posterior) Density Intervals for ancestral nodes, using the
density estimated on the grid.
library(HDInterval) anc_int <- hdi(anc_reml) # defaults to 95% HDI plot(anc_reml, node = c(142, 151), intervals = anc_int, type = "l")
The hdi
function can recover discontinuous intervals
(thanks to the allowSplit
option, see documentation),
which can be relevant for multi-modal reconstructed densities,
such as node 151 here.
anc_int[["151"]]
Note that the quality of the estimation of the interval depends on the quality of the estimation of the ancestral density. In particular, if the grid is inadequate or too coarse, then the resulting intervals can be meaningless.
anc_bad <- ancestral(cauchy_reml, n_values = 10) # grid is too coarse anc_int_bad <- hdi(anc_bad) plot(anc_bad, node = c(142, 151), intervals = anc_int_bad, type = "l")
Ancestral branch increment reconstruction computes the posterior density of the increment of a trait on each branch of the tree. Large branch increments can be associated with shifts on the trait value.
The process is similar to ancestral reconstruction, but is computationally more expensive. A built-in option can allow for parallelization of these computations. To avoid long run times, we only reconstruct the increment on a few chosen branches here.
inc_reml <- increment(cauchy_reml, node = c(142, 151, 157), n_cores = 1)
We recover here a behavior that is complementary to the node reconstruction, with the edge ending at node $151$ being bi-modal, with modes around $0$ (no change on the edge) and $1$ (shift in the log trait space).
plot(inc_reml, node = c(142, 151), type = "l")
We can again compute and plot the HDI for the ancestral increments:
inc_int <- hdi(inc_reml) # defaults to 95% HDI plot(inc_reml, node = c(142, 151), intervals = inc_int, type = "l")
We can then plot schematics of these reconstructions on the tree, using thermo plots. To avoid any clutter, we only plot a few of them here.
plot_asr(cauchy_reml, anc = anc_reml, inc = inc_reml, show.tip.label = FALSE, width.node = 0.8, height.node = 1.8, width.edge = 1.5, height.edge = 0.8)
In this representation, trait or increment values are mapped to a common color scale.
Each thermo plot represents the reconstructed node or branch posterior traits,
with colors mapped to the modes of the distribution, and a width proportional to their
relative weights.
The function inherits from all the ape::plot
function, and can be customized accordingly.
[^1]: Mahler, D. Luke; Ingram, Travis; Revell, Liam J.; Losos, Jonathan B. (2013), Data from: Exceptional convergence on the macroevolutionary landscape in island lizard radiations, Dryad, Dataset, https://doi.org/10.5061/dryad.9g182
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.