distMRCA: Calculate plot-level distances to most recent common...
In eliotmiller/metricTester: Test Metric and Null Model Statistical Performance


Given a picante-style community data matrix (sites are rows, species are columns), and a phylogeny, calculate the distances of sets of taxa to their MRCA.

distMRCA(samp, tree, pairwise)

`samp`	A picante-style community data matrix with sites as rows, and species as columns.
`tree`	An ape-style phylogeny.
`pairwise`	Whether to use the MRCA of all taxa in the sample, or the MRCA of each pairwise comparison in the sample. See details.

Experimental metrics! This function calculates two simple but potentially useful measures. The first, accessed by setting pairwise to FALSE, is the mean branch length between a set of taxa and their most recent common ancestor (MRCA). I have not seen this used in the literature before, but it seems likely I'm wrong. This metric was not tested in our recent Ecography review, but given certain data structures, it seems potentially useful. In other cases, the MRCA will often simply be the root of the tree, and the metric will perhaps be of less use. Large values of this version of distMRCA correspond to taxa with a distant MRCA, while small values correspond to taxa with a more recent MRCA. Given an ultrametric tree, the mean distance between a set of taxa and a single ancestor is of course equal to the distance between any one of those taxa and the ancestor. However, in case a non-ultrametric tree is passed to the function, I do define it as the mean distance between all present taxa and their MRCA. The function will throw a warning if a non-ultrametric tree is passed along.
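To make the first measure concrete, here is a minimal sketch for a single set of taxa. It is not the package's implementation: the helper name mean_dist_to_mrca and the four-tip tree are hypothetical, and only standard ape functions (read.tree, getMRCA, node.depth.edgelength, is.ultrametric) are used.

```r
library(ape)

# Hypothetical four-tip ultrametric tree with total height 2
tree <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:1.5):0.5);")

# Hypothetical helper (not part of metricTester): mean distance
# between a set of tips and their most recent common ancestor
mean_dist_to_mrca <- function(tree, taxa) {
  if (!is.ultrametric(tree)) {
    warning("tree is not ultrametric; returning the mean over taxa")
  }
  mrca_node <- getMRCA(tree, taxa)
  # distance of every tip and internal node from the root
  depths <- node.depth.edgelength(tree)
  tip_ids <- match(taxa, tree$tip.label)
  # each tip descends from the MRCA, so the depth difference is the path length
  mean(depths[tip_ids] - depths[mrca_node])
}

mean_dist_to_mrca(tree, c("A", "B"))  # 1: the MRCA sits one unit above A and B
mean_dist_to_mrca(tree, c("A", "C"))  # 2: the MRCA is the root
```

The second call illustrates the caveat above: once a plot spans both sides of the root, the value is simply the tree height.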

The second measure calculated by this function is accessed by setting pairwise to TRUE. Here, per plot, the metric finds the distance of the MRCA of each pairwise taxon comparison from the root. The value returned per plot is then the mean of these distances. DANGER. Because this second option calculates all pairwise comparisons, and the number of comparisons grows quadratically with the number of species in a plot, the time it takes to run climbs quickly with the size of the community data matrix. For instance, on my personal computer, pairwise distMRCA was calculated in 0.2 seconds for a CDM with 16 plots containing between 10 and 25 species each. However, for a CDM with 100 plots containing between 25 and 55 species, it took 42 seconds. In contrast to the first flavor of this metric, large values of this metric correspond to plots where the taxa present are more recently derived, while small values correspond to plots where the taxa are less recently derived (average common ancestor closer to the root). To make these measures more comparable, it may be better to subtract the final values from the total tree height (with the caveat about ultrametric trees above). It would also be easy to derive an abundance-weighted version of this function. UPDATE. It appears that this second form is yet another (slower) way of deriving the calculation of MPD/PSV.
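The pairwise measure, and its link to MPD, can be sketched the same way. Again this is an illustration rather than the package's code: pairwise_mrca_depth and the example tree are hypothetical, and the MPD comparison assumes an ultrametric tree, where the distance between two tips is twice the tree height minus the depth of their MRCA.

```r
library(ape)

# Hypothetical four-tip ultrametric tree with total height 2
tree <- read.tree(text = "((A:1,B:1):1,(C:1.5,D:1.5):0.5);")

# Hypothetical helper (not part of metricTester): mean distance from
# the root to the MRCA of every pair of taxa in the set
pairwise_mrca_depth <- function(tree, taxa) {
  depths <- node.depth.edgelength(tree)  # distance of each node from the root
  mrca_mat <- mrca(tree)                 # MRCA node number for every tip pair
  pairs <- combn(taxa, 2)                # one column per pairwise comparison
  mean(depths[mrca_mat[cbind(pairs[1, ], pairs[2, ])]])
}

taxa <- c("A", "B", "C")
d <- pairwise_mrca_depth(tree, taxa)     # (1 + 0 + 0) / 3

# On an ultrametric tree, MPD equals 2 * (tree height - mean pairwise MRCA depth)
tree_height <- max(node.depth.edgelength(tree))
pair_dists <- cophenetic(tree)[taxa, taxa]
mpd_value <- mean(pair_dists[upper.tri(pair_dists)])
all.equal(mpd_value, 2 * (tree_height - d))  # TRUE

# Number of comparisons per plot for 25 and 55 species
choose(c(25, 55), 2)                     # 300 1485
```

The last line shows why run time rises so fast: a plot with 55 species needs roughly five times as many comparisons as one with 25.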

A vector of distMRCA values.

Miller, E. T. 2016. Random thoughts.

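library(metricTester)

#simulate tree with birth-death process
tree <- geiger::sim.bdtree(b=0.1, d=0, stop="taxa", n=50)

#draw log-normal abundances to sample from
sim.abundances <- round(rlnorm(5000, meanlog=2, sdlog=1)) + 1

#simulate a CDM with one plot per richness value (10 to 25 species)
cdm <- simulateComm(tree, richness.vector=10:25, abundances=sim.abundances)

#mean distance of the taxa in each plot to their MRCA
results <- distMRCA(cdm, tree, pairwise=FALSE)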