inapplicable is an R package that allows parsimony search on morphological datasets that contain inapplicable data, following the algorithm proposed by Brazeau, Guillerme and Smith (2017).

In brief, this algorithm modifies the Fitch algorithm to count the total number of homoplasious events on a tree.

Installation

The inapplicable package can be installed as any other package. To get the latest stable version from CRAN, type

install.packages('inapplicable')

into the R command line. To get the latest development version from GitHub, it's

if (!require('devtools')) install.packages('devtools')
devtools::install_github('ms609/inapplicable')

You'll probably need to install Rtools to install the latest version, which may be complicated. If that's well within your comfort zone, then you may wish to install a byte-compiled version of ape for additional speed:

install.packages('ape', type='source', INSTALL_opts='--byte-compile')

Once installed, load the inapplicable package into R using

library(inapplicable)

Scoring a tree, and conducting a tree search

Here's an example of using the package to conduct tree search. You can load your own dataset like so:

my.data <- ape::read.nexus.data('path/to/file')

But for now, we'll use the Vinther et al. dataset that's bundled with the package. This dataset is small enough that it runs reasonably quickly, but its phylogenetic signal is obscure enough that it can require Ratchet searches to escape from local optima.

data(inapplicable.datasets)
my.data <- inapplicable.datasets[['Vinther2008']]
my.phyDat <- phangorn::phyDat(my.data, type='USER', levels=c(0:9, '-'))

We can generate a random tree and calculate its parsimony score thus:

set.seed(0) # Set random seed so that random functions will generate consistent output in this document
random.tree <- TreeSearch::RandomTree(my.phyDat)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(random.tree)
InapplicableFitch(random.tree, my.phyDat)

It helps tree search if we start with a tree that's a little closer to optimal; perhaps a neighbour-joining tree:

nj.tree <- ape::nj(phangorn::dist.hamming(my.phyDat))
# We need to set an arbitrary ougroup so that all nodes bifurcate
nj.tree <- ape::root(nj.tree, outgroup=names(my.phyDat)[1], resolve.root=TRUE)
nj.tree$edge.length <- NULL
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(nj.tree)
InapplicableFitch(nj.tree, my.phyDat)

With the Vinther et al. 2008 dataset, brachiopods and nemerteans form an natural outgroup to the other taxa. If we wish, we can avoid making tree rearrangements that would mix ingroup and outgroup taxa. This will accellerate tree search, but it's worth thinking carefully whether you can be perfectly confident that the ingroup and outgroup are mutually monophyletic. First we need to separate the ingroup from the outgroup:

outgroup <- c('Nemertean', 'Lingula', 'Terebratulina')
rooted.tree <- TreeSearch::EnforceOutgroup(nj.tree, outgroup)
par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(rooted.tree)

Now let's see whether a few Nearest-neighbour interchanges can find us a better tree score. This tends to be the quickest search to run, if not the most exhaustive. Using RootedNNI instead of NNI retains the position of the root, which you'll usually want to do.

better.tree <- BasicSearch(tree=rooted.tree, dataset=my.phyDat, Rearrange=TreeSearch::RootedNNI, verbosity=3)

We've got a score of 82: better than the 85 of the original neighbour-joining tree, but is this the best we can do?

Using NNI helps to explore the region of treespace close to the local optimum, but SPR and TBR rearrangements are better at escaping local optima, and find better trees further away in tree space. Using more hits (maxHits) and more iterations (maxIter) also means we'll move closer to an optimal tree.

better.tree <- BasicSearch(better.tree, my.phyDat, maxHits=40, maxIter=100000, Rearrange=TreeSearch::RootedSPR, verbosity=2)
better.tree <- BasicSearch(better.tree, my.phyDat, maxHits=40, maxIter=100000, Rearrange=TreeSearch::RootedTBR, verbosity=2)

That score's looking better, but we might still be caught in a local optimum. A more comprehensive search of tree space can be accomplished using the parsimony ratchet (Nixon 1999). It might take a couple of minutes to run.

best.tree <- RatchetSearch(better.tree, my.phyDat, verbosity=0, k=5)
attr(best.tree, 'score') # Each tree is labelled with its score during tree search

No better trees were found. We can be pretty confident that there are no better trees than this one. Let's take a look at it:

par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(best.tree)

This is, of course, just one most parsimonious tree; perhaps there are many.

One way to make a strict consensus of multiple optimal trees is to collect a number of trees from independent Ratchet iterations.

For each Ratchet iteration, we'll conduct a TBR search to scan tree space, then an NNI search to hone in on the local optimum.

my.consensus <- RatchetConsensus(best.tree, my.phyDat, rearrangements=list(TreeSearch::RootedTBR, TreeSearch::RootedNNI))

If ten independent runs all ended up at different trees, there are probably many more optimal trees out there to be found; perhaps we could repeat RatchetConsensus with nSearch = 250 for a more exhaustive sampling of tree space. That would take a while though; for now, let's check out our consensus tree:

par(mar=rep(0.25, 4), cex=0.75) # make plot easier to read
plot(ape::consensus(my.consensus))

References

BRAZEAU, M. D., GUILLERME, T. and SMITH, M. R. 2017. Morphological phylogenetic analysis with inapplicable data. BioRxiv. doi:10.1101/209775

NIXON, K. C. 1999. The Parsimony Ratchet, a new method for rapid parsimony analysis. Cladistics, 15, 407--414. doi:10.1111/j.1096-0031.1999.tb00277.x



ms609/inapplicable documentation built on May 23, 2019, 7:49 a.m.