In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

Examples {#examples}

This vignette describes how the algorithm approaches some example trees. We follow the example of a tail coded using two characters:

Tail: (0), absent; (1), present;

Tail colour: (0), red; (1), blue.

Some caterpillars

First we'll address some pectinate "caterpillar" trees, in which eight taxa have tails (and eight do not), four of which are red, four of which are blue.

comb <- ape::read.tree(text="(A, (B, (C, (D, (E, (F, (G, (H, (I, (J, (K, (L, (M, (N, (O, P)))))))))))))));")
clr1 <- "--------00001111"
tai1 <- "0000000011111111"
clr2 <- "0000--------1111"
tai2 <- "1111000000001111"
clrA <- "000011--------11"
taiA <- "1111110000000011"
clrB <- "00---00--11---11"
taiB <- "1100011001100011"
clrC <- "0-0--00--11---11"
taiC <- "1010011001100011"
clrD <- "0-0-0-0-1-1-1-1-"
taiD <- "1010101010101010"
clrE <- "00----0011----11"
taiE <- "1100001111000011"

An optimal tree with this character invokes a single origin of the tail, and a single change in tail colour, thus incurring a score of two. Here is one example:

goloPlot(tai1, clr1)

If we insist that the tail evolves twice, then the best score is accomplished by reconstructing a different colour of tail in each of the two regions in which the tail is present. On a caterpillar tree, this means the loss of a tail that has one colour, and an independent innovation in a tail-less taxon of a tail that has a different colour:

goloPlot(tai2, clr2)

Under the parsimony criterion, it is considered less optimal if a tail, when it re-evolves, happens to independently re-evolve a colour that has already been observed -- "blueness" has evolved twice on the following tree, meaning that the second innovation of "blueness" represents an instance of homoplasy.

goloPlot(taiA, clrA)

Three equally suboptimal alternatives

The following three trees differ in the number of innovations of the tail that are implied, and the number of changes in tail colour. All are equally parsimonious.

Under the first, our algorithm reconstructs the tail as ancestrally present, being lost on edge 2, gained on edge 5, lost in tips H and I, lost on edge 11, and gained on edge 14 (a total of six homoplasies). It further reconstructs independent, homoplastic origins of tail redness on edge 5, tail blueness on edge 14, and a change in tail colour from red to blue somewhere between edges 7 and 9 (three homoplasies).

goloPlot(taiB, clrB, TRUE)

In the second, our algorithm reconstructs the tail as ancestrally present, being lost in tips B, D, E, H, and I, and on edge 11, before being independently gained on edge 14 (a total of seven homoplasies).
It further reconstructs an independent, homoplastic origins of tail blueness on edge 14, and a change in tail colour from red to blue somewhere between edges 7 and 9 (two homoplasies).

goloPlot(taiC, clrC, TRUE)

The third configuration reconstructs the tail as ancestrally present, being lost in tips B, D, F, H, J, L, N and P (a total of eight homoplastic losses). It further reconstructs a single change in tail colour from red to blue on edge 8.

goloPlot(taiD, clrD, TRUE)

A better caterpillar tree

The tree below obtains a better score than any of the previous three: it implies a loss of the tail at edge 2, a gain at edge 6, a loss at edge 10, and a gain at edge 14; it invokes a homoplastic origin of redness at edge 6, one of blueness at edge 14, and a change in colour at edge 8, for a combined score of 7.

goloPlot(taiE, clrE, TRUE)

De Laet's caterpillars

combH <- ape::read.tree(text="(A, (B, (C, (D, (E, (F, (G, H)))))));")

De Laet [-@DeLaet2017] identifies a case in which our algorithm [@Brazeau2018] will not reconstruct every equally-parsimonious character reconstruction. Below is a simplified version of his example:

tail <- "011?0011"
clr2 <- "-11?--22"
mDep <- matrix(strsplit(paste0(tail, clr2, collapse=''), '')[[1]], byrow=TRUE, nrow=2)
colnames(mDep) <- LETTERS[1:ncol(mDep)]
rownames(mDep) <- c("Tail: (0), absent; (1), present.",
                    "Tail, colour: (1), red; (2), blue.")
knitr::kable(mDep, caption="Coding")

When optimising tail colour, we reconstruct the tail as present at all internal nodes, with independent losses of the tail in each of the three tailless taxa (i.e. edges 1, 9, 11).

goloPar()
vignettePlot(combH, tail, legend.pos='topleft', FALSE, passes=2,
             main='Tail', state.labels = ap)
edgelabels(cex=1.6, frame='none')
tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
vignettePlot(combH, clr2, legend.pos='topleft', passes=4,
             main='Tail colour', state.labels=orb)

The Fitch algorithm identifies other reconstructions as equally parsimonious: for example, a tail may have been lost on edge 6 and re-gained on edge 12. This also incurs three steps for the tail character, and (in De Laet's parlance) attributes three similarities to common ancestry: the presence of a tail in tips B and C, the absence of the tail in tips E and F, and the presence of a tail in tips G and H.

We prefer reconstructions that attribute the presence of a feature to common ancestry where possible -- a philosophy that shares something with Dollo's contention that it is easier to lose a feature than to gain it. On a pragmatic level, this maximises the opportunity for subsidiary traits of the tail to be attributed to common ancestry.

In this particular case, there is an equally-parsimonious character reconstruction that our algorithm excludes, which invokes two gains (and one loss) of the tail:

goloPar()

vignettePlot(combH, "011?0011", legend.pos='topleft', FALSE, passes=0,
             main='Tail', state.labels = ap)
edgelabels(cex=1.6, frame='none')
nodelabels(pch=16, cex=6, col='#58a691a7')
nodelabels(c('01', 1, 1,'01',0, 0, 1), cex=1.6, frame='none')

tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
vignettePlot(combH, "-11?--22", legend.pos='topleft', passes=0,
             main='Tail colour', state.labels=orb)
nodelabels(pch=16, cex=6, col='#58a691a7')
nodelabels(c("-1", "1", 1, '-1', rep("-", 2), 2), cex=1.6, frame='none')

This has no effect on tree scoring, but may be relevant if complete internal nodal reconstructions are desired.

TGuillerme/Inapp documentation built on Feb. 4, 2024, 7:26 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TGuillerme/Inapp
Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

Examples {#examples}

Some caterpillars

Three equally suboptimal alternatives

A better caterpillar tree

De Laet's caterpillars

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/Inapp Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

Examples {#examples}

Some caterpillars

Three equally suboptimal alternatives

A better caterpillar tree

De Laet's caterpillars

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/Inapp
Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees