In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

library('Inapp'); library('ape')
setMargins <- function () par(mar=rep(0.2, 4))
markChanges <- function (edges, labels='X') edgelabels(frame='none', edge=edges, labels, font=2)
col.tips.nodes = c("#fc8d59", "#eeeeeed0", "#7fbf7be0", "#af8dc3e0")
redBlue <- colours <- c(TreeTools::brewer[[2]], 'grey')
deltran <- '<span style="font-variant:small-caps;">DelTran</span>'
acctran <- '<span style="font-variant:small-caps;">AccTran</span>'
choose1 <- "[01]=>1"

The Fitch algorithm {#fitch}

This algorithm was proposed by Fitch [-@Fitch1971] and is implemented in phylogenetic software that employs maximum parsimony [@swofford2001paup; @Goloboff2016] and probabilistic methods [@Ronquist2012mrbayes; @Stamatakis2014]. The simple and elegant procedure and entails going down the tree to count the number of transformations, then going back up the tree to finalise the ancestral state reconstructions.

The way the algorithm goes up and down depends on the software, but often employs a traversal: a recursive function that can visit all tips and nodes in a logical fashion. For example, consider the following tree:

setMargins()
tree <- ape::read.tree(text = "(((a, b), (c, d)), e);")
tree$node.label <- c("n4", "n3", "n1", "n2")
plot(tree)
nodelabels(tree$node.label)

A downpass traversal will first evaluate the first cherry (or pair of taxa) (A below), then save the results in n1 and evaluate the second pair of tips/nodes (B), saving the results in n2, proceeding until all the nodes and tips have been visited.

setMargins()
plot(tree)
nodelabels(tree$node.label)

## a
a <- c(4, 1.2)
## b
b <- c(4, 1.8)
## c
c <- c(4, 3.2)
## d
d <- c(4, 3.8)
## e
e <- c(4, 4.8)
## n2
n2_1 <- c(3.2, 3.5)
n2_2 <- c(2.8, 3.3)
## n1
n1_1 <- c(3.2, 1.5)
n1_2 <- c(2.8, 1.7)
## n3
n3_1 <- c(1.2, 2.5)
n3_2 <- c(0.8, 2.7)
## n4
n4 <- c(0.2, 3.75)

## First cherry
arrows(a[1], a[2], n1_1[1], n1_1[2]-0.025, length = 0.05)
arrows(b[1], b[2], n1_1[1], n1_1[2]+0.025, length = 0.05)
points(3.8, 1.5, cex = 3)
text(3.8, 1.5, "A", cex = 1)
## Second cherry
arrows(d[1], d[2], n2_1[1], n2_1[2]+0.025, length = 0.05)
arrows(c[1], c[2], n2_1[1], n2_1[2]-0.025, length = 0.05)
points(3.8, 3.5, cex = 3)
text(3.8, 3.5, "B", cex = 1)
## Third
arrows(n1_2[1], n1_2[2], n3_1[1], n3_1[2]-0.025, length = 0.05)
arrows(n2_2[1], n2_2[2], n3_1[1], n3_1[2]+0.025, length = 0.05)
points(2.5, 2.5, cex = 3)
text(2.5, 2.5, "C", cex = 1)
## Fourth
arrows(n3_2[1], n3_2[2], n4[1], n4[2]-0.025, length = 0.05)
arrows(e[1], e[2], n4[1], n4[2]+0.025, length = 0.05)
points(0.7, 3.5, cex = 3)
text(0.7, 3.5, "D", cex = 1)

An uppass traversal works identically but in the other direction, going from the nodes towards the tips.

setMargins()
plot(tree)
nodelabels(tree$node.label)

## First cherry
arrows(n1_1[1], n1_1[2]-0.025, a[1], a[2], length = 0.05)
arrows(n1_1[1], n1_1[2]+0.025, b[1], b[2], length = 0.05)
points(3.8, 1.5, cex = 3)
text(3.8, 1.5, "D", cex = 1)
## Second cherry
arrows(n2_1[1], n2_1[2]+0.025, d[1], d[2], length = 0.05)
arrows(n2_1[1], n2_1[2]-0.025, c[1], c[2], length = 0.05)
points(3.8, 3.5, cex = 3)
text(3.8, 3.5, "C", cex = 1)
## Third
arrows(n3_1[1], n3_1[2]-0.025, n1_2[1], n1_2[2], length = 0.05)
arrows(n3_1[1], n3_1[2]+0.025, n2_2[1], n2_2[2], length = 0.05)
points(2.5, 2.5, cex = 3)
text(2.5, 2.5, "B", cex = 1)
## Fourth
arrows(n4[1], n4[2]-0.025, n3_2[1], n3_2[2], length = 0.05)
arrows(n4[1], n4[2]+0.025, e[1], e[2], length = 0.05)
points(0.7, 3.5, cex = 3)
text(0.7, 3.5, "A", cex = 1)

In both traversals, the sequence in which nodes are visited (i.e. which cherry to pick first in a downpass traversal, or whether to continue left or right in an uppass traversal) is arbitrary, provided that all tips and nodes are eventually visited.

Now let's consider a more complex tree ((((a, b), c), d), (e, (f, (g, h)))); with a binary character distributed ((((1, 0), 0), 1), (1, (0, (0, 1)))):

tree <- ape::read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));")
char <- c(1,0,0,1,1,0,0,1)

## Plotting the tree and characters
setMargins()
plot(tree, adj = 0.5)
tiplabels(char, cex = 1, bg = TreeTools::brewer[[2]][char + 1], adj = 1)
nodelabels(paste0("n", 9:15), bg = "bisque")

We can use the Inapp package to apply the Fitch algorithm for this character on this tree.

## Loading the Inapp package
library(Inapp)

## The tree
tree <- read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));")

## The character
character <- "10011001"

## Applying the Fitch algorithm
matrix <- apply.reconstruction(tree, character, method = "Fitch", passes = 2)

Downpass

The downpass is quite simple and follows these rules for the two possible cases in the traversal [@Fitch1971]:

If the two considered tips or nodes have at least one state in common, set the node to be these states in common.
Else, if there is nothing in common between the to tips or nodes, set the node to be the union of the two states.

For example, node n15 is case 2 because there is nothing in common between the tips h and g (states 1 and 0 respectively). Node n14 is case 1 because there is at least one state in common between the tip f and the node n15 (state 0).

Important: when the case 2 is encountered, a transformation is implied in the descendants of the considered node. The score of the tree is incremented by +1.

In the following example, nodes that are in case 1 are in white and nodes in case 2, which imply a transformation (i.e. that adds to the tree score), are in green.

## Plotting the first downpass
plot.states.matrix(matrix, passes = 1, counts = 2, show.labels = c(1,2))
tiplabels(char, cex = 1, bg = TreeTools::brewer[[2]][char + 1], adj = 1)

Uppass

The score of the tree is known after the downpass, but the states of some nodes might not yet be properly resolved. For example, parsimonious reconstructions exist that reconstruct node n14 as state 1, as opposed to the 0 presently reconstructed. The present reconstruction seemingly indicates a change from state 1 to state 0 in an ancestor of n14, and a subsequent change from state 0 to state 1 in the ancestor of h, but an alternative reconstruction is equally parsimonious: n13, n14 and n15 may all have state 1, with a change from state 1 to state 0 in each of the two lineages leading to f and g.

The uppass traversal employs the following rules [@Fitch1971]:

If the current node and its ancestor have all states in common, the node is already resolved.
If there is at least one state in common between both left and right direct descendants of the current node, resolve the node as being the states in common between its ancestor and both its descendants.
If there is no state in common between its descendants, resolve the node as being the states in common between the ancestor and the current node.

For example, node n13 is already resolved (it has all its states in common with the ancestor - case 1). Node n14 has not all its states in common with its ancestor but its two descendants (n15 and f) have at least one state in common (0). This node is thus solved to be the states in common between both descendants (01) and its ancestor (01 as well).

## Plotting the first uppass
plot.states.matrix(matrix, passes = 1:2, counts = 2,
                   show.labels = c(1, 2), col.states=TRUE)

More complex cases can be studied in the Inapp App (running in your favourite web browser) by switching the Reconstruction method to Normal Fitch.

## Running the Inapp App
runInapp()

Resolving ambiguous resolutions

In certain cases, it is necessary to go further and discriminate between the equally-parsimonious reconstructions provided by the Fitch algorithm. A number of approaches have been proposed concerning which resolution of ambiguous nodes is preferable.

The two most familiar approaches to resolving ambiguous node are the Accelerated Transformation (r acctran) and Delayed Transformation (r deltran) approaches [@Farris1970;@Swofford1987].

The r acctran approach reconstructs transformations as occurring as close to the root as possible; the r deltran, as far from the root as possible.

In this case, the ambiguous resolution of the root leaves two options for the latter:

(ref:fig-cap-trans) Node reconstructions after r acctran or r deltran optimizations

par(mfrow=c(2, 2), mar=rep(1, 4))

tree <- read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));")
tree$tip.label <- paste("_", tree$tip.label)
char <- c(1,0,0,1,1,0,0,1)

edge_col <- TreeTools::brewer[[2]][c(2,2,1,  2,1,1,2,
                                     2,2,2,  1,1,1,2)]

graphics::plot(tree, show.tip.label = TRUE, main='AccTran',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2,
               x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5))

node_labels <- paste(paste("Acc:", sep = ""), c(1,1,0,0,1,0,0))
node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)])

tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
markChanges(c(2, 4, 10, 14))

edge_col <- redBlue[c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2)]

graphics::plot(tree, show.tip.label = TRUE, main='DelTran',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2,
               x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5))

node_labels <- paste(paste("Del:", sep = ""), rep(1, 7))
node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)])

tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
markChanges(c(5, 6, 11, 13))

plot(1, type="n", axes=F, xlab="", ylab="")
#edge_col <- TreeTools::brewer[[2]][3-c(2,2,1,1,  1,1,2,
#                                 2,2,2,1,  1,1,2)]
#
#graphics::plot(tree, show.tip.label = TRUE, main='AccTran',
#               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
#               edge.color = edge_col, edge.width = 2)
#
#node_labels <- paste(paste("Acc:", sep = ""), 1-c(1,1,0,0,1,0,0))
#node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "")
#
#ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)])
#
#tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
#edgelabels(frame='none', edge=c(2, 4, 10, 14), "X", col=col.tips.nodes[3])


edge_col <- redBlue[c(1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2)]

graphics::plot(tree, show.tip.label = TRUE, main='Acc/DelTran',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2,
               x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5))

node_labels <- paste(paste("Del:", sep = ""), rep(0, 7))
node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)])

tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
markChanges(c(4, 7, 9, 14))

If the states 0 and 1 represent states of a transformational character -- whether an organism's tail is red or blue, say -- then there is no reason to prefer any of the equally-parsimonious reconstructions, as none implies any more homology than any other.

With neomorphic characters, however, state 0 stands for the absence of a character -- for example, a tail -- and state 1 its presence. On one view, a reconstruction that minimises the number of times that such a character evolves attributes more similarity to homology than an equally parsimonious reconstruction in which said character is gained multiple times independently.

Maximising homology

Neither r acctran nor r deltran is guaranteed to maximise homology [@Agnarsson2008]. In this particular case, the r deltran reconstruction maximises homology. If the character denotes the presence or absence of a tail, then this reconstruction invokes the presence of a tail in the common ancestor of all taxa, meaning that the tails present in tips a, d, e and h are homologous with one another. The r acctran reconstruction, in contrast, identifies a loss of a tail at nodes 11 and 14, with a tail evolving independently in tips a and h. Under this reconstruction, the tails of a and h are not homologous with each other, or with the tails of d and e. (The alternative r deltran approach, which could arguably be described as r acctran instead, invokes four independent origins of the character and clearly does not maximise its homology.)

Where we wish to maximise homology, we modify the Fitch uppass such that any node whose final state reconstruction would be ambiguous is instead reconstructed as present when that node is encountered. This approach maximises homology in the problematic trees presented by Agnarsson & Miller [-@Agnarsson2008; "A&M" below]:

par(mfrow=c(1, 2), mar=rep(0.75, 4))
tree <- read.tree(text = "(H, (G, (F, (E, (D, (C, (B, A)))))));")
tree$tip.label <- paste("_", tree$tip.label)
char <- c(0,0,1,1,1,0,1,0)

edge_col <- TreeTools::brewer[[2]][c(1,1,1,rep(2, 7), 1, 2, 2, 1)]

graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 2',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2, direction='upwards',
               x.lim = c(0.7200000, length(char) + 0.5),
               y.lim = c(-0.5, length(char) + 0.5))


down_labels <- paste("1:", c(0, 1, '01')[c(1, 3, 2, 2, 3, 1, 3)])
up_labels <- paste("2:",   c(0,0,1,1,1,choose1,1)) # n14 informs n15
#node_labels <- paste(paste("n", 9:15, sep = ""), down_labels, up_labels, sep = "\n")
node_labels <- paste(down_labels, up_labels, sep = "\n")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,3,2,3)])

tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
markChanges(c(4, 11, 14), c('0->1', '1->0', '1->0'))
# Check that this is how our algorithm works with the following lines:
#toInapp <- function (x) rev(c('-', '0')[x + 1])
#vignettePlot(tree, toInapp(char), passes=1:2, legend.pos='none')
#vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')

tree <- read.tree(text = "(J, (I, (H, (G, (F, (E, (D, (C, (B, A)))))))));")
tree$tip.label <- paste("_", tree$tip.label)
char <- c(0,0,1,0,1,1,1,0,1,0)

edge_col <- TreeTools::brewer[[2]][c(1,1,1,2, 2, 2,1,2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1)]

graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 3',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2, direction='upwards',
               x.lim = c(0.7200000, length(char) + 0.5),
               y.lim = c(-0.5, length(char) + 0.5))

down_labels <- paste("1:", c(0, 1, '01')[c(1,3,2,3,2,2,3,1,3)])
up_labels <- paste("2:",   c(0,0, choose1, rep(1, length(char) - 6), choose1, 1))
#node_labels <- paste(paste("n", 9:15, sep = ""), down_labels, up_labels, sep = "\n")
node_labels <- paste(down_labels, up_labels, sep = "\n")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,2,2,3,2,3)])

tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1)
#edgelabels()
markChanges(c(4, 7, 15, 18), c('0->1', rep('1->0', 3)))


# Check that this is how our algorithm works with the following lines:
#vignettePlot(tree, toInapp(char), passes=1:2, legend.pos='none')
#vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')

This approach is also robust to missing entries:

par(mfrow=c(1, 2), mar=rep(0.75, 4))
tree <- read.tree(text = "(F, (E, (D, (C, (B, A)))));")
tree$tip.label <- paste("_", tree$tip.label)
char <- c(0,0, '?', 1, 1, 1)

edge_col <- redBlue[c(1,1,1,rep(2, 11))]

graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 4a',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2, direction='upwards',
               x.lim = c(0.7200000, length(char) + 0.5),
               y.lim = c(-0.5, length(char) + 0.5))

down_labels <- paste("1:", c(0, 1, '01')[c(1, 3, 2, 2, 2)])
up_labels <- paste("2:",   c(0, 0, 1, 1, 1))
node_labels <- paste(down_labels, up_labels, sep = "\n")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,2)])

tiplabels(char, cex = 1, bg = colours[c(1,1,3,2,2,2)], adj = 1)
markChanges(4, '0->1')
# Check that this is how our algorithm works with the following lines:
#vignettePlot(tree, c(2,2,2, '?', '-', '-'), passes=1:2, legend.pos='none')
#vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')

char <- c(0,0,'?', 1, 0, 1)
edge_col <- TreeTools::brewer[[2]][c(1,1,1,2,2, 2, 2, 2, 1, 2)]

graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 4b',
               type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0,
               edge.color = edge_col, edge.width = 2, direction='upwards',
               x.lim = c(0.7200000, length(char) + 0.5),
               y.lim = c(-0.5, length(char) + 0.5))

down_labels <- paste("1:", c(0, 1, '01')[c(1,3,2,2,3)])
up_labels <- paste("2:",   c(0, 0, choose1, 1, 1))
node_labels <- paste(down_labels, up_labels, sep = "\n")

ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,3)])

tiplabels(char, cex = 1, bg = colours[c(1,1,3,2,1,2)], adj = 1)
markChanges(c(4, 9), c('0->1', rep('1->0', 1)))


# Check that this is how our algorithm works with the following lines:
#vignettePlot(tree, c(2, '-', 2, '?', '-', '-'), passes=1:2, legend.pos='none')
#vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')

TGuillerme/Inapp documentation built on Feb. 4, 2024, 7:26 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TGuillerme/Inapp
Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

The Fitch algorithm {#fitch}

Downpass

Uppass

Resolving ambiguous resolutions

Maximising homology

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/Inapp Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

In TGuillerme/Inapp: Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees

The Fitch algorithm {#fitch}

Downpass

Uppass

Resolving ambiguous resolutions

Maximising homology

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/Inapp
Reconstruction of Inapplicable Discrete Characters on Phylogenetic Trees