library('Inapp'); library('ape') setMargins <- function () par(mar=rep(0.2, 4)) markChanges <- function (edges, labels='X') edgelabels(frame='none', edge=edges, labels, font=2) col.tips.nodes = c("#fc8d59", "#eeeeeed0", "#7fbf7be0", "#af8dc3e0") redBlue <- colours <- c(TreeTools::brewer[[2]], 'grey') deltran <- '<span style="font-variant:small-caps;">DelTran</span>' acctran <- '<span style="font-variant:small-caps;">AccTran</span>' choose1 <- "[01]=>1"
This algorithm was proposed by Fitch [-@Fitch1971] and is implemented in phylogenetic software that employs maximum parsimony [@swofford2001paup; @Goloboff2016] and probabilistic methods [@Ronquist2012mrbayes; @Stamatakis2014]. The simple and elegant procedure and entails going down the tree to count the number of transformations, then going back up the tree to finalise the ancestral state reconstructions.
The way the algorithm goes up and down depends on the software, but often employs a traversal: a recursive function that can visit all tips and nodes in a logical fashion. For example, consider the following tree:
setMargins() tree <- ape::read.tree(text = "(((a, b), (c, d)), e);") tree$node.label <- c("n4", "n3", "n1", "n2") plot(tree) nodelabels(tree$node.label)
A downpass traversal will first evaluate the first cherry (or pair of taxa) (A
below), then save the results in n1
and evaluate the second pair of tips/nodes (B
), saving the results in n2
, proceeding until all the nodes and tips have been visited.
setMargins() plot(tree) nodelabels(tree$node.label) ## a a <- c(4, 1.2) ## b b <- c(4, 1.8) ## c c <- c(4, 3.2) ## d d <- c(4, 3.8) ## e e <- c(4, 4.8) ## n2 n2_1 <- c(3.2, 3.5) n2_2 <- c(2.8, 3.3) ## n1 n1_1 <- c(3.2, 1.5) n1_2 <- c(2.8, 1.7) ## n3 n3_1 <- c(1.2, 2.5) n3_2 <- c(0.8, 2.7) ## n4 n4 <- c(0.2, 3.75) ## First cherry arrows(a[1], a[2], n1_1[1], n1_1[2]-0.025, length = 0.05) arrows(b[1], b[2], n1_1[1], n1_1[2]+0.025, length = 0.05) points(3.8, 1.5, cex = 3) text(3.8, 1.5, "A", cex = 1) ## Second cherry arrows(d[1], d[2], n2_1[1], n2_1[2]+0.025, length = 0.05) arrows(c[1], c[2], n2_1[1], n2_1[2]-0.025, length = 0.05) points(3.8, 3.5, cex = 3) text(3.8, 3.5, "B", cex = 1) ## Third arrows(n1_2[1], n1_2[2], n3_1[1], n3_1[2]-0.025, length = 0.05) arrows(n2_2[1], n2_2[2], n3_1[1], n3_1[2]+0.025, length = 0.05) points(2.5, 2.5, cex = 3) text(2.5, 2.5, "C", cex = 1) ## Fourth arrows(n3_2[1], n3_2[2], n4[1], n4[2]-0.025, length = 0.05) arrows(e[1], e[2], n4[1], n4[2]+0.025, length = 0.05) points(0.7, 3.5, cex = 3) text(0.7, 3.5, "D", cex = 1)
An uppass traversal works identically but in the other direction, going from the nodes towards the tips.
setMargins() plot(tree) nodelabels(tree$node.label) ## First cherry arrows(n1_1[1], n1_1[2]-0.025, a[1], a[2], length = 0.05) arrows(n1_1[1], n1_1[2]+0.025, b[1], b[2], length = 0.05) points(3.8, 1.5, cex = 3) text(3.8, 1.5, "D", cex = 1) ## Second cherry arrows(n2_1[1], n2_1[2]+0.025, d[1], d[2], length = 0.05) arrows(n2_1[1], n2_1[2]-0.025, c[1], c[2], length = 0.05) points(3.8, 3.5, cex = 3) text(3.8, 3.5, "C", cex = 1) ## Third arrows(n3_1[1], n3_1[2]-0.025, n1_2[1], n1_2[2], length = 0.05) arrows(n3_1[1], n3_1[2]+0.025, n2_2[1], n2_2[2], length = 0.05) points(2.5, 2.5, cex = 3) text(2.5, 2.5, "B", cex = 1) ## Fourth arrows(n4[1], n4[2]-0.025, n3_2[1], n3_2[2], length = 0.05) arrows(n4[1], n4[2]+0.025, e[1], e[2], length = 0.05) points(0.7, 3.5, cex = 3) text(0.7, 3.5, "A", cex = 1)
In both traversals, the sequence in which nodes are visited (i.e. which cherry to pick first in a downpass traversal, or whether to continue left or right in an uppass traversal) is arbitrary, provided that all tips and nodes are eventually visited.
Now let's consider a more complex tree ((((a, b), c), d), (e, (f, (g, h))));
with a binary character distributed ((((1, 0), 0), 1), (1, (0, (0, 1))))
:
tree <- ape::read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));") char <- c(1,0,0,1,1,0,0,1) ## Plotting the tree and characters setMargins() plot(tree, adj = 0.5) tiplabels(char, cex = 1, bg = TreeTools::brewer[[2]][char + 1], adj = 1) nodelabels(paste0("n", 9:15), bg = "bisque")
We can use the Inapp
package to apply the Fitch algorithm for this character on this tree.
## Loading the Inapp package library(Inapp) ## The tree tree <- read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));") ## The character character <- "10011001" ## Applying the Fitch algorithm matrix <- apply.reconstruction(tree, character, method = "Fitch", passes = 2)
The downpass is quite simple and follows these rules for the two possible cases in the traversal [@Fitch1971]:
For example, node n15
is case 2 because there is nothing in common between the tips h
and g
(states 1
and 0
respectively). Node n14
is case 1 because there is at least one state in common between the tip f
and the node n15
(state 0
).
Important: when the case 2 is encountered, a transformation is implied in the descendants of the considered node. The score of the tree is incremented by +1.
In the following example, nodes that are in case 1 are in white and nodes in case 2, which imply a transformation (i.e. that adds to the tree score), are in green.
## Plotting the first downpass plot.states.matrix(matrix, passes = 1, counts = 2, show.labels = c(1,2)) tiplabels(char, cex = 1, bg = TreeTools::brewer[[2]][char + 1], adj = 1)
The score of the tree is known after the downpass, but the states of some nodes might not yet be properly resolved.
For example, parsimonious reconstructions exist that reconstruct node n14
as state 1
, as opposed to the 0
presently reconstructed.
The present reconstruction seemingly indicates a change from state 1 to state 0 in an ancestor of n14
, and a subsequent change from state 0 to state 1 in the ancestor of h
, but an alternative reconstruction is equally parsimonious: n13
, n14
and n15
may all have state 1
, with a change from state 1
to state 0
in each of the two lineages leading to f
and g
.
The uppass traversal employs the following rules [@Fitch1971]:
For example, node n13
is already resolved (it has all its states in common with the ancestor - case 1).
Node n14
has not all its states in common with its ancestor but its two descendants (n15
and f
) have at least one state in common (0
).
This node is thus solved to be the states in common between both descendants (01
) and its ancestor (01
as well).
## Plotting the first uppass plot.states.matrix(matrix, passes = 1:2, counts = 2, show.labels = c(1, 2), col.states=TRUE)
More complex cases can be studied in the Inapp App (running in your favourite web browser) by switching the Reconstruction method
to Normal Fitch
.
## Running the Inapp App runInapp()
In certain cases, it is necessary to go further and discriminate between the equally-parsimonious reconstructions provided by the Fitch algorithm. A number of approaches have been proposed concerning which resolution of ambiguous nodes is preferable.
The two most familiar approaches to resolving ambiguous node are the Accelerated Transformation (r acctran
) and Delayed Transformation (r deltran
) approaches
[@Farris1970;@Swofford1987].
The r acctran
approach reconstructs transformations as occurring as close to the root as possible; the r deltran
, as far from the root as possible.
In this case, the ambiguous resolution of the root leaves two options for the latter:
(ref:fig-cap-trans) Node reconstructions after r acctran
or r deltran
optimizations
par(mfrow=c(2, 2), mar=rep(1, 4)) tree <- read.tree(text = "((((a, b), c), d), (e, (f, (g, h))));") tree$tip.label <- paste("_", tree$tip.label) char <- c(1,0,0,1,1,0,0,1) edge_col <- TreeTools::brewer[[2]][c(2,2,1, 2,1,1,2, 2,2,2, 1,1,1,2)] graphics::plot(tree, show.tip.label = TRUE, main='AccTran', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5)) node_labels <- paste(paste("Acc:", sep = ""), c(1,1,0,0,1,0,0)) node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)]) tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) markChanges(c(2, 4, 10, 14)) edge_col <- redBlue[c(2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2)] graphics::plot(tree, show.tip.label = TRUE, main='DelTran', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5)) node_labels <- paste(paste("Del:", sep = ""), rep(1, 7)) node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)]) tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) markChanges(c(5, 6, 11, 13)) plot(1, type="n", axes=F, xlab="", ylab="") #edge_col <- TreeTools::brewer[[2]][3-c(2,2,1,1, 1,1,2, # 2,2,2,1, 1,1,2)] # #graphics::plot(tree, show.tip.label = TRUE, main='AccTran', # type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, # edge.color = edge_col, edge.width = 2) # #node_labels <- paste(paste("Acc:", sep = ""), 1-c(1,1,0,0,1,0,0)) #node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "") # #ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)]) # #tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) #edgelabels(frame='none', edge=c(2, 4, 10, 14), "X", col=col.tips.nodes[3]) edge_col <- redBlue[c(1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1, 1, 2)] graphics::plot(tree, show.tip.label = TRUE, main='Acc/DelTran', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, x.lim=c(-0.5, 8), y.lim=c(0.5, 8.5)) node_labels <- paste(paste("Del:", sep = ""), rep(0, 7)) node_labels <- paste(paste("n", 9:15, "\n", sep = ""), node_labels, sep = "") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,3,2,3)]) tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) markChanges(c(4, 7, 9, 14))
If the states 0
and 1
represent states of a transformational character --
whether an organism's tail is red or blue, say -- then there is no reason to prefer
any of the equally-parsimonious reconstructions, as none implies any more homology
than any other.
With neomorphic characters, however, state 0
stands for the absence of a character
-- for example, a tail --
and state 1
its presence. On one view, a reconstruction that minimises the number
of times that such a character evolves attributes more similarity to homology than
an equally parsimonious reconstruction in which said character is gained multiple times
independently.
Neither r acctran
nor r deltran
is guaranteed to maximise homology [@Agnarsson2008].
In this particular case, the r deltran
reconstruction maximises homology. If the
character denotes the presence or absence of a tail, then this reconstruction invokes the presence of a tail in the common ancestor of all taxa, meaning that the tails present in
tips a
, d
, e
and h
are homologous with one another.
The r acctran
reconstruction, in contrast, identifies a loss of a tail
at nodes 11 and 14, with a tail evolving independently in tips a
and h
.
Under this reconstruction, the tails of a
and h
are not homologous with each other, or with the tails of d
and e
.
(The alternative r deltran
approach, which could arguably be described as r acctran
instead, invokes four independent origins of the character and clearly does not maximise its homology.)
Where we wish to maximise homology, we modify the Fitch uppass such that any node whose final state reconstruction would be ambiguous is instead reconstructed as present when that node is encountered. This approach maximises homology in the problematic trees presented by Agnarsson & Miller [-@Agnarsson2008; "A&M" below]:
par(mfrow=c(1, 2), mar=rep(0.75, 4)) tree <- read.tree(text = "(H, (G, (F, (E, (D, (C, (B, A)))))));") tree$tip.label <- paste("_", tree$tip.label) char <- c(0,0,1,1,1,0,1,0) edge_col <- TreeTools::brewer[[2]][c(1,1,1,rep(2, 7), 1, 2, 2, 1)] graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 2', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, direction='upwards', x.lim = c(0.7200000, length(char) + 0.5), y.lim = c(-0.5, length(char) + 0.5)) down_labels <- paste("1:", c(0, 1, '01')[c(1, 3, 2, 2, 3, 1, 3)]) up_labels <- paste("2:", c(0,0,1,1,1,choose1,1)) # n14 informs n15 #node_labels <- paste(paste("n", 9:15, sep = ""), down_labels, up_labels, sep = "\n") node_labels <- paste(down_labels, up_labels, sep = "\n") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,3,2,3)]) tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) markChanges(c(4, 11, 14), c('0->1', '1->0', '1->0')) # Check that this is how our algorithm works with the following lines: #toInapp <- function (x) rev(c('-', '0')[x + 1]) #vignettePlot(tree, toInapp(char), passes=1:2, legend.pos='none') #vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none') tree <- read.tree(text = "(J, (I, (H, (G, (F, (E, (D, (C, (B, A)))))))));") tree$tip.label <- paste("_", tree$tip.label) char <- c(0,0,1,0,1,1,1,0,1,0) edge_col <- TreeTools::brewer[[2]][c(1,1,1,2, 2, 2,1,2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1)] graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 3', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, direction='upwards', x.lim = c(0.7200000, length(char) + 0.5), y.lim = c(-0.5, length(char) + 0.5)) down_labels <- paste("1:", c(0, 1, '01')[c(1,3,2,3,2,2,3,1,3)]) up_labels <- paste("2:", c(0,0, choose1, rep(1, length(char) - 6), choose1, 1)) #node_labels <- paste(paste("n", 9:15, sep = ""), down_labels, up_labels, sep = "\n") node_labels <- paste(down_labels, up_labels, sep = "\n") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,3,2,2,3,2,3)]) tiplabels(char, cex = 1, bg = colours[char + 1], adj = 1) #edgelabels() markChanges(c(4, 7, 15, 18), c('0->1', rep('1->0', 3))) # Check that this is how our algorithm works with the following lines: #vignettePlot(tree, toInapp(char), passes=1:2, legend.pos='none') #vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')
This approach is also robust to missing entries:
par(mfrow=c(1, 2), mar=rep(0.75, 4)) tree <- read.tree(text = "(F, (E, (D, (C, (B, A)))));") tree$tip.label <- paste("_", tree$tip.label) char <- c(0,0, '?', 1, 1, 1) edge_col <- redBlue[c(1,1,1,rep(2, 11))] graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 4a', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, direction='upwards', x.lim = c(0.7200000, length(char) + 0.5), y.lim = c(-0.5, length(char) + 0.5)) down_labels <- paste("1:", c(0, 1, '01')[c(1, 3, 2, 2, 2)]) up_labels <- paste("2:", c(0, 0, 1, 1, 1)) node_labels <- paste(down_labels, up_labels, sep = "\n") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,2)]) tiplabels(char, cex = 1, bg = colours[c(1,1,3,2,2,2)], adj = 1) markChanges(4, '0->1') # Check that this is how our algorithm works with the following lines: #vignettePlot(tree, c(2,2,2, '?', '-', '-'), passes=1:2, legend.pos='none') #vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none') char <- c(0,0,'?', 1, 0, 1) edge_col <- TreeTools::brewer[[2]][c(1,1,1,2,2, 2, 2, 2, 1, 2)] graphics::plot(tree, show.tip.label = TRUE, main='A&M fig. 4b', type = "phylogram", use.edge.length = TRUE, cex = 1, adj = 0, edge.color = edge_col, edge.width = 2, direction='upwards', x.lim = c(0.7200000, length(char) + 0.5), y.lim = c(-0.5, length(char) + 0.5)) down_labels <- paste("1:", c(0, 1, '01')[c(1,3,2,2,3)]) up_labels <- paste("2:", c(0, 0, choose1, 1, 1)) node_labels <- paste(down_labels, up_labels, sep = "\n") ape::nodelabels(node_labels, cex = 0.9, bg = col.tips.nodes[c(2,3,2,2,3)]) tiplabels(char, cex = 1, bg = colours[c(1,1,3,2,1,2)], adj = 1) markChanges(c(4, 9), c('0->1', rep('1->0', 1))) # Check that this is how our algorithm works with the following lines: #vignettePlot(tree, c(2, '-', 2, '?', '-', '-'), passes=1:2, legend.pos='none') #vignettePlot(tree, rev(char), na=FALSE, passes=1:2, legend.pos='none')
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.