Global optimization {#global}

Global optima may be locally suboptimal

Our algorithm only considers parsimonious reconstructions of principal characters. Jan De Laet (personal communication) has identified situations where a locally non-parsimonious reconstruction of the principal character minimizes global homoplasy.

Consider the following:

tail <- "11000011"
tailCodings <- strsplit(tail, '')[[1]]
clr  <- "11----11"
ontd <- 5
tips <- nchar(tail)
losses <- sum(tailCodings == 0)
bifH <- read.tree(text="((((a, b), c), d), (e, (f, (g, h))));")
mDep <- matrix(strsplit(paste0(c(tail, tail, rep(clr, ontd)), collapse=''), '')[[1]], byrow=TRUE, nrow=2 + ontd)
colnames(mDep) <- LETTERS[1:ncol(mDep)]
rownames(mDep) <- c("Tail: (0), absent; (1), present.",
                    "Beak: (0), absent; (1), present.",
                    "Tail, colour: (1), red; (2), blue.",
                    "Tail, length: (1), long; (2), short.",
                    "Tail, rigidity: (1), rigid; (2), flexible.",
                    "Tail, curvature: (1), convex; (2), concave.",
                    "Tail, lustre: (1), glossy; (2), matt.")
knitr::kable(mDep, caption="Coding")

On a tree where two tail-bearing clades (AB & GH) are separated by four taxa without tails (C, D, E & F), our algorithm reconstructs two separate origins of the tail:

goloPar()

vignettePlot(bifH, tail, legend.pos='topleft', FALSE, passes=0,
             main='Tail (or beak)', state.labels = ap)
nodelabels(pch=16, cex=6, col=c(whitey, whitey, greeny, whitey, whitey, greeny, whitey))
nodelabels(c(0, 0, 0, 1, 0, 0, 1), cex=1.6, frame='none')

tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
vignettePlot(bifH, clr, legend.pos='topleft', passes=0,
             main='Tail colour (etc.)', state.labels=c('', 'Red'))
nodelabels(pch=16, cex=6, col=c(purply, rep(whitey, 6)))
nodelabels(c('-', '-', 1, '-', '-', '-', 1), cex=1.6, frame='none')

This reconstruction implies r number(2 + ontd) homoplasies: one independent gain of the tail, one of the beak, and one independent origin of each ontologically dependent property (redness, longness, rigidity, convexness, glossiness).

An alternative is to reconstruct the tail as ancestrally present, and lost independently in C, D, E and F.

goloPar()
nodeStates <- rep(1, nchar(tail) - 1)
myStates <- c(tailCodings, as.list(nodeStates))
vignettePlot(bifH, tail, legend.pos='topleft', FALSE, passes=0,
             main='Tail', state.labels = ap, state.override=myStates,
             changes.override = integer(losses))
nodelabels(pch=16, cex=6, col=c(whitey, greeny, greeny, whitey, greeny, greeny, whitey))
nodelabels(nodeStates, cex=1.6, frame='none')

tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
myStates <- c(strsplit(clr, '')[[1]], as.list(nodeStates))
vignettePlot(bifH, clr, legend.pos='topleft', passes=0,
             main='Tail colour (etc.)', state.labels=orb,
             state.override=myStates, regions.override = integer(0),
             changes.override = integer(0))
nodelabels(pch=16, cex=6, col=whitey)
nodelabels(nodeStates, cex=1.6, frame='none')

Considering only the tail, this is an unparsimonious reconstruction: it requires r number(losses) independent evolutionary events (losses), whereas the former required only two independent evolutionary events (gains). Globally, however, this allows the similarity between ontologically dependent characters to be attributed to common ancestry (i.e. homology), resulting in a lower overall score of r number(losses + 2) homoplasies (the r number(losses) in the tail, plus the two in the beak, reconstructed as before, but none in any ontologically dependent character). As such, this latter reconstruction attains the lowest overall score and should be considered the least homoplasious.

Ontology

Note that the details of this reconstruction rely on the attribution of ontologically dependent characters to specific principal characters. Changing the ontology of the previous matrix (without modifying the scorings) such that two ontologically dependent characters depend on the beak, rather than the tail, results in a different outcome:

rownames(mDep)[nrow(mDep) - 1:0] <- c("**Beak**, curvature: (1), convex; (2), concave.",
                    "**Beak**, lustre: (1), glossy; (2), matt.")
knitr::kable(mDep, caption="Tail: Cost to go from left state to top state:")

Now, neither principal character has enough ontologically dependent characters to compensate for the additional cost of unparsimoniously reconstructing its own distribution. Following the individually parsimonious reconstruction of the tail and beak entails a minimum of r number(1 + 1 + ontd) homoplasies: one additional gain of each of the beak and the tail, and an independent origin of redness, longness, rigidity, curvature and lustre:

goloPar()

vignettePlot(bifH, tail, legend.pos='topleft', FALSE, passes=0,
             main='Tail (or beak)', state.labels = ap)
nodelabels(pch=16, cex=6, col=c(whitey, whitey, greeny, whitey, whitey, greeny, whitey))
nodelabels(c(0, 0, 0, 1, 0, 0, 1), cex=1.6, frame='none')

tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
vignettePlot(bifH, clr, legend.pos='topleft', passes=0,
             main='Tail colour (etc.)', state.labels=orb)
nodelabels(pch=16, cex=6, col=c(purply, rep(whitey, 6)))
nodelabels(c('-', '-', 1, '-', '-', '-', 1), cex=1.6, frame='none')

On the other hand, reconstructing a beak and a tail as present in the common ancestor requires four independent losses in each character -- a total of r number(2 * losses) homoplasies, which does not outweigh the benefit obtained by reconstructing the ontologically dependent characters as homologous:

goloPar()
nodeStates <- rep(1, nchar(tail) - 1)
myStates <- c(strsplit(tail, '')[[1]], as.list(nodeStates))
vignettePlot(bifH, tail, legend.pos='topleft', FALSE, passes=0,
             main='Tail AND beak', state.labels = ap, state.override=myStates,
             changes.override = integer(losses))
nodelabels(pch=16, cex=6, col=c(whitey, greeny, greeny, whitey, greeny, greeny, whitey))
nodelabels(nodeStates, cex=1.6, frame='none')

tiplabels(cex=1.6, frame='none', LETTERS[1:8], adj=c(2.5, 0.5))
myStates <- c(strsplit(clr, '')[[1]], as.list(nodeStates))
vignettePlot(bifH, clr, legend.pos='topleft', passes=0,
             main='Tail colour (etc.)', state.labels=orb,
             state.override=myStates, regions.override = integer(0),
             changes.override = integer(0))
nodelabels(pch=16, cex=6, col=whitey)
nodelabels(nodeStates, cex=1.6, frame='none')

Our algorithm does not always select the tree that minimizes global homology as the optimal tree. Doing so requires the explicit specification of character ontologies, and is thus not possible from character state data alone.

Similarity due to chance

It is important to recall that two features that evolve independently are expected to share a number of similarities due to chance. A curved beak, for example, must be either convex or concave, and must be either glossy or matt. If curved beaks evolved twice, then there is at least a $\frac{1}{4}$ chance that the two innovations will have the same lustre and direction of curvature. (In practice, fitness, developmental constraints and contingency are likely to make one curvature or lustre more likely, increasing the likelihood of a fluke similarity.)

If beaks did evolve twice, however, then "beak" should not be used as a character: character statements are theories that a thing (here a beak) is homologous in all the taxa in which the thing is observed [@Platnick1979]; if this theory is false, then comparisons between attributes of the thing are invalid.

For this reason, parsimony methods have no satisfactory means of attributing chance similarity in attributes of non-homologous things to coincidence rather than common descent. If several ontologically dependent binary transformation series are established to reflect attributes of a character that, in truth, denotes a thing with two independent origins, the a number of similarities in the attributes are expected at each origin of the "character" by coincidence (rather than by common descent):

x <- 0:20
p <- 0.5
E <- function (x, p) x * ((p * p) + ((1-p) * (1-p)))
plot(E(x, 0.5) ~ x, pch='.', col = '#ffffff00', xlim = c(1, 17),
     xlab='Number of ontologically dependent transformation series',
     ylab='Expected similarities due to chance')
lines(E(x, 0.5) ~ x, col = TreeTools::brewer[[3]][1])
lines(E(x, 2/3) ~ x, col = TreeTools::brewer[[3]][2])
lines(E(x, 0.1) ~ x, col = TreeTools::brewer[[3]][3])
legend('bottomright', bty = 'n', col = TreeTools::brewer[[3]], 
       lty = 'solid', 
       c('Both states equally probable', 'One state twice as likely',
         'One state 90% probable')
       )

In the conservative case that binary attributes are equally probable, two independent origins of a character with 12 ontologically dependent binary transformation series are expected to exhibit, by chance, six attributes in common.

An independent origin of the character, however, violates the theory of homology implicit in the definition of both the character [@Platnick1979] and the transformation series that describe its attributes. The differences between algorithms that minimise global homoplasy and our own reflect different approaches to tackling this violation. There is a tension between two extremes: (i), taking similarities that can be attributed to coincidence as evidence for homology in the parent character; and (ii) being quick to attribute genuine homologies to independent gains of the parent character. The approach of maximising homology lies closer to the first extreme; our approach lies closer to the second.

As the number of subcharacters increases, the divergence between our approaches becomes more manifest. Whether a character has 6, 12 or 120 ontologically dependent characters, if six of these characters bear the same state, then global homology will be maximised by reconstructing seven independent losses, in order to attribute these similarities to common ancestry -- even though in the latter cases the similarities are likely due to chance, and the additional losses of the character implied by maximising global homoplasy are unlikely to represent evolutionary history.

Implied weighting

This matter is exaggerated further in the context of implied weighting [@Goloboff1993;@Goloboff2014].

Consider the case of a pollinator's tongue.

Tongue: Curvature: Straight / curved

Tongue: Curvature: Direction: Up / down

Tongue: Curvature: Uniformity: Uniform / uneven

Let's assume that two taxa within an analysis have tongues that both curve uniformly up; other tongue-bearing taxa have straight tongues. In the absence of prior knowledge concerning the likely nature of tongue coiling, the probability that two tongues that evolved independently would both curve uniformly upwards is ¼. As such, the similarities between the coiling do not constitute strong evidence that coiling evolved once; two origins is less parsimonious, but not by much.

Let's consider now some trees where the two curled-tongued taxa are separated by a number of straight-tongued taxa:

goloPar()
par(mfrow=c(2, 3))
tongue <- "110000000"
sep0 <- read.tree(text="(z, (i, (h, (g, (f, (e, (d, (c, a))))))));")
sep1 <- read.tree(text="(z, (i, (h, (g, (f, (e, (b, (c, a))))))));")
sep2 <- read.tree(text="(z, (i, (h, (g, (f, (b, (d, (c, a))))))));")
sep3 <- read.tree(text="(z, (i, (h, (g, (b, (e, (d, (c, a))))))));")
sep4 <- read.tree(text="(z, (i, (h, (b, (f, (e, (d, (c, a))))))));")
sep5 <- read.tree(text="(z, (i, (b, (g, (f, (e, (d, (c, a))))))));")
vignettePlot(sep0, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='No intercalated taxa',
             state.labels = c('Tongue straight', 'Tongue curved'))
vignettePlot(sep1, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='One intercalated taxon', state.labels=NA)
vignettePlot(sep2, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='Two intercalated taxa', state.labels=NA,
             state.override=c(0,0,0,0,0,1,0,0,1, 0, 0, 0, 0, 0, 2, 2, 2))
vignettePlot(sep3, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='Three intercalated taxa', state.labels=NA,
             state.override=c(0,0,0,0,1,0,0,0,1, 0, 0, 0, 0, 2, 2, 2, 2))
vignettePlot(sep4, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='Four intercalated taxa', state.labels = NA,
             state.override=c(0,0,0,1,0,0,0,0,1, 0, 0, 0, 2, 2, 2, 2, 2))
vignettePlot(sep5, tongue, legend.pos='none', FALSE, passes=0, counts=0,
             main='Five intercalated taxa', state.labels = NA,
             state.override=c(0,0,1,0,0,0,0,0,1, 0, 0, 2, 2, 2, 2, 2, 2))

Each of these trees can be interpreted in one of two ways: there may have been two independent evolutionary events that gave rise to curved tongues (which both happened to curve uniformly upwards, by a small but unremarkable coincidence), or there was one evolutionary event that gave rise to a curved tongue, and 0, 1, 2, 3, 4, or 5 additional evolutionary events whereby a curved tongue was straightened.

Let's consider the extra steps entailed for each tree under these two scenarios:

ColouredText <- function(text, colour) {
  if(knitr::is_latex_output())
    paste0("\\textcolor{", colour, "}{", text, "}")
  else if(knitr::is_html_output())
    paste0("<font color='", colour, "'>", text, "</font>")
  else
    text
}
bad <- TreeTools::brewer[[11]][2]
good <- TreeTools::brewer[[11]][9]

Table <- function (input, caption, greens=0) {
    cells <-  matrix(input, byrow = TRUE, ncol=nInter)
    colours <- if (greens < 0) c(rep(bad, nInter + greens), rep(good, -greens)) else
        c(rep(good, greens), rep(bad, nInter - greens))
    cells <- rbind(signif(cells, 3),
                   paste0('**', ColouredText(signif(colSums(cells), 3), colours), '**'))
    colnames(cells) <- seq_len(nInter) - 1
    cells <- cbind('Intervening taxa' = c('Curvature', 'Direction', 'Uniformity', '**Total**'), cells)
    knitr::kable(cells, caption=caption)
}
Fit <- function (e, k) e / (e + k)
nInter <- 6
parent_extra_1 <- function() seq_len(nInter) - 1
subchar_homoplasies_1 <- function() rep(0, nInter)
Table(c(parent_extra_1(), rep(subchar_homoplasies_1(), 2)), caption="One origin, many losses", 4)
parent_extra_2 <- function() rep(1, nInter)
subchar_homoplasies_2 <- function() rep(1, nInter)
Table(c(parent_extra_2(), rep(subchar_homoplasies_2(), 2)), caption="Two origins, no losses", -3)

On this view, if there are fewer than three intervening taxa, then it is more homologous to reconstruct a single origin and zero, one or two losses; if there are more than three intervening taxa, it is more homologous to reconstruct two separate origins of curvature.

However, under implied weights [@Goloboff1993], each additional homoplasy in a character is afforded less cost than the one before, according to a 'goodness of fit' function $fit = \frac{e}{e + k}$. The preferable reconstruction now is the one that minimises total cost, which will depend on the value of k selected. Under any value of of k, it takes at least four intervening taxa for two origins of the tail to be preferable to multiple losses:

k = 10

k <- 10
Table(c(Fit(parent_extra_1(), k), rep(Fit(subchar_homoplasies_1(), k), 2)),
      caption=paste("One origin, many losses; _k_ =", k), 4)
Table(c(Fit(parent_extra_2(), k), rep(Fit(subchar_homoplasies_2(), k), 2)),
      caption=paste("Two origins, no losses; _k_ =", k), -2)

And at smaller values of k, progressively more losses of the tail are preferable to a single coincidence:

k = 5

k <- 5
Table(c(Fit(parent_extra_1(), k), rep(Fit(subchar_homoplasies_1(), k), 2)),
      caption=paste("One origin, many losses; _k_ =", k), 6)
Table(c(Fit(parent_extra_2(), k), rep(Fit(subchar_homoplasies_2(), k), 2)),
      caption=paste("Two origins, no losses; _k_ =", k), -1)

k = 3

k <- 3
nInter <- 11
Table(c(Fit(parent_extra_1(), k), rep(Fit(subchar_homoplasies_1(), k), 2)),
      caption=paste("One origin, many losses; _k_ =", k), 10)
Table(c(Fit(parent_extra_2(), k), rep(Fit(subchar_homoplasies_2(), k), 2)),
      caption=paste("Two origins, no losses; _k_ =", k), -2)

Generalization

In general, if there are even a small number of similarities between ontologically dependent characters, then the maximal-fit reconstruction will infer a very high number of secondary losses rather than attributing a small number of similarities to convergence. This is especially true at small to medium values of k.

k <- 1+1.05^(0:100)
plot((3 * k) / (k - 2) ~ k, log='x', pch='.', ylab='Intervening taxa', col='#ffffff00', ylim=c(1, 15), xlim=c(2, 160))
brew7 <- TreeTools::brewer[[7]]
lines((2 * k) / (k - 2) ~ k, col = brew7[1])
lines((3 * k) / (k - 2) ~ k, col = brew7[2])
lines((4 * k) / (k - 2) ~ k, col = brew7[3])
lines((5 * k) / (k - 2) ~ k, col = brew7[4])
lines((6 * k) / (k - 2) ~ k, col = brew7[6])
lines((7 * k) / (k - 2) ~ k, col = brew7[7])

text(156, 2, '1', col = brew7[1], adj = c(1, 0.3))
text(156, 3, '2', col = brew7[2], adj = c(1, 0.3))
text(156, 4, '3', col = brew7[3], adj = c(1, 0.3))
text(156, 5, '4', col = brew7[4], adj = c(1, 0.3))
text(156, 6, '5', col = brew7[6], adj = c(1, 0.3))
text(156, 7, '6', col = brew7[7], adj = c(1, 0.3))
text(110, 8, 'Ontologically\ndependent\nsimilarities:', pos=3)
text(2, 1.3, 'Single origin (multiple losses)', adj=0, font=3)
text(20, 13, 'Two origins (no losses)', adj=0, font=3)

This behaviour seems undesirable, even if it has a more secure theoretical underpinning than our own algorithm.

Ultimately, the issue is that a “step-counting” approach is a useful heuristic for a deeper quantity that evaluates the characters themselves as well as their fit to a particular tree; characters are hypotheses of homology just as trees are hypotheses of relationship, and the ultimate quantity of interest corresponds to a likelihood that observed similarities correspond to common descent rather than chance. A balanced approach to this issue requires the application of conditional probabilistic approaches, which to our knowledge have not yet been applied in a parsimony context.



TGuillerme/Inapp documentation built on Feb. 4, 2024, 7:26 a.m.