A common application of tree space analysis is to compare the outputs of different analyses – for instance, trees obtained from different gene sequences, or results obtained using different models or methods (e.g. Bayesian, maximum likelihood, or parsimony).
This can be accomplished quickly using the MapTrees()
graphical user interface:
Load trees from file: Select first tree file
Select an appropriate sample size
Select Replace existing
Load each additional set of trees from file using Add batch to existing
{ width="245px" }
{ width=100% }
More control over the mapping can be obtained at the command line:
# Load trees library("TreeTools", quietly = TRUE) batch1 <- as.phylo(1:60, 8) # Generate 60 similar trees batch2 <- as.phylo(seq(200, 800, length.out = 30), 8) # A separate batch of 30 trees styles <- c(1, 2) # Select plotting colours / symbols treeStyle <- rep(styles, c(length(batch1), length(batch2))) # Calculate distances library("TreeDist") distances <- ClusteringInfoDistance(c(batch1, batch2)) # Construct over-simple 2D PCoA mapping mapping <- cmdscale(distances, k = 2)
# Plot mapping par(mar = rep(0, 4)) plot(mapping, asp = 1, # Preserve aspect ratio - do not distort distances ann = FALSE, axes = FALSE, # Don't label axes: dimensions are meaningless col = treeStyle, # Colour pch = treeStyle # Plotting symbol ) legend("left", c("Batch 1", "Batch 2"), col = styles, pch = styles)
For more robust analyses than the (potentially misleading!) 2D plot above, consult the companion vignette. Note also that mapped areas and their regions of overlap may not correspond to reality; see the warnings and recommendations in @SmithSpace.
Interpreting and comparing the areas of tree space from a projection can be misleading -- the expanded apparent area of Greenland under the Mercator projection being a familiar example.
As such, it is always best to work with original distances when interpreting whether sets of trees occupy larger or smaller regions of tree space.
One approach is to plot distances from a median tree:
# Calculate median trees median1 <- median(batch1) median2 <- median(batch2) # Compute distance from each tree to the median of its batch dist1 <- ClusteringInfoDist(batch1, median1) dist2 <- ClusteringInfoDist(batch2, median2) # Set resolution of histogram nBreaks <- 10 breaks <- seq(0, max(dist1, dist2), length.out = nBreaks) # Plot first distance set hist(dist1, col = "#00000022", breaks = breaks, main = "Distance from median of batch", xlab = "Clustering information distance", ylim = c(0, 25) # Omit this line to infer Y axis limit from first batch. ) # Add second distance set hist(dist2, col = "#ff000022", breaks = breaks, add = TRUE) # Add legend legend("topleft", c("Batch 1", "Batch 2"), fill = c("#00000022", "#ff000022"))
In the plotted example, distances to the median tree are greater for batch 2 than batch 1, indicating a more dispersed set of trees that occupies a greater hypervolume. Note that the increased frequency at higher distances is expected: the outer shell of a sphere contains more volume than a layer of equivalent thickness closer to the centre, and this phenomenon becomes more pronounced as the dimensionality of tree space increases.
A complementary approach is to identify the resolution of the consensus of each batch of trees. This approach shares many of the problems with the Robinson--Foulds distance: in particular, resolution can be decimated by a single "rogue" taxon whose position is poorly defined [@SmithRogue]. Detecting and removing rogue taxa can provide a more meaningful point of comparison.
# Create tree set with a rogue taxon batch3 <- AddTipEverywhere(as.phylo(7, 7), "t8") # Set up plotting area par(mfrow = c(2, 2), mar = rep(0.4, 4)) # Plot naive strict consensus plot(consensus(batch1, p = 1)) plot(consensus(batch3, p = 1)) if (requireNamespace("Rogue", quietly = TRUE)) { cons1 <- ConsensusWithout(batch1, p = 1, Rogue::QuickRogue(batch1, p = 1)[-1, "taxon"]) cons3 <- ConsensusWithout(batch3, p = 1, Rogue::QuickRogue(batch3, p = 1)[-1, "taxon"]) # The information content of each tree gives a measure of its resolution, # accounting for omitted rogue leaves SplitwiseInfo(cons1) # 8.5 bits SplitwiseInfo(cons3) # 15.1 bits: higher resolution indicates that these # trees are more similar, notwithstanding rogue taxa. # Plot the trees plot(cons1) plot(cons3) } else { message("The package 'Rogue' is required to run this example.") }
Whereas a direct interpretation of this analysis is not straightforward, it can provide a complementary way of understanding the distribution of trees across tree space.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.