TreeInfo: R Documentation
Sum the entropy (ClusteringEntropy()), clustering information content (ClusteringInfo()), or phylogenetic information content (SplitwiseInfo()) across each split within a phylogenetic tree, or within the consensus of a set of phylogenetic trees (ConsensusInfo()). This value will be greater than the total information content of the tree when a tree contains multiple splits, as these splits are not independent and thus contain mutual information that is counted more than once.
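To see this in practice, the sketch below compares the summed splitwise information of a fully resolved eight-leaf tree with its total information content, i.e. the bits needed to single it out from all possible eight-leaf topologies. It assumes the TreeTools helpers PectinateTree() and NUnrooted(); the quoted values are approximate.

library("TreeDist")
library("TreeTools", quietly = TRUE)
tree <- PectinateTree(8)
SplitwiseInfo(tree)  # Sum over splits: about 22.5 bits
log2(NUnrooted(8))   # Total information content of one fully resolved
                     # topology among NUnrooted(8) = 10395: about 13.3 bits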
SplitwiseInfo(x, p = NULL, sum = TRUE)
ClusteringEntropy(x, p = NULL, sum = TRUE)
ClusteringInfo(x, p = NULL, sum = TRUE)
## S3 method for class 'phylo'
ClusteringEntropy(x, p = NULL, sum = TRUE)
## S3 method for class 'list'
ClusteringEntropy(x, p = NULL, sum = TRUE)
## S3 method for class 'multiPhylo'
ClusteringEntropy(x, p = NULL, sum = TRUE)
## S3 method for class 'Splits'
ClusteringEntropy(x, p = NULL, sum = TRUE)
## S3 method for class 'phylo'
ClusteringInfo(x, p = NULL, sum = TRUE)
## S3 method for class 'list'
ClusteringInfo(x, p = NULL, sum = TRUE)
## S3 method for class 'multiPhylo'
ClusteringInfo(x, p = NULL, sum = TRUE)
## S3 method for class 'Splits'
ClusteringInfo(x, p = NULL, sum = TRUE)
ConsensusInfo(trees, info = "phylogenetic", p = 0.5, check.tips = TRUE)
x: A tree of class phylo, a Splits object, or a list of trees or splits (including a multiPhylo object).

p: Scalar from 0.5 to 1 specifying the minimum proportion of trees that must contain a split for it to appear within the consensus (ConsensusInfo()). In SplitwiseInfo(), ClusteringInfo() and ClusteringEntropy(), optionally a vector giving the probability that each split is present in the true tree, or TRUE to read these probabilities from the tree's node labels.

sum: Logical: if TRUE, return the sum of the information content of all splits; if FALSE, return the information content of each split individually.

trees: List of trees of class phylo.

info: Abbreviation of "phylogenetic" or "clustering", specifying the concept of information to employ.

check.tips: Logical specifying whether to renumber leaves such that leaf numbering is consistent in all trees.
SplitwiseInfo(), ClusteringInfo() and ClusteringEntropy() return the splitwise information content of the tree, or of each split in turn if sum = FALSE, in bits. ConsensusInfo() returns the splitwise information content of the majority rule consensus of trees.
Clustering entropy addresses the question "how much information is contained in the splits within a tree?". Its approach is complementary to the phylogenetic information content used in SplitwiseInfo(). In essence, it asks, given a split that subdivides the leaves of a tree into two partitions, how easy it is to predict which partition a randomly drawn leaf belongs to (Meila 2007; Vinh et al. 2010).
Formally, the entropy of a split S that divides n leaves into two partitions of sizes a and b is given by H(S) = -(a/n) log(a/n) - (b/n) log(b/n).
Base 2 logarithms are conventionally used, such that entropy is measured in bits. Entropy denotes the number of bits that are necessary to encode the outcome of a random variable: here, the random variable is "what partition does a randomly selected leaf belong to".
An even split has an entropy of 1 bit: there is no better way of encoding an outcome than using one bit to specify which of the two partitions the randomly selected leaf belongs to.
An uneven split has a lower entropy: membership of the larger partition is common, and thus less surprising; it can be signified using fewer bits in an optimal compression system.
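For instance, the entropy of a split dividing eight leaves into partitions of two and six follows directly from this formula; a minimal sketch in base R:

a <- 2  # leaves in the smaller partition
b <- 6  # leaves in the larger partition
n <- a + b
-(a / n) * log2(a / n) - (b / n) * log2(b / n)  # about 0.81 bits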
If this sounds confusing, let's consider creating a code to transmit the cluster labels of two randomly selected leaves. One straightforward option would be to use

00 = "Both leaves belong to partition A"
11 = "Both leaves belong to partition B"
01 = "First leaf in A, second in B"
10 = "First leaf in B, second in A"

This code uses two bits to transmit the partition labels of two leaves. If partitions A and B are equiprobable, this is the optimal code; our entropy, the average information content required per leaf, is 1 bit.
Alternatively, we could use the (suboptimal) code

0 = "Both leaves belong to partition A"
111 = "Both leaves belong to partition B"
101 = "First leaf in A, second in B"
110 = "First leaf in B, second in A"
If A is much larger than B, then most pairs of leaves will require just a single bit (code 0). The additional bits needed when one or both leaves belong to B may be required sufficiently rarely that the average message requires fewer than two bits for two leaves, so the entropy is less than 1 bit. (The optimal coding strategy will depend on the exact sizes of A and B.)
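As a concrete check, here is a minimal sketch computing the average cost of the suboptimal code above, assuming (for illustration only) that the two leaves are drawn independently; the function name AverageBitsPerLeaf is hypothetical, not part of the package.

# Average bits per leaf under the suboptimal code, for a split that
# places a leaves in partition A and b leaves in partition B
AverageBitsPerLeaf <- function(a, b) {
  pA <- a / (a + b)
  # Code "0" (both leaves in A) costs 1 bit; the other three codes cost 3
  bitsPerPair <- pA^2 * 1 + (1 - pA^2) * 3
  bitsPerPair / 2
}
AverageBitsPerLeaf(6, 2)  # 0.9375: fewer than 1 bit per leaf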
As entropy measures the bits required to transmit the cluster label of each leaf (Vinh et al. 2010, p. 2840), the information content of a split is its entropy multiplied by the number of leaves.
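A quick sketch verifying this relationship for the 2|6 split used above (as.Splits, from TreeTools, also appears in the examples below):

split <- TreeTools::as.Splits(c(rep(TRUE, 2), rep(FALSE, 6)))
ClusteringEntropy(split)      # about 0.81 bits per leaf
ClusteringEntropy(split) * 8  # entropy multiplied by number of leaves
ClusteringInfo(split)         # should match the line above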
Phylogenetic information expresses the information content of a split in terms of the probability that a uniformly selected tree will contain it (Thorley et al. 1998).
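For example, the information content of an even split of eight leaves can be recovered from the proportion of eight-leaf trees containing it; this sketch assumes the TreeTools helpers TreesMatchingSplit() and NUnrooted().

pSplit <- TreeTools::TreesMatchingSplit(4, 4) / TreeTools::NUnrooted(8)
-log2(pSplit)  # about 5.5 bits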
The information content of splits in a consensus tree is calculated by interpreting support values (i.e. the proportion of trees containing each split in the consensus) as probabilities that the true tree contains that split, following Smith (2022).
Author: Martin R. Smith (martin.smith@durham.ac.uk)
An introduction to the phylogenetic information content of a split is given in SplitInformation() and in a package vignette.

Other information functions: SplitEntropy(), SplitSharedInformation()
library("TreeTools", quietly = TRUE)
SplitwiseInfo(PectinateTree(8))
tree <- read.tree(text = "(a, b, (c, (d, e, (f, g)0.8))0.9);")
SplitwiseInfo(tree)
SplitwiseInfo(tree, TRUE)  # Use node labels as split support probabilities
# Clustering entropy of an even split = 1 bit
ClusteringEntropy(TreeTools::as.Splits(c(rep(TRUE, 4), rep(FALSE, 4))))
# Clustering entropy of an uneven split < 1 bit
ClusteringEntropy(TreeTools::as.Splits(c(rep(TRUE, 2), rep(FALSE, 6))))
tree1 <- TreeTools::BalancedTree(8)
tree2 <- TreeTools::PectinateTree(8)
ClusteringInfo(tree1)
ClusteringEntropy(tree1)
ClusteringInfo(list(one = tree1, two = tree2))
ClusteringInfo(tree1) + ClusteringInfo(tree2)
ClusteringEntropy(tree1) + ClusteringEntropy(tree2)
ClusteringInfoDistance(tree1, tree2)
MutualClusteringInfo(tree1, tree2)
# Clustering entropy with uncertain splits
tree <- ape::read.tree(text = "(a, b, (c, (d, e, (f, g)0.8))0.9);")
ClusteringInfo(tree)
ClusteringInfo(tree, TRUE)  # Use node labels as split support probabilities
# Support-weighted information content of a consensus tree
set.seed(0)
trees <- list(RandomTree(8), RootTree(BalancedTree(8), 1), PectinateTree(8))
cons <- consensus(trees, p = 0.5)
p <- SplitFrequency(cons, trees) / length(trees)  # Support for each split
plot(cons)
LabelSplits(cons, signif(SplitwiseInfo(cons, p, sum = FALSE), 4))
ConsensusInfo(trees)
plot(cons)  # Plot afresh before labelling with the clustering measure
LabelSplits(cons, signif(ClusteringInfo(cons, p, sum = FALSE), 4))
ConsensusInfo(trees, "clustering")