View source: R/hierarchical_mutual_information.R
HierarchicalMutualInfo (R Documentation)
Calculate the Hierarchical Mutual Information (HMI) between two trees, following the recursive algorithm of Perotti et al. (2020).
This function was written during a code sprint: its documentation and test cases have not yet been carefully scrutinized, and its implementation may change without notice. Please alert the maintainer to any issues you encounter.
HierarchicalMutualInfo(tree1, tree2 = NULL, normalize = FALSE)
HMI(tree1, tree2 = NULL, normalize = FALSE)
SelfHMI(tree)
EHMI(tree1, tree2, precision = 0.01, minResample = 36)
AHMI(tree1, tree2, Mean = max, precision = 0.01, minResample = 36)
normalize: If FALSE (the default), no normalization is performed and the raw HMI is returned in bits. A function (e.g. mean) may instead be supplied, in which case the HMI is divided by that function's combination of the self-information of the input trees, as shown in the examples below.
tree, tree1, tree2: An object that can be coerced to an object of class phylo.
precision: Numeric; Monte Carlo sampling will terminate once the relative standard error falls below this value.
minResample: Integer specifying the minimum number of Monte Carlo samples to conduct; this avoids early termination when the sample size is too small to reliably estimate the standard error of the mean.
Mean: Function by which to combine the self-information of the two input hierarchies, in order to normalize the HMI.
HierarchicalMutualInfo() computes the hierarchical mutual information of trees (Perotti et al. 2015, 2020), which accounts for the non-independence of the information represented by nested splits. Each tree is converted to a set of hierarchical partitions, and the mutual information (in bits) is computed recursively; the contribution of a node is given by:
I(t, s) = \log_2(n_{ts}) - \dfrac{H_{us} + H_{tv} - H_{uv}}{n_{ts}} + \text{mean}(I_{uv})
where:
- n_{ts} is the number of common elements between partitions;
- H_{us}, H_{tv}, H_{uv} are entropy terms from child comparisons;
- I_{uv} is the recursive HMI for child pairs.
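As a minimal sketch of this recursion's core step, assuming the entropy terms and child values have already been computed, the per-node contribution transcribes directly into R; NodeHMI and its argument names are hypothetical, for illustration only:

# Hypothetical helper transcribing the per-node formula above
# n_ts: number of elements common to partitions t and s
# H_us, H_tv, H_uv: entropy terms from child comparisons
# I_uv: vector of HMI values already computed for child pairs
NodeHMI <- function(n_ts, H_us, H_tv, H_uv, I_uv) {
  log2(n_ts) - (H_us + H_tv - H_uv) / n_ts + mean(I_uv)
}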
AHMI() calculates the adjusted hierarchical mutual information:

\text{AHMI}(t, s) = \dfrac{I(t, s) - \hat{I}(t, s)}{\text{Mean}(H(t), H(s)) - \hat{I}(t, s)}
where:
- I(t, s) is the hierarchical mutual information between tree1 and tree2;
- \hat{I}(t, s) is the expected HMI between tree1 and tree2, estimated by Monte Carlo sampling;
- H(t), H(s) are the entropies (self-mutual information) of each tree;
- Mean is the combining function supplied as the Mean argument (max by default).
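Given these components, the adjustment is a one-line computation; the sketch below is illustrative, and AdjustedHMI, hmi, expected, h1 and h2 are hypothetical names:

# Hypothetical sketch: AHMI assembled from its components
# hmi: observed HMI; expected: Monte Carlo estimate of the expected HMI
# h1, h2: hierarchical entropies (SelfHMI) of the two trees
AdjustedHMI <- function(hmi, expected, h1, h2, Mean = max) {
  (hmi - expected) / (Mean(h1, h2) - expected)
}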
HierarchicalMutualInfo() returns a numeric value representing the
hierarchical mutual information between the input trees, in bits,
normalized as specified.
Higher values indicate more shared hierarchical structure.
SelfHMI() returns the hierarchical mutual information of a tree compared with itself, i.e. its hierarchical entropy (H).
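Since SelfHMI() is defined as a tree's HMI with itself, the two functions should agree on identical inputs; a quick check, using trees as in the examples below:

library("TreeTools", quietly = TRUE)
tree <- BalancedTree(8)
SelfHMI(tree)    # hierarchical entropy, H
HMI(tree, tree)  # comparison with itself; should match SelfHMI(tree)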
EHMI() returns the expected HMI under a uniform shuffling of element labels, estimated by Monte Carlo resampling on the same hierarchical structure until the relative standard error of the estimate falls below precision.
The attributes of the returned object list the variance (var),
standard deviation (sd), standard error of the mean (sem) and
relative error (relativeError) of the estimate, and the number of Monte
Carlo samples used to obtain it (samples).
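The resampling procedure described above can be sketched as follows; this is an illustrative outline, not the package's implementation, and it assumes phylo inputs whose tip labels can be shuffled in place:

# Sketch: estimate the expected HMI by repeatedly shuffling tip labels,
# stopping once the relative standard error of the mean falls below
# `precision`, but only after at least `minResample` samples
EstimateEHMI <- function(tree1, tree2, precision = 0.01, minResample = 36) {
  samples <- numeric(0)
  repeat {
    shuffled <- tree2
    shuffled$tip.label <- sample(shuffled$tip.label)  # uniform label shuffle
    samples <- c(samples, HMI(tree1, shuffled))
    n <- length(samples)
    if (n >= minResample) {
      sem <- sd(samples) / sqrt(n)
      if (sem / mean(samples) < precision) break
    }
  }
  structure(mean(samples), var = var(samples), sd = sd(samples),
            sem = sem, relativeError = sem / mean(samples), samples = n)
}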
AHMI() returns the adjusted HMI, normalized such that
zero corresponds to the expected HMI given a random shuffling
of elements on the same hierarchical structure. The attribute sem gives
the standard error of the estimate.
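Because the numerator and denominator of the AHMI formula share the same expected-HMI estimate, a tree compared with itself scores 1 per the formula above (I(t, t) = H(t), and Mean(H, H) = H for the default Mean = max); for example:

library("TreeTools", quietly = TRUE)
t1 <- BalancedTree(8)
AHMI(t1, t1, precision = 0.1)  # 1: a tree shares all structure with itself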
Other tree distances:
JaccardRobinsonFoulds(),
KendallColijn(),
MASTSize(),
MatchingSplitDistance(),
NNIDist(),
NyeSimilarity(),
PathDist(),
Robinson-Foulds,
SPRDist(),
TreeDistance()
library("TreeTools", quietly = TRUE)
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)
# Calculate HMI between two trees
HierarchicalMutualInfo(tree1, tree2)
# HMI normalized against the mean information content of tree1 and tree2
HierarchicalMutualInfo(tree1, tree2, normalize = mean)
# Normalized HMI above is equivalent to:
HMI(tree1, tree2) / mean(c(SelfHMI(tree1), SelfHMI(tree2)))
# Expected mutual info for this pair of hierarchies
EHMI(tree1, tree2, precision = 0.1)
# The adjusted HMI normalizes against this expectation
AHMI(tree1, tree2, precision = 0.1)
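# The uncertainty attributes documented above can be read with attr();
# `estimate` is an illustrative variable name
estimate <- EHMI(tree1, tree2, precision = 0.1)
attr(estimate, "sem")      # standard error of the mean
attr(estimate, "samples")  # number of Monte Carlo samples used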