View source: R/hierarchical_mutual_information.R
HierarchicalMutualInfo (R Documentation)
Calculate the Hierarchical Mutual Information (HMI) between two trees, following the recursive algorithm of Perotti et al. (2020).
This function was written during a code sprint: its documentation and test cases have not yet been carefully scrutinized, and its implementation may change without notice. Please alert the maintainer to any issues you encounter.
HierarchicalMutualInfo(tree1, tree2 = NULL, normalize = FALSE)
HMI(tree1, tree2 = NULL, normalize = FALSE)
SelfHMI(tree)
EHMI(tree1, tree2, precision = 0.01, minResample = 36)
AHMI(tree1, tree2, Mean = max, precision = 0.01, minResample = 36)
normalize: If FALSE (the default), no normalization is performed and the raw HMI is returned in bits. A function (e.g. mean) may instead be supplied, in which case the HMI is divided by that function's combination of the self-information of the input trees, as shown in the examples below.
tree, tree1, tree2: An object that can be coerced to an object of class phylo.
precision: Numeric; Monte Carlo sampling will terminate once the relative standard error falls below this value.
minResample: Integer specifying the minimum number of Monte Carlo samples to conduct; this avoids early termination when the sample size is too small to reliably estimate the standard error of the mean.
Mean: Function by which to combine the self-information of the two input hierarchies, in order to normalize the HMI.
HierarchicalMutualInfo() computes the hierarchical mutual information of trees (Perotti et al. 2015, 2020), which accounts for the non-independence of the information represented by nested splits. Each tree is converted to a set of hierarchical partitions, and the mutual information (in bits) is computed recursively; the contribution of a node is given by:
I(t, s) = \log_2(n_{ts}) - \dfrac{H_{us} + H_{tv} - H_{uv}}{n_{ts}} + \text{mean}(I_{uv})
where:
- n_{ts} is the number of common elements between partitions;
- H_{us}, H_{tv}, H_{uv} are entropy terms from child comparisons;
- I_{uv} is the recursive HMI for child pairs.
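As a minimal sketch of this recursion's core step, assuming the entropy terms and child values have already been computed, the per-node contribution transcribes directly into R; NodeHMI and its argument names are hypothetical, for illustration only:

# Hypothetical helper transcribing the per-node formula above
# n_ts: number of elements common to partitions t and s
# H_us, H_tv, H_uv: entropy terms from child comparisons
# I_uv: vector of HMI values already computed for child pairs
NodeHMI <- function(n_ts, H_us, H_tv, H_uv, I_uv) {
  log2(n_ts) - (H_us + H_tv - H_uv) / n_ts + mean(I_uv)
}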
AHMI() calculates the adjusted hierarchical mutual information:

\text{AHMI}(t, s) = \dfrac{I(t, s) - \hat{I}(t, s)}{\text{Mean}(H(t), H(s)) - \hat{I}(t, s)}
where:
- I(t, s) is the hierarchical mutual information between tree1 and tree2;
- \hat{I}(t, s) is the expected HMI between tree1 and tree2, estimated by Monte Carlo sampling;
- H(t), H(s) are the entropies (self-mutual information) of each tree;
- Mean is the combining function supplied as the Mean argument (max by default).
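Given these components, the adjustment is a one-line computation; the sketch below is illustrative, and AdjustedHMI, hmi, expected, h1 and h2 are hypothetical names:

# Hypothetical sketch: AHMI assembled from its components
# hmi: observed HMI; expected: Monte Carlo estimate of the expected HMI
# h1, h2: hierarchical entropies (SelfHMI) of the two trees
AdjustedHMI <- function(hmi, expected, h1, h2, Mean = max) {
  (hmi - expected) / (Mean(h1, h2) - expected)
}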
HierarchicalMutualInfo() returns a numeric value representing the
hierarchical mutual information between the input trees, in bits,
normalized as specified.
Higher values indicate more shared hierarchical structure.
SelfHMI() returns the hierarchical mutual information of a tree compared with itself, i.e. its hierarchical entropy (H).
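Since SelfHMI() is defined as a tree's HMI with itself, the two functions should agree on identical inputs; a quick check, using trees as in the examples below:

library("TreeTools", quietly = TRUE)
tree <- BalancedTree(8)
SelfHMI(tree)    # hierarchical entropy, H
HMI(tree, tree)  # comparison with itself; should match SelfHMI(tree)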
EHMI() returns the expected HMI under a uniform shuffling of element labels, estimated by Monte Carlo resampling on the same hierarchical structure until the relative standard error of the estimate falls below precision.
The attributes of the returned object list the variance (var),
standard deviation (sd), standard error of the mean (sem) and
relative error (relativeError) of the estimate, and the number of Monte
Carlo samples used to obtain it (samples).
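The resampling procedure described above can be sketched as follows; this is an illustrative outline, not the package's implementation, and it assumes phylo inputs whose tip labels can be shuffled in place:

# Sketch: estimate the expected HMI by repeatedly shuffling tip labels,
# stopping once the relative standard error of the mean falls below
# `precision`, but only after at least `minResample` samples
EstimateEHMI <- function(tree1, tree2, precision = 0.01, minResample = 36) {
  samples <- numeric(0)
  repeat {
    shuffled <- tree2
    shuffled$tip.label <- sample(shuffled$tip.label)  # uniform label shuffle
    samples <- c(samples, HMI(tree1, shuffled))
    n <- length(samples)
    if (n >= minResample) {
      sem <- sd(samples) / sqrt(n)
      if (sem / mean(samples) < precision) break
    }
  }
  structure(mean(samples), var = var(samples), sd = sd(samples),
            sem = sem, relativeError = sem / mean(samples), samples = n)
}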
AHMI() returns the adjusted HMI, normalized such that
zero corresponds to the expected HMI given a random shuffling
of elements on the same hierarchical structure. The attribute sem gives
the standard error of the estimate.
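Because the numerator and denominator of the AHMI formula share the same expected-HMI estimate, a tree compared with itself scores 1 per the formula above (I(t, t) = H(t), and Mean(H, H) = H for the default Mean = max); for example:

library("TreeTools", quietly = TRUE)
t1 <- BalancedTree(8)
AHMI(t1, t1, precision = 0.1)  # 1: a tree shares all structure with itself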
Other tree distances:
JaccardRobinsonFoulds(),
KendallColijn(),
MASTSize(),
MatchingSplitDistance(),
NNIDist(),
NyeSimilarity(),
PathDist(),
Robinson-Foulds,
SPRDist(),
TreeDistance()
library("TreeTools", quietly = TRUE)
tree1 <- BalancedTree(8)
tree2 <- PectinateTree(8)
# Calculate HMI between two trees
HierarchicalMutualInfo(tree1, tree2)
# HMI normalized against the mean information content of tree1 and tree2
HierarchicalMutualInfo(tree1, tree2, normalize = mean)
# Normalized HMI above is equivalent to:
HMI(tree1, tree2) / mean(c(SelfHMI(tree1), SelfHMI(tree2)))
# Expected mutual info for this pair of hierarchies
EHMI(tree1, tree2, precision = 0.1)
# The adjusted HMI normalizes against this expectation
AHMI(tree1, tree2, precision = 0.1)
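# The uncertainty attributes documented above can be read with attr();
# `estimate` is an illustrative variable name
estimate <- EHMI(tree1, tree2, precision = 0.1)
attr(estimate, "sem")      # standard error of the mean
attr(estimate, "samples")  # number of Monte Carlo samples used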