PhyloDistance-CIDist | R Documentation
Calculate distance between two unrooted phylogenies using mutual clustering information of branch partitions.
This function is called as part of PhyloDistance and calculates tree distance using the clustering information approach first described in Smith (2020). It iteratively pairs the internal branches of the two phylogenies based on the similarity (mutual clustering information) of the leaf partitions they induce. It then scores overall similarity as the sum of the similarities of the paired branches. The similarity score is converted to a distance by normalizing by the average entropy of the two trees. Smith (2020) reports that this metric compares favorably with numerous other tree distance metrics; see the original publication cited in References for more information.
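To make the per-branch scoring concrete, here is a minimal sketch. It assumes each internal branch is represented by the bipartition (split) of leaves it induces. It also assumes that the clustering information of a split is the Shannon entropy, in bits, of that bipartition, and that the mutual clustering information of two splits is their mutual information. The helper names (entropy_bits, split_entropy, split_mci, greedy_similarity) are hypothetical, and this is not SynExtend's implementation; scaling conventions may differ from the package. The pairing step is a greedy simplification. Smith (2020) defines the score over an optimal matching, so a greedy pass can underestimate the similarity.

```r
# Hypothetical helpers, not SynExtend's implementation. Each split is a
# logical vector over the same ordered leaves (TRUE = one side of a branch).

# Shannon entropy (bits) of a vector of counts; empty cells are ignored
entropy_bits <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log2(p))
}

# Clustering information of one split: entropy of its bipartition
split_entropy <- function(s) {
  entropy_bits(tabulate(s + 1L, nbins = 2L))
}

# Mutual clustering information of two splits: mutual information of the
# 2 x 2 table counting leaves by their side in each split
split_mci <- function(s1, s2) {
  joint <- tabulate(s1 + 2L * s2 + 1L, nbins = 4L)
  split_entropy(s1) + split_entropy(s2) - entropy_bits(joint)
}

# Greedy pairing: repeatedly take the highest-scoring pair of unused splits
# and add its score (a simplification; Smith (2020) uses an optimal matching)
greedy_similarity <- function(splits1, splits2) {
  score <- matrix(0, length(splits1), length(splits2))
  for (i in seq_along(splits1)) {
    for (j in seq_along(splits2)) {
      score[i, j] <- split_mci(splits1[[i]], splits2[[j]])
    }
  }
  total <- 0
  for (k in seq_len(min(dim(score)))) {
    best <- which(score == max(score), arr.ind = TRUE)[1, ]
    total <- total + score[best[1], best[2]]
    score[best[1], ] <- -Inf  # this split of tree 1 is now paired
    score[, best[2]] <- -Inf  # this split of tree 2 is now paired
  }
  total
}

s1 <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
s2 <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
split_mci(s1, s2)                              # about 0.459 bits
greedy_similarity(list(s1, s2), list(s1, s2))  # about 1.918 bits
```

When a set of splits is compared with itself, each split pairs with its own copy. The summed score then equals the total clustering information of that set (1 + 0.918 bits here), which is the situation that yields a distance of 0.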
Users may want the raw similarity values rather than a distance; the option RawScore=TRUE is provided for this case. Distance is calculated as $\frac{M - S}{M}$, where $M = \frac{1}{2}(H_1 + H_2)$, $H_i$ is the entropy of the $i$-th tree, and $S$ is the similarity score between them. As shown in the original publication, this satisfies the requirements of a distance metric. Setting RawScore=TRUE instead returns a vector $(S, H_1, H_2, p)$. Here $p$ approximates the two-sided p-value of the result, based on random simulations from Smith (2020).
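The following worked illustration uses made-up values, not output from PhyloDistance: $H_1 = 4$, $H_2 = 6$ and $S = 3$ bits. The last line follows from the mutual-information definition in Smith (2020). Each paired branch contributes a non-negative score no larger than either branch's own clustering information. The similarity therefore lies between 0 and the smaller of the two tree entropies, which keeps the distance in $[0, 1]$.

```latex
\begin{aligned}
M &= \tfrac{1}{2}(H_1 + H_2) = \tfrac{1}{2}(4 + 6) = 5,\\
d &= \frac{M - S}{M} = \frac{5 - 3}{5} = 0.4,\\
0 &\le S \le \min(H_1, H_2) \le M \;\Rightarrow\; 0 \le d \le 1.
\end{aligned}
```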
Returns a normalized distance, with 0 indicating identical trees and 1 indicating maximal difference. Branch lengths are not considered, so two trees with the same topology but different branch lengths return a distance of 0.
If RawScore=TRUE, returns a named numeric vector of length 4. The first entry is the similarity score, the next two are the entropy values for each tree, and the last is the approximate p-value for the result based on simulations.
If the trees have no leaves in common, the function returns 1 if RawScore=FALSE, and c(0, NA, NA, NA) if RawScore=TRUE.
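As a sketch of how the raw output relates to the default distance, the hypothetical helper raw_to_distance below applies the formula $\frac{M - S}{M}$ to a RawScore=TRUE vector. It reads entries by position, because the exact element names are not assumed here. It also maps the no-shared-leaves result back to 1. It assumes $M > 0$, and the input values in the example calls are illustrative.

```r
# Hypothetical helper (not part of SynExtend): turn a RawScore=TRUE result,
# read by position as c(S, H1, H2, p), into the normalized distance
# (M - S) / M with M = (H1 + H2) / 2. Assumes M > 0.
raw_to_distance <- function(raw) {
  stopifnot(length(raw) == 4)
  S <- raw[[1]]
  H <- c(raw[[2]], raw[[3]])
  if (anyNA(H)) return(1)  # no shared leaves: raw is c(0, NA, NA, NA)
  M <- mean(H)
  (M - S) / M
}

raw_to_distance(c(3, 4, 6, 0.01))  # 0.4, with illustrative values
raw_to_distance(c(0, NA, NA, NA))  # 1, matching the RawScore=FALSE result
```

Applied to real output, for example raw_to_distance(PhyloDistance(tree1, tree2, Method="CI", RawScore=TRUE)), it should reproduce the default distance up to floating-point rounding, provided the vector is ordered as documented above.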
Note that this function requires the input dendrograms to be labeled alike (e.g., a leaf labeled abc in dend1 represents the same species as a leaf labeled abc in dend2). Labels can easily be modified using dendrapply.
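For instance, leaf labels can be harmonized with a small dendrapply() callback before calling PhyloDistance. In the sketch below, relabel_leaves is a hypothetical helper, not part of SynExtend. The lookup argument is a named character vector mapping old labels to new ones. The sketch also counts shared leaf labels, since trees with no leaves in common return the maximal distance.

```r
library(stats)  # dendrapply, is.leaf, hclust, as.dendrogram

# Hypothetical helper: rename leaves whose label appears in names(lookup)
relabel_leaves <- function(dend, lookup) {
  dendrapply(dend, function(node) {
    if (is.leaf(node)) {
      old <- attr(node, "label")
      if (old %in% names(lookup)) attr(node, "label") <- lookup[[old]]
    }
    node
  })
}

# Toy dendrogram with leaves a, b, c, d
m <- matrix(c(1, 2, 6, 7, 1, 3, 8, 9), ncol = 2,
            dimnames = list(c("a", "b", "c", "d"), NULL))
d1 <- as.dendrogram(hclust(dist(m)))
d2 <- relabel_leaves(d1, c(a = "taxon_A", b = "taxon_B"))

labels(d2)                                 # leaves a and b are renamed
length(intersect(labels(d1), labels(d2)))  # 2 shared labels (c and d)
```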
Aidan Lakshman ahl27@pitt.edu
Smith, Martin R. Information theoretic generalized Robinson–Foulds metrics for comparing phylogenetic trees. Bioinformatics, 2020. 36(20):5007-5013.
# load SynExtend (Bioconductor), which provides PhyloDistance
library(SynExtend)
# making some toy dendrograms
set.seed(123)
dm1 <- as.dist(matrix(runif(64, 0.5, 5), ncol=8))
dm2 <- as.dist(matrix(runif(64, 0.5, 5), ncol=8))
tree1 <- as.dendrogram(hclust(dm1))
tree2 <- as.dendrogram(hclust(dm2))
# get CI distance
PhyloDistance(tree1, tree2, Method="CI")
# get similarity score with individual entropies
PhyloDistance(tree1, tree2, Method="CI", RawScore=TRUE)