PhyloDistance-JRF: Jaccard-Robinson-Foulds Distance

PhyloDistance-JRFDistR Documentation

Jaccard-Robinson-Foulds Distance

Description

Calculate JRF distance between two unrooted phylogenies.

Details

This function is called as part of PhyloDistance and calculates the Jaccard-Robinson-Foulds distance between two unrooted phylogenies. Each dendrogram is first pruned to only internal branches implying a partition in the shared leaf set; trivial partitions (where one leaf set contains 1 or 0 leaves) are ignored.

The total score is calculated by pairing branches and scoring their similarity. For a set of two branches A, B that partition the leaves into (A_1, A_2) and (B_1, B_2) (resp.), the distance between the branches is calculated as:

2 - 2\left(\frac{|X \cap Y|}{| X\cup Y|}\right)^k

where X \in (A_1, A_2),\; Y \in (B_1, B_2) are chosen to maximize the score of the pairing, and k the value of ExpVal. The sum of these scores for all branches produces the overall distance between the two trees, which is then normalized by the number of branches in each tree.

There are a few special cases to this distance. If ExpVal=1, the distance is equivalent to the metric introduced in Nye et al. (2006). As ExpVal approaches infinity, the value becomes close to the (non-Generalized) Robinson Foulds Distance.

Value

Returns a normalized distance, with 0 indicating identical trees and 1 indicating maximal difference.

If RawScore=TRUE, returns a named length 3 vector with the first entry the summed distance score over the branch pairings, and the subsequent entries the number of partitions for each tree.

If the trees have no leaves in common, the function will return 1 if RawScore=FALSE, and c(0, NA, NA) if TRUE.

Note

Note that this function requires the input dendrograms to be labeled alike (ex. leaf labeled abc in dend1 represents the same species as leaf labeled abc in dend2). Labels can easily be modified using dendrapply.

Author(s)

Aidan Lakshman ahl27@pitt.edu

References

Nye, T. M. W., Liò, P., & Gilks, W. R. A novel algorithm and web-based tool for comparing two alternative phylogenetic trees. Bioinformatics, 2006. 22(1): 117–119.

Böcker, S., Canzar, S., & Klau, G. W.. The generalized Robinson-Foulds metric. Algorithms in Bioinformatics, 2013. 8126: 156–169.

Examples

# making some toy dendrograms
set.seed(123)
dm1 <- as.dist(matrix(runif(64, 0.5, 5), ncol=8))
dm2 <- as.dist(matrix(runif(64, 0.5, 5), ncol=8))

tree1 <- as.dendrogram(hclust(dm1))
tree2 <- as.dendrogram(hclust(dm2))

# Nye Metric
PhyloDistance(tree1, tree2, Method="JRF", JRFExp=1)

# Jaccard-RobinsonFoulds
PhyloDistance(tree1, tree2, Method="JRF", JRFExp=2)

# Good approximation to RF Dist (note RFDist is much faster for this)
PhyloDistance(tree1, tree2, Method="JRF", JRFExp=1000)
PhyloDistance(tree1, tree2, Method="RF")

npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.