linTests: Evaluating tree distance metrics by cluster recovery


Description

An effective measure of tree distance will recover clusters of similar trees. These datasets contain the results of tests modelled on those in Lin et al. (2012).

Usage


Format

A three-dimensional array.

Rows correspond to the clustering methods:

Columns correspond to distance metrics; see 'Methods tested' below.

Slices correspond to values of k:

An object of class array of dimension 5 x 24 x 4.

An object of class array of dimension 5 x 24 x 5.
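As a rough illustration of this layout, the sketch below builds a placeholder array with the same 5 x 24 x 4 shape and indexes it by row, column and slice. The object name `results`, the dimnames (`method1`, `metric1`, `k1`, ...) and the zero-filled values are made up for illustration; the real objects carry their own names and results.

```r
# Placeholder array with the documented shape; values and dimnames are
# hypothetical stand-ins, not the real results.
results <- array(
  0,
  dim = c(5, 24, 4),
  dimnames = list(
    paste0("method", 1:5),   # rows: clustering methods
    paste0("metric", 1:24),  # columns: distance metrics
    paste0("k", 1:4)         # slices: values of k
  )
)

# One success rate: a single method, metric and value of k
results["method1", "metric3", "k2"]

# All metrics for one clustering method at one value of k (length-24 vector)
byMetric <- results["method1", , "k2"]

# Average over clustering methods, giving a 24 x 4 metric-by-k matrix
meanOverMethods <- apply(results, c(2, 3), mean)
dim(meanOverMethods)
```

Indexing by name rather than position keeps the code readable when the same analysis is repeated across the five-slice and four-slice objects.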

Details

I used three approaches to generate clusters of similar trees, and tested the ability of each metric to recover these clusters, following Lin et al. (2012).

For the first test, I generated 500 datasets of 100 binary trees with n = 40 leaves. Each set of trees was created by randomly selecting two k-leaf 'skeleton' trees, where k ranges from 0.3 n to 0.9 n. From each skeleton, 50 trees were generated by adding each of the remaining n - k leaves in turn at a uniformly selected point on the tree.

For the second and third tests, each dataset was constructed by selecting at random two binary 40-leaf trees. From each starting tree, I generated 50 binary trees by conducting k leaf-label interchange (LLI) operations (test two) or k subtree prune and regraft (SPR) operations (test three) on the starting tree. An LLI operation swaps the positions of two randomly selected leaves, without affecting tree shape; an SPR operation moves a subtree to a new location within the tree.
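A leaf-label interchange can be sketched without any tree package by treating a tree's leaf labels as a character vector in a fixed order and swapping two entries, which leaves the shape untouched. The function names `LeafLabelInterchange` and `ApplyLLI` and the labels `t1` to `t40` are hypothetical; the scripts that generated these datasets may implement the operation differently.

```r
# Swap the labels of two randomly chosen leaves. `labels` is the vector of
# leaf labels in the order the tree stores them; tree shape is not touched.
LeafLabelInterchange <- function(labels) {
  swap <- sample.int(length(labels), 2)  # two distinct positions
  labels[swap] <- labels[rev(swap)]
  labels
}

# Apply k successive interchanges to a label vector
ApplyLLI <- function(labels, k) {
  for (i in seq_len(k)) {
    labels <- LeafLabelInterchange(labels)
  }
  labels
}

set.seed(1)
start <- paste0("t", 1:40)
moved <- ApplyLLI(start, k = 1)
which(moved != start)  # the two positions whose labels were exchanged
```

Because a later interchange can undo an earlier one, k operations displace at most 2k leaves rather than exactly 2k.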

For each dataset, I calculated the distance between each pair of trees. Trees were then partitioned into clusters using five methods, implemented in the packages stats and cluster. I define the success rate of each distance measure as the proportion of datasets in which every tree generated from the same skeleton was placed in the same cluster.
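The scoring step can be sketched with base R alone. The example below stands in for one dataset by using points on a line rather than trees, clusters them with `stats::hclust()`, and checks whether the two known groups are recovered exactly. The helper `PerfectRecovery`, the toy coordinates and the choice of complete linkage are illustrative assumptions, as is the extra requirement that the two groups end up in different clusters; the five clustering methods used for these datasets are not reproduced here.

```r
# TRUE if every item from the same source shares a cluster and the two
# sources are not merged into a single cluster (the latter is an assumption).
PerfectRecovery <- function(assigned, truth) {
  tab <- table(truth, assigned)
  # Each source occupies exactly one cluster, and each cluster holds one source
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# Toy stand-in for one dataset: 50 items from each of two sources
truth <- rep(1:2, each = 50)
set.seed(0)
x <- c(rnorm(50, mean = 0, sd = 0.1), rnorm(50, mean = 10, sd = 0.1))
distances <- dist(x)  # pairwise distances (tree distances in the real tests)

# Partition into two clusters by complete-linkage hierarchical clustering
assigned <- cutree(hclust(distances, method = "complete"), k = 2)
PerfectRecovery(assigned, truth)

# Success rate: the proportion of datasets recovered perfectly.
# `recovered` is a made-up outcome vector for four datasets.
recovered <- c(TRUE, TRUE, FALSE, TRUE)
mean(recovered)
```

Scoring each dataset as all-or-nothing makes the measure strict: a single misplaced tree counts the whole dataset as a failure.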

For analysis of these data, see the accompanying vignette.

Methods tested

Source

Scripts used to generate data objects are housed in the data-raw directory.

References

\insertRef{Bogdanowicz2012}{TreeDist}

\insertRef{Li1996}{TreeDist}

\insertRef{Kendall2016}{TreeDistData}

\insertRef{Nye2006}{TreeDist}

\insertRef{Robinson1981}{TreeDist}

\insertRef{Smith2019}{TreeDist}

\insertRef{SmithDist}{TreeDist}

\insertRef{Steel1993}{TreeDist}

\insertRef{Valiente2009}{TreeDist}

\insertRef{Lin2012}{TreeDistData}


ms609/TreeDistData documentation built on May 21, 2021, 6:53 a.m.