sprDistances: Tree distance and SPR moves

sprDistances R Documentation

Tree distance and SPR moves

Description

Datasets testing whether trees separated by increasingly many SPR moves are assigned a correspondingly greater distance by each tree distance method.

Usage

sprDistances

Format

A list of length 21. Each entry is named according to the corresponding tree distance method; see 'Methods tested' below.

Each member of the list is a 100 × 100 matrix listing the distance between each pair of trees in the SPR chain (see 'Details'), numbered from 1 to 100.
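As a minimal sketch of how this structure might be inspected, the helper below tabulates the dimensions of each entry in a named list of matrices. `DescribeDistanceList` is a hypothetical name introduced for illustration and is not part of the package; the guarded block assumes that TreeDistData is installed and that the entry names match those listed under 'Methods tested'.

```r
# Hypothetical helper (not part of TreeDistData): one row per method,
# giving the dimensions of that method's distance matrix.
DescribeDistanceList <- function(distList) {
  data.frame(
    method = names(distList),
    nRow = vapply(distList, function(m) nrow(as.matrix(m)), integer(1),
                  USE.NAMES = FALSE),
    nCol = vapply(distList, function(m) ncol(as.matrix(m)), integer(1),
                  USE.NAMES = FALSE)
  )
}

# Only runs if the data package is available (assumption: it is installed).
if (requireNamespace("TreeDistData", quietly = TRUE)) {
  data("sprDistances", package = "TreeDistData")
  print(DescribeDistanceList(sprDistances))

  # Distance between trees 1 and 10 of the chain under one method
  print(as.matrix(sprDistances[["cid"]])[1, 10])
}
```

Indexing by name (`sprDistances[["cid"]]`) rather than by position keeps the code robust to any change in the order of the list.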

Details

I generated a chain of 100 trees, each with 50 leaves, starting from a pectinate tree and deriving each tree in turn by performing an SPR operation on the previous tree. A consistent measure of tree distance should correlate with the number of SPR operations separating a pair of trees in this chain. That said, because one SPR operation may undo some of the difference introduced by an earlier one, perfect correlation is unlikely.
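The expected relationship can be sketched in base R. Trees i and j in the chain are separated by |i − j| SPR operations, so one simple check is the rank correlation between that separation and the recorded distance for each pair of trees. The function names `ChainSeparation` and `SeparationCorrelation` are hypothetical, and the choice of Spearman correlation is an assumption made for illustration; it is not necessarily the analysis used in the vignette.

```r
# Number of SPR operations separating trees i and j in a chain of n trees
ChainSeparation <- function(n) {
  abs(outer(seq_len(n), seq_len(n), "-"))
}

# Spearman rank correlation between chain separation and a distance matrix,
# using each unordered pair of trees once (upper triangle only).
SeparationCorrelation <- function(distMat) {
  distMat <- as.matrix(distMat)
  moves <- ChainSeparation(nrow(distMat))
  pairs <- upper.tri(distMat)
  cor(moves[pairs], distMat[pairs], method = "spearman")
}

# Only runs if the data package is available (assumption: it is installed).
if (requireNamespace("TreeDistData", quietly = TRUE)) {
  data("sprDistances", package = "TreeDistData")
  print(sapply(sprDistances, SeparationCorrelation))
}
```

If an entry records a similarity rather than a distance, the sign of the correlation would be expected to reverse.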

For analysis of this data, see the accompanying vignette.

Methods tested

  • pid: Phylogenetic Information Distance (Smith 2020)

  • msid: Matching Split Information Distance (Smith 2020)

  • cid: Clustering Information Distance (Smith 2020)

  • qd: Quartet divergence (Smith 2019)

  • nye: Nye et al. tree distance (Nye et al. 2006)

  • jnc2, jnc4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings prohibited ('no-conflict')

  • jco2, jco4: Jaccard-Robinson-Foulds distances with k = 2, 4, conflicting pairings permitted ('conflict-ok')

  • ms: Matching Split Distance (Bogdanowicz & Giaro 2012)

  • mast: Size of Maximum Agreement Subtree (Valiente 2009)

  • masti: Information content of Maximum Agreement Subtree

  • nni_l, nni_t, nni_u: Lower bound, tight upper bound, and upper bound for nearest-neighbour interchange distance (Li et al. 1996)

  • spr: Approximate SPR distance

  • tbr_l, tbr_u: Lower and upper bound for tree bisection and reconnection (TBR) distance, calculated using TBRDist

  • rf: Robinson-Foulds distance (Robinson & Foulds 1981)

  • icrf: Information-corrected Robinson-Foulds distance (Smith 2020)

  • path: Path distance (Steel & Penny 1993)

Source

Scripts used to generate data objects are housed in the data-raw directory.

References

\insertRef{Bogdanowicz2012}{TreeDist}

\insertRef{Li1996}{TreeDist}

\insertRef{Kendall2016}{TreeDistData}

\insertRef{Nye2006}{TreeDist}

\insertRef{Robinson1981}{TreeDist}

\insertRef{Smith2019}{TreeDist}

\insertRef{SmithDist}{TreeDist}

\insertRef{Steel1993}{TreeDist}

\insertRef{Valiente2009}{TreeDist}


ms609/TreeDistData documentation built on June 30, 2024, 7:21 p.m.