Hamming: Hamming distance between taxa in a phylogenetic dataset

View source: R/tree_generation.R

Hamming R Documentation

Description

The Hamming distance between a pair of taxa is the number of characters with a different coding, i.e. the smallest number of evolutionary steps that must have occurred since their common ancestor.
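
As a minimal sketch of this definition, assuming two taxa coded with unambiguous tokens only, the distance is a count of mismatched positions. The vectors `taxonA` and `taxonB` are illustrative and not part of the package.

```r
# Hypothetical codings for two taxa across five characters
taxonA <- c("0", "0", "1", "0", "1")
taxonB <- c("1", "1", "2", "1", "1")

# Count the characters whose codings differ
hammingAB <- sum(taxonA != taxonB)

# Express the count as a proportion of the characters compared
hammingRatio <- hammingAB / length(taxonA)
```

Ambiguous and inapplicable tokens need extra handling, which this sketch deliberately leaves out.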

Usage

Hamming(
  dataset,
  ratio = TRUE,
  ambig = c("median", "mean", "zero", "one", "na", "nan")
)

Arguments

dataset

Object of class phyDat.

ratio

Logical specifying whether to express each distance as a proportion of the maximum distance possible for that pair of taxa. A character in which either taxon bears an ambiguous token cannot contribute to the distance between the pair, and so does not count towards the maximum.

ambig

Character specifying the value to return when a pair of taxa has a maximum possible distance of zero (perhaps due to a preponderance of ambiguous tokens). "median", the default, takes the median of all other distance values; "mean" takes their mean; "zero" sets the distance to zero; "one" sets it to one; "na" sets it to NA_integer_; and "nan" sets it to NaN.
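
The fallback behaviour described for `ambig` might be pictured with the base R sketch below. It is an illustration of the options as documented, not the package's internal code; the vectors `steps` and `maxSteps` and their values are hypothetical.

```r
# Hypothetical counts for three taxon pairs: steps observed, and the
# maximum number of steps possible once ambiguous tokens are excluded
steps    <- c(AB = 2, AC = 3, BC = 0)
maxSteps <- c(AB = 4, AC = 4, BC = 0)

# Pair BC has no comparable characters, so its ratio is 0 / 0 = NaN
ratios <- steps / maxSteps
undefined <- maxSteps == 0

# In the spirit of ambig = "median": use the median of the defined values
viaMedian <- ratios
viaMedian[undefined] <- median(ratios[!undefined])

# In the spirit of ambig = "zero": treat the pair as identical
viaZero <- ratios
viaZero[undefined] <- 0
```

Choosing "median" or "mean" keeps an undefined pair from looking unusually close or distant; "na" or "nan" instead leaves the gap visible to downstream code.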

Details

Tokens that contain the inapplicable state are treated as requiring no steps to transform into any applicable token.
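
One way to picture how uninformative tokens drop out of a comparison is the simplified sketch below. It assumes that any token other than a single applicable state ("?", "-", or a set such as "{01}") is skipped for the pair; this is a stated simplification, and its results need not match those of `Hamming()`. The helpers `IsInformative()` and `PairDistance()` are hypothetical and not part of TreeTools.

```r
# Simplifying assumption: only single applicable states are informative
IsInformative <- function(token) {
  nchar(token) == 1L & !token %in% c("?", "-")
}

# Distance between two taxa, counting only characters that are
# informative in both
PairDistance <- function(a, b, ratio = TRUE) {
  comparable <- IsInformative(a) & IsInformative(b)
  steps <- sum(a[comparable] != b[comparable])
  # With no comparable characters, the ratio is 0 / 0 = NaN
  if (ratio) steps / sum(comparable) else steps
}

# Codings copied from Taxon_A, Taxon_B and Taxon_E in the Examples
taxonA <- c("0", "0", "0", "0", "?")
taxonB <- c("0", "0", "1", "0", "1")
taxonE <- c("1", "1", "-", "?", "0")

distAB <- PairDistance(taxonA, taxonB)                 # 1 step in 4 characters
stepsAE <- PairDistance(taxonA, taxonE, ratio = FALSE) # 2 comparable characters
```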

Value

Hamming() returns an object of class dist listing the Hamming distance between each pair of taxa.

Author(s)

Martin R. Smith (martin.smith@durham.ac.uk)

See Also

Used to construct neighbour joining trees in NJTree().

dist.hamming() in the phangorn package provides an alternative implementation.

Examples

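library("TreeTools")

# Six taxa (rows) coded for five characters (columns).
# "?" is ambiguous, "-" is inapplicable, "{01}" is partially ambiguous.
tokens <- matrix(c(0, 0, "0", 0, "?",
                   0, 0, "1", 0, 1,
                   0, 0, "1", 0, 1,
                   0, 0, "2", 0, 1,
                   1, 1, "-", "?", 0,
                   1, 1, "2", 1, "{01}"),
                 nrow = 6, ncol = 5, byrow = TRUE,
                 dimnames = list(
                   paste0("Taxon_", LETTERS[1:6]),
                   paste0("Char_", 1:5)))

# Convert to a phyDat object and compute pairwise distances
dataset <- MatrixToPhyDat(tokens)
Hamming(dataset)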

TreeTools documentation built on Sept. 11, 2024, 8:27 p.m.