NexusTokensToInteger: Convert Nexus token matrix to integer

View source: R/parse_files.R

NexusTokensToInteger    R Documentation

Convert Nexus token matrix to integer

Description

NexusTokensToInteger() converts the character matrix returned by ReadCharacters() to an integer matrix. Ambiguous (?) and inapplicable (-) tokens map to NA_integer_. Polymorphic tokens map either to NA_integer_ or to the first or last state listed in the polymorphism, depending on the polymorphism argument.

Usage

NexusTokensToInteger(tokens, polymorphism = c("?", "first", "last"))

Arguments

tokens

Character matrix as returned by ReadCharacters(), a character vector as returned by NexusTokens(), or a phyDat object.

polymorphism

Character string specifying how to handle polymorphic tokens such as "(01)" or "{12}":

"?" (default)

Treat as the NEXUS missing-data token: map to NA_integer_.

"first"

Use the first state digit inside the brackets.

"last"

Use the last state digit inside the brackets.

Tokens "?" and "-" always map to NA_integer_ regardless of polymorphism.
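
The mapping rules above can be illustrated with a minimal base-R sketch. SketchTokensToInteger() is a hypothetical helper written only for this illustration; it is not the TreeTools implementation, and it assumes that a polymorphic token is one wrapped in "()" or "{}".

```r
# Hypothetical re-statement of the documented rules; NOT TreeTools code.
SketchTokensToInteger <- function(tokens,
                                  polymorphism = c("?", "first", "last")) {
  polymorphism <- match.arg(polymorphism)
  one <- function(token) {
    # "?" and "-" map to NA whatever the value of `polymorphism`
    if (token %in% c("?", "-")) return(NA_integer_)
    # Assumption: bracketed tokens are the polymorphic ones
    bracketed <- grepl("^[({].*[)}]$", token)
    if (!bracketed) {
      # A plain token must be a single digit to be recognised
      if (grepl("^[0-9]$", token)) return(as.integer(token))
      return(NA_integer_)
    }
    # Only digits inside the brackets are considered
    digits <- regmatches(token, gregexpr("[0-9]", token))[[1]]
    if (length(digits) == 0L || polymorphism == "?") return(NA_integer_)
    if (polymorphism == "first") {
      as.integer(digits[1])
    } else {
      as.integer(digits[length(digits)])
    }
  }
  result <- vapply(tokens, one, integer(1), USE.NAMES = FALSE)
  # Keep the shape and labels of the input
  dim(result) <- dim(tokens)
  dimnames(result) <- dimnames(tokens)
  result
}

tokens <- matrix(c("0", "(12)", "1", "?", "-"), nrow = 1,
                 dimnames = list("Taxon_A", paste0("C", 1:5)))
SketchTokensToInteger(tokens)           # C2 is NA
SketchTokensToInteger(tokens, "first")  # C2 is 1
SketchTokensToInteger(tokens, "last")   # C2 is 2
```

In every case C4 ("?") and C5 ("-") stay NA, which mirrors the rule that these tokens ignore the polymorphism setting.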

Details

Only digit states 0..9 are recognised; non-digit symbols (and any token whose interior contains no digits) become NA_integer_. Polymorphism extraction (polymorphism = "first"/"last") likewise considers digits only.
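
Because only digits are recognised, it may be worth checking which tokens carry no digit before converting. The following base-R sketch uses made-up tokens and a hypothetical variable name (has_digit); it shows only the check, not the conversion itself.

```r
# Illustrative tokens; "A" and "(AB)" stand in for non-digit state symbols
tokens <- c("0", "9", "A", "(AB)", "{1A}", "?", "-")

# TRUE where a token contains at least one digit 0..9
has_digit <- grepl("[0-9]", tokens)

# Tokens with no digit at all; per the rule above these become NA_integer_
tokens[!has_digit]
```

Here "{1A}" still contains a digit, so under polymorphism = "first" or "last" only its digit would be considered.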

If tokens is a phyDat object, it is first converted via PhyDatToMatrix() with ambigNA = TRUE, inappNA = TRUE, so that fully ambiguous and inapplicable tokens become NA_integer_ and only true partial polymorphisms are subject to the polymorphism rule.

Value

An integer matrix (or vector) with the same dimensions and dimnames as tokens.

Author(s)

Martin R. Smith (martin.smith@durham.ac.uk)

See Also

Other phylogenetic matrix conversion functions: Decompose(), MatrixToPhyDat(), Reweight(), StringToPhyDat()

Examples

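# One taxon scored for five characters, including a polymorphic token "(12)"
tokens <- matrix(c("0", "(12)", "1", "?", "-"),
                 nrow = 1,
                 dimnames = list("Taxon_A", paste0("C", 1:5)))
# Default (polymorphism = "?"): "(12)" maps to NA, as do "?" and "-"
NexusTokensToInteger(tokens)
# Use the first state listed in "(12)" instead
NexusTokensToInteger(tokens, polymorphism = "first")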


TreeTools documentation built on June 2, 2026, 5:06 p.m.