rncl    R Documentation

rncl: An R interface to the NEXUS Class Library

Description

rncl provides an interface to the NEXUS Class Library (NCL), a C++ library intended to parse valid NEXUS files as well as other common formats used in phylogenetic analysis. Currently, rncl focuses on parsing trees and supports both NEXUS and Newick formatted files. Because NCL is used by several phylogenetic programs (e.g., MrBayes, Garli), rncl can parse the files these programs generate. However, other popular programs (including BEAST) use an extension of the NEXUS file format: even if the trees in such files can be imported, the associated annotations (e.g., confidence intervals on the time since divergence) cannot.

Returns a list of the elements contained in a NEXUS or Newick file that are used to build phylogenetic objects in R.

Usage

rncl(
  file,
  file.format = c("nexus", "newick"),
  spacesAsUnderscores = TRUE,
  char.all = TRUE,
  polymorphic.convert = TRUE,
  levels.uniform = TRUE,
  show_progress = TRUE,
  ...
)

Arguments

file

path to a NEXUS or Newick file

file.format

a character string indicating the type of file to be parsed: either "nexus" (the default) or "newick".

spacesAsUnderscores

In the NEXUS file format, white spaces are not allowed in unquoted labels and are represented by underscores. Therefore, NCL converts underscores found in taxon labels in the NEXUS file into white spaces (e.g., species_1 becomes "species 1"). To preserve the underscores, set to TRUE (the default). This option affects taxon labels, character labels and state labels.

char.all

If TRUE (default), returns all characters, even those excluded in the NEXUS file (only applies when the NEXUS file contains a DATA block).

polymorphic.convert

If TRUE (default), converts polymorphic characters to missing data (only applies when the NEXUS file contains a DATA block).

levels.uniform

If TRUE (default), uses the same levels for all characters (only applies when the NEXUS file contains a DATA block).

show_progress

If TRUE (default), a progress bar is displayed during the potentially time-consuming step of removing singletons from the tree.

...

additional parameters (currently not in use).
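The effect of spacesAsUnderscores on labels can be illustrated with a minimal base-R sketch. It assumes, without relying on rncl's actual source, that the option works by turning the spaces NCL produced back into underscores; restore_underscores() is a hypothetical helper, not part of rncl.

```r
# Hypothetical helper (not part of rncl): mimics the documented effect of
# spacesAsUnderscores on labels that NCL has already read, where each
# underscore in the file has been turned into a space.
restore_underscores <- function(labels, spacesAsUnderscores = TRUE) {
  if (spacesAsUnderscores) {
    gsub(" ", "_", labels, fixed = TRUE)  # put the underscores back
  } else {
    labels                                # keep the spaces introduced by NCL
  }
}

# Labels as NCL would report species_1 and species_2 from the file
ncl_labels <- c("species 1", "species 2")
restore_underscores(ncl_labels)         # "species_1" "species_2"
restore_underscores(ncl_labels, FALSE)  # "species 1" "species 2"
```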

Details

NCL can also parse the data associated with the species included in NEXUS files. To import such data, see the phylobase package.

NEXUS is a common file format used in phylogenetics to represent phylogenetic trees and other types of phylogenetic data. This function uses NCL (the NEXUS Class Library) to parse NEXUS, Newick or other common phylogenetic file formats, and returns the relevant elements as a list. Objects of class phylo (from the ape package) or phylo4 (from the phylobase package) can be constructed from the elements of this list.
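As a minimal sketch of that construction, the base-R code below turns one tree's parentVector and branchLengthVector (described under Value) into two pieces an ape phylo object stores per edge: a two-column (parent, child) edge matrix and an edge-length vector. The input list is a hand-written stand-in for rncl() output; its values and the helper edges_from_parents() are hypothetical. The toy already numbers tips 1 to n and the root n + 1, as ape expects; whether rncl's numbering always follows that convention is not assumed here, so real data may need renumbering.

```r
# Hypothetical stand-in for one tree of rncl() output: nodes 1-3 are tips,
# node 4 is the root (parent 0), node 5 joins tips 1 and 2.
toy <- list(
  parentVector       = list(c(5, 5, 4, 0, 4)),
  branchLengthVector = list(c(0.1, 0.2, -999, 0, 0.3))
)

# Hypothetical helper: position i of `parents` holds the parent of node i,
# and 0 marks the implicit root, which has no incoming edge.
edges_from_parents <- function(parents, lengths) {
  child <- which(parents != 0)           # every non-root node ends one edge
  edge  <- cbind(parents[child], child)  # one row per edge: parent, child
  colnames(edge) <- NULL
  len <- lengths[child]                  # lengths follow the same node order
  len[len == -999] <- NA                 # -999 flags a missing branch length
  list(edge = edge, edge.length = len)
}

res <- edges_from_parents(toy$parentVector[[1]], toy$branchLengthVector[[1]])
res$edge         # rows: 5 1 / 5 2 / 4 3 / 4 5
res$edge.length  # 0.1 0.2 NA 0.3
```

A complete phylo object would also need tip.label (e.g., from taxaNames) and Nnode; the package's own read_newick_phylo and read_nexus_phylo remain the authoritative reference for that step.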

Value

A list that contains the elements extracted from a NEXUS or a Newick file.

  • taxaNames A vector of the taxa names listed in the TAXA block of the NEXUS file, or inferred from the tree strings (if the block is missing or the file is in Newick format).

  • treeNames A vector listing the names of the trees.

  • taxonLabelVector A list containing as many elements as there are trees in the file. Each element is a character vector that lists the taxon names encountered in the tree string *in the order they appear*, and therefore may not match the order they are listed in the translation table.

  • parentVector A list containing as many elements as there are trees in the file. Each element is a numeric vector giving, for the node identified by each position in the vector, its parent node. If the vector begins with 5 5 6, the parent of node 1 is 5, the parent of node 2 is 5, and the parent of node 3 is 6. The implicit root of the tree is identified by a parent value of 0 (it is the node without a parent).

  • branchLengthVector A list containing as many elements as there are trees in the file. Each element is a numeric vector listing the edge/branch lengths, in the same order as the nodes in the corresponding parentVector element. A value of -999 indicates that the length is missing for that edge. The implicit root has a length of 0.

  • nodeLabelsVector A list containing as many elements as there are trees in the file. Each element is a character vector listing the node labels, in the same order as the nodes in the corresponding parentVector element.

  • trees A character vector listing the tree strings, in which tip labels have been replaced by their indices in the taxaNames vector. These indices do not correspond to the numbers listed in any translation table that might be associated with the tree.

  • dataTypes A character vector indicating the type of data associated with the tree (e.g., “standard”).

  • nbCharacters A numeric vector indicating how many characters/traits are available.

  • charLabels A character vector listing the names of the characters/traits that are available.

  • nbStates A numeric vector listing the number of possible states for each character/trait.

  • stateLabels A character vector listing in order, all possible states for each character/trait.

  • dataChr A character vector with as many elements as there are characters/traits in the dataset. Each element is a string that can be parsed by R to create a factor representing the data found in the file.

  • isRooted A list with as many elements as there are trees in the file. Each element is a logical indicating whether the tree is rooted. NCL's definition of a rooted tree differs in some cases from the one used by ape.

  • hasPolytomies A list with as many elements as there are trees in the file. Each element is a logical indicating whether the tree contains polytomies.

  • hasSingletons A list with as many elements as there are trees in the file. Each element is a logical indicating whether the tree contains singleton nodes, in other words nodes with a single descendant (also known as knuckles).
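The hasSingletons and hasPolytomies flags can be related back to parentVector: counting how often each node appears as a parent gives its number of children. The base-R sketch below does this on hand-made parent vectors; count_children() and the vectors are hypothetical. It treats any node with more than two children as a polytomy, which may not match how NCL handles, for example, a basal trichotomy in an unrooted tree.

```r
# Hypothetical helper: number of children of each node, given a parent
# vector where position i holds the parent of node i and 0 marks the root.
count_children <- function(parents) {
  tabulate(parents[parents != 0], nbins = length(parents))
}

# Toy tree (hypothetical values): tips 1-3, root 4; node 5 has a single
# child (node 6), so it is a singleton ("knuckle").
parents <- c(6, 6, 4, 0, 4, 5)
n_child <- count_children(parents)

tips       <- which(n_child == 0)  # nodes without descendants: 1 2 3
singletons <- which(n_child == 1)  # nodes with exactly one descendant: 5
polytomies <- which(n_child > 2)   # nodes with more than two descendants: none

# A star tree: three tips attached directly to root node 4.
star <- c(4, 4, 4, 0)
which(count_children(star) > 2)    # 4
```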

Author(s)

Francois Michonneau

References

Maddison DR, Swofford DL, Maddison WP (1997). "NEXUS: An extensible file format for systematic information". Systematic Biology 46(4): 590-621. doi: 10.1093/sysbio/46.4.590

Lewis PO (2003). "NCL: a C++ class library for interpreting data files in NEXUS format". Bioinformatics 19(17): 2330-2331.

See Also

For examples of how to use the elements of the list returned by this function to build tree objects, inspect the source code of this package, in particular the read_newick_phylo and read_nexus_phylo functions. For a more complex example that also uses the data contained in NEXUS files, inspect the source code of the readNCL function in the phylobase package.


fmichonneau/rncl documentation built on Jan. 9, 2023, 6:42 a.m.