addSpecies: Add species-level annotation to a taxonomic table.

View source: R/taxonomy.R

addSpecies    R Documentation


Description

addSpecies wraps the assignSpecies function to assign genus-species binomials to the input sequences by exact matching against a reference fasta. Those binomials are then merged with the input taxonomic table, with the species annotations appended as an additional column. Only species identifications whose genus is consistent with the genus already assigned in the input table are included in the returned table.
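The genus-consistency rule can be illustrated with a minimal base-R sketch. `merge_species` is a hypothetical helper, not dada2's implementation: it assumes the input table has a column named `Genus`, and it compares genera by exact string equality, whereas the real function may reconcile genus names differently. The sequences and taxa are made-up toy values.

```r
# Hypothetical helper illustrating the merge rule; NOT dada2's implementation.
# genus/species: character vectors aligned with the rows of taxtab,
# NA where the reference search found no exact match.
merge_species <- function(taxtab, genus, species) {
  # Keep a species call only when both genera are known and identical
  keep <- !is.na(genus) & !is.na(taxtab[, "Genus"]) & taxtab[, "Genus"] == genus
  spec <- ifelse(keep, species, NA_character_)
  # Append the species column, giving a matrix one column wider than the input
  cbind(taxtab, Species = spec)
}

# Toy table: rownames stand in for sequences
taxtab <- matrix(
  c("Bacteria", "Firmicutes",     "Lactobacillus",
    "Bacteria", "Proteobacteria", "Escherichia",
    "Bacteria", "Bacteroidetes",  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("AAAA", "CCCC", "GGGG"), c("Kingdom", "Phylum", "Genus"))
)
genus   <- c("Lactobacillus", "Salmonella", "Bacteroides")
species <- c("acidophilus",   "enterica",   "fragilis")
res <- merge_species(taxtab, genus, species)
```

Here the second sequence loses its species call because the reference genus (Salmonella) disagrees with the table's genus (Escherichia), and the third loses it because the table has no genus to compare against.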

Usage

addSpecies(
  taxtab,
  refFasta,
  allowMultiple = FALSE,
  tryRC = FALSE,
  n = 2000,
  verbose = FALSE
)

Arguments

taxtab

(Required). A taxonomic table, the output of assignTaxonomy.

refFasta

(Required). The path to the reference fasta file, or an R connection. Can be compressed. This reference fasta file should be formatted so that the id lines correspond to the genus-species binomial of the associated sequence:

>SeqID genus species
ACGAATGTGAAGTAA......

allowMultiple

(Optional). Default FALSE. Defines the behavior when multiple exact matches against different species are returned. By default only unambiguous identifications are returned. If TRUE, a concatenated string of all exactly matched species is returned. If an integer is provided, multiple identifications up to that many are returned as a concatenated string.

tryRC

(Optional). Default FALSE. If TRUE, the reverse complement of each sequence will be used for classification if it is a better match to the reference sequences than the forward sequence.

n

(Optional). Default 2000. The number of sequences to assign at any one time. This controls the peak memory requirement so that large numbers of sequences are supported.

verbose

(Optional). Default FALSE. If TRUE, print status to standard output.
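The refFasta layout described above (an ID line of the form `>SeqID genus species`, followed by the sequence) can be produced and checked with base R alone. This is a minimal sketch: the IDs, binomials and sequences are invented, the fields are assumed to be separated by single spaces, and the file is gzip-compressed to mirror the note that the reference can be compressed.

```r
# Toy reference records: IDs, binomials and sequences are made up for illustration
refs <- data.frame(
  id      = c("Seq1", "Seq2"),
  genus   = c("Lactobacillus", "Escherichia"),
  species = c("acidophilus", "coli"),
  seq     = c("ACGAATGTGAAGTAA", "TTGACCGATCGGTAA"),
  stringsAsFactors = FALSE
)
headers <- paste0(">", refs$id, " ", refs$genus, " ", refs$species)

# Write a gzip-compressed fasta: each header line is followed by its sequence line
path <- tempfile(fileext = ".fa.gz")
con <- gzfile(path, "w")
writeLines(as.vector(rbind(headers, refs$seq)), con)
close(con)

# Read it back and split each ID line into its ID, genus and species fields
lines  <- readLines(gzfile(path))
ids    <- lines[startsWith(lines, ">")]
fields <- do.call(rbind, strsplit(sub("^>", "", ids), " ", fixed = TRUE))
colnames(fields) <- c("id", "genus", "species")
```

A file written this way follows the layout shown for refFasta; the toy sequences are not real marker-gene fragments, so it only demonstrates the format.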
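The three allowMultiple modes (FALSE, TRUE, or an integer) can be sketched as a single rule: translate the argument into a maximum number of distinct species, and report NA when the matches exceed it. `collapse_matches` is a hypothetical helper, and the `/` separator is an assumption made for this sketch rather than a documented detail of dada2's output.

```r
# Hypothetical helper mimicking the allowMultiple rules; the "/" separator is assumed.
collapse_matches <- function(matches, allowMultiple = FALSE, sep = "/") {
  # FALSE -> only one distinct species allowed; TRUE -> unlimited; integer -> up to that many
  limit <- if (isTRUE(allowMultiple)) {
    Inf
  } else if (isFALSE(allowMultiple)) {
    1
  } else {
    as.integer(allowMultiple)
  }
  hits <- unique(matches)
  # No match, or more distinct species than allowed, gives no identification
  if (length(hits) == 0 || length(hits) > limit) return(NA_character_)
  paste(hits, collapse = sep)
}

collapse_matches(c("coli", "fergusonii"))                        # ambiguous, so NA
collapse_matches(c("coli", "fergusonii"), allowMultiple = TRUE)  # "coli/fergusonii"
```

Treating FALSE as a limit of one keeps all three modes in a single comparison, which is one simple reading of "only unambiguous identifications".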
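The tryRC idea can be illustrated with a base-R reverse complement and a fallback lookup. This is a simplified sketch under two assumptions: a reference counts as a hit when it contains the query as an exact substring, and the reverse complement is tried only when the forward sequence has no hit (a stand-in for "better match"). `rev_comp` and `exact_hit` are hypothetical helpers, not dada2 functions.

```r
# Reverse-complement a DNA string (A/C/G/T only; other characters are left unchanged)
rev_comp <- function(s) {
  comp <- chartr("ACGTacgt", "TGCAtgca", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Hypothetical lookup: a hit is a reference containing the query exactly (assumption);
# with tryRC = TRUE the reverse complement is tried when the forward query finds nothing.
exact_hit <- function(query, refs, tryRC = FALSE) {
  hit <- names(refs)[grepl(query, refs, fixed = TRUE)]
  if (length(hit) == 0 && tryRC) {
    hit <- names(refs)[grepl(rev_comp(query), refs, fixed = TRUE)]
  }
  hit
}

# Toy reference named by binomial; the query is stored in reverse orientation
refs  <- c("Lactobacillus acidophilus" = "GGACGAATGTGAAGTAA")
query <- rev_comp("ACGAATGTG")
exact_hit(query, refs)                # no forward hit
exact_hit(query, refs, tryRC = TRUE)  # found via the reverse complement
```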
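The memory bound set by n comes from working on the sequences in blocks rather than all at once. The base-R sketch below shows that chunking pattern only; `chunk_index`, `process_in_chunks` and the `assign_block` callback are hypothetical names standing in for whatever per-block assignment is done internally.

```r
# Split indices 1..len into consecutive blocks of at most n elements
chunk_index <- function(len, n) split(seq_len(len), ceiling(seq_len(len) / n))

# Hypothetical driver: apply assign_block to one block of sequences at a time,
# so only n sequences are being processed at any moment
process_in_chunks <- function(seqs, n = 2000, assign_block) {
  results <- lapply(chunk_index(length(seqs), n), function(i) assign_block(seqs[i]))
  unlist(results, use.names = FALSE)
}

lengths(chunk_index(4500, 2000))  # blocks of 2000, 2000 and 500
```

Smaller values of n lower peak memory at the cost of more iterations; the results are concatenated in the original order.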

Value

A character matrix with one more column than the input. Rows correspond to sequences, and columns to taxonomic levels. NA indicates that the sequence was not classified at that level.
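Because NA marks an unclassified level, a returned matrix can be summarized with base R. The sketch uses a made-up three-level toy matrix in place of real output, and it assumes NAs only follow the classified levels within a row.

```r
# Toy matrix shaped like the returned value: rows = sequences, columns = levels, NA = unclassified
taxa_spec <- matrix(
  c("Bacteria", "Lactobacillus", "acidophilus",
    "Bacteria", "Escherichia",   NA,
    "Bacteria", NA,              NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("seq1", "seq2", "seq3"), c("Kingdom", "Genus", "Species"))
)

# Number of sequences classified at each level
classified <- colSums(!is.na(taxa_spec))

# Deepest level reached by each sequence (NA if the row is entirely unclassified)
deepest <- apply(taxa_spec, 1, function(r) {
  w <- which(!is.na(r))
  if (length(w) > 0) colnames(taxa_spec)[max(w)] else NA_character_
})
```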

See Also

assignTaxonomy, assignSpecies

Examples


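library(dada2)
# Read the example sequences and assign taxonomy with the example training set
seqs <- getSequences(system.file("extdata", "example_seqs.fa", package="dada2"))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqs, training_fasta)
# Append species-level calls by exact matching against the species reference
species_fasta <- system.file("extdata", "example_species_assignment.fa.gz", package="dada2")
taxa.spec <- addSpecies(taxa, species_fasta)
# Report every exactly matched species instead of only unambiguous ones
taxa.spec.multi <- addSpecies(taxa, species_fasta, allowMultiple=TRUE)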


benjjneb/dada2 documentation built on Feb. 1, 2024, 10:50 p.m.