assignTaxonomy: Classifies sequences against a reference training dataset.


View source: R/taxonomy.R

Description

assignTaxonomy implements the RDP Naive Bayesian Classifier algorithm described in Wang et al., Applied and Environmental Microbiology 2007, with a k-mer size of 8 and 100 bootstrap replicates. Properly formatted reference files for several popular taxonomic databases are available at http://benjjneb.github.io/dada2/training.html
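To make the scoring concrete, here is a toy, single-level version of the classifier in base R. It is a minimal sketch, assuming the word-probability estimate from Wang et al. 2007: P(w|G) = (m(w,G) + P_w) / (M_G + 1) with prior P_w = (n(w) + 0.5) / (N + 1), where m(w,G) counts genus-G training sequences containing k-mer w, M_G is the number of genus-G sequences, n(w) counts all training sequences containing w, and N is the total number of training sequences. It is not dada2's own (C++) implementation, and the helpers `kmers`, `train_nb`, `score_nb` and `classify_nb` are hypothetical.

```r
# Toy RDP-style naive Bayesian classifier (hypothetical helpers, not dada2 internals).

# All overlapping k-mers ("words") of a DNA string; k = 8 as in assignTaxonomy.
kmers <- function(seq, k = 8) {
  n <- nchar(seq) - k + 1
  if (n < 1) return(character(0))
  substring(seq, 1:n, k:(n + k - 1))
}

# Train: per-genus log word probabilities log P(w | G).
train_nb <- function(ref_seqs, ref_genus, k = 8) {
  words <- lapply(ref_seqs, function(s) unique(kmers(s, k)))  # word presence per sequence
  vocab <- unique(unlist(words))
  N <- length(ref_seqs)
  n_w <- as.numeric(table(factor(unlist(words), levels = vocab)))  # n(w)
  P_w <- (n_w + 0.5) / (N + 1)                                       # word prior
  genera <- unique(ref_genus)
  M <- sapply(genera, function(g) sum(ref_genus == g))               # M_G
  logp <- sapply(genera, function(g) {
    m_wg <- as.numeric(table(factor(unlist(words[ref_genus == g]), levels = vocab)))
    log((m_wg + P_w) / (M[[g]] + 1))
  })
  logp <- matrix(logp, nrow = length(vocab), dimnames = list(vocab, genera))
  list(logp = logp, N = N, M = M, k = k)
}

# Sum of log P(w | G) over the given words, for every genus.
score_nb <- function(model, w) {
  known <- w[w %in% rownames(model$logp)]
  s <- colSums(model$logp[known, , drop = FALSE])
  # Words never seen in training: m(w, G) = 0 and n(w) = 0.
  s + (length(w) - length(known)) * log((0.5 / (model$N + 1)) / (model$M + 1))
}

# Classify one sequence. Confidence = % of bootstrap replicates (each a random
# 1/8 of the query's words, drawn with replacement) that agree with the full call.
classify_nb <- function(model, seq, nboot = 100) {
  w <- kmers(seq, model$k)
  best <- names(which.max(score_nb(model, w)))
  size <- max(1, length(w) %/% 8)
  hits <- replicate(nboot,
    names(which.max(score_nb(model, sample(w, size, replace = TRUE)))))
  list(genus = best, boot = 100 * mean(hits == best))
}

set.seed(1)
refs <- c("AAAAAAAAACCCCCCCCC", "GGGGGGGGGTTTTTTTTT")  # made-up reference sequences
model <- train_nb(refs, c("GenusA", "GenusB"))
res <- classify_nb(model, "AAAAAAAAACCCCCCCC")
```

With these made-up references every replicate calls GenusA, so `res$boot` is 100. In the full method the bootstrap value reported at each taxonomic level is the percentage of replicates whose assignment agrees with the full-sequence call at that level.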

Usage

assignTaxonomy(
  seqs,
  refFasta,
  minBoot = 50,
  tryRC = FALSE,
  outputBootstraps = FALSE,
  taxLevels = c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"),
  multithread = FALSE,
  verbose = FALSE
)

Arguments

seqs

(Required). A character vector of the sequences to be assigned, or an object coercible by getUniques.

refFasta

(Required). The path to the reference fasta file, or an R connection. Can be compressed. This reference fasta file should be formatted so that the id lines correspond to the taxonomy (or classification) of the associated sequence, and each taxonomic level is separated by a semicolon. E.g.:

>Kingdom;Phylum;Class;Order;Family;Genus;
ACGAATGTGAAGTAA......
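As a sketch of this format, the following base R writes two reference records to a gzip-compressed temporary file and reads the taxonomy back out of the id lines. The sequences are made up, and the helper `parse_tax_header` is hypothetical, not part of dada2.

```r
# Two example records: the id line holds the ";"-separated taxonomy,
# the following line holds the (made-up) reference sequence.
ref_lines <- c(
  ">Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;",
  "AAAAAAAAACCCCCCCCCGGGGGGGGG",
  ">Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;",
  "GGGGGGGGGTTTTTTTTTAAAAAAAAA"
)

# refFasta may be compressed, so write it gzipped.
ref_path <- tempfile(fileext = ".fa.gz")
con <- gzfile(ref_path, "w")
writeLines(ref_lines, con)
close(con)

# Split an id line into its taxonomic levels; strsplit() drops the trailing empty field.
parse_tax_header <- function(header) {
  fields <- strsplit(sub("^>", "", header), ";", fixed = TRUE)[[1]]
  fields[nzchar(trimws(fields))]
}

# readLines() on a file path opens it with file(), which detects gzip compression.
headers <- grep("^>", readLines(ref_path), value = TRUE)
taxonomy <- lapply(headers, parse_tax_header)
```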

minBoot

(Optional). Default 50. The minimum bootstrap confidence for assigning a taxonomic level.
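A minimal sketch of the thresholding, assuming minBoot is compared level by level against bootstrap values on a 0 to 100 scale. The matrices and the helper `apply_min_boot` are hypothetical. In a consistent taxonomy bootstrap support cannot rise at deeper levels, so masking each cell independently still yields a truncated assignment.

```r
# Hypothetical taxonomy calls and bootstrap values (0-100) for two sequences.
taxa <- matrix(c("Bacteria", "Firmicutes",     "Bacilli",
                 "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("seq1", "seq2"), c("Kingdom", "Phylum", "Class")))
boot <- matrix(c(100L, 92L, 61L,
                 100L, 74L, 38L),
               nrow = 2, byrow = TRUE, dimnames = dimnames(taxa))

# Blank out every level whose bootstrap confidence falls below minBoot.
apply_min_boot <- function(taxa, boot, minBoot = 50) {
  taxa[boot < minBoot] <- NA_character_
  taxa
}

t50 <- apply_min_boot(taxa, boot)       # seq2 loses Class
t80 <- apply_min_boot(taxa, boot, 80)   # seq1 loses Class; seq2 keeps only Kingdom
```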

tryRC

(Optional). Default FALSE. If TRUE, the reverse-complement of each sequence will be used for classification if it is a better match to the reference sequences than the forward sequence.
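The text does not say how "better match" is measured, so the sketch below uses the number of k-mers shared with the reference set as an assumed stand-in. `revcomp`, `kmers` and `choose_orientation` are hypothetical helpers, not dada2 functions.

```r
# Reverse complement of an A/C/G/T string (upper or lower case).
revcomp <- function(seq) {
  comp <- chartr("ACGTacgt", "TGCAtgca", seq)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Overlapping k-mers of a string.
kmers <- function(seq, k = 8) {
  n <- nchar(seq) - k + 1
  if (n < 1) return(character(0))
  substring(seq, 1:n, k:(n + k - 1))
}

# Keep whichever orientation shares more k-mers with the references
# (ties keep the forward sequence).
choose_orientation <- function(seq, ref_seqs, k = 8) {
  ref_words <- unique(unlist(lapply(ref_seqs, kmers, k = k)))
  rev_seq <- revcomp(seq)
  fwd_hits <- sum(kmers(seq, k) %in% ref_words)
  rev_hits <- sum(kmers(rev_seq, k) %in% ref_words)
  if (rev_hits > fwd_hits) rev_seq else seq
}

refs <- "AAAAAAAAACCCCCCCCC"                          # made-up reference
oriented <- choose_orientation("GGGGGGGGGTTTTTTTTT", refs)
```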

outputBootstraps

(Optional). Default FALSE. If TRUE, bootstrap values will be retained in an integer matrix. A named list containing the assigned taxonomies (named "taxa") and the bootstrap values (named "boot") will be returned. Minimum bootstrap confidence filtering still takes place; to see the full taxonomy, set minBoot=0.
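With the bootstraps retained (for example from a minBoot=0 run), confidence can be explored after the fact without reclassifying. The sketch below builds a hypothetical list shaped like the documented return value and reports, per sequence, the deepest level reaching a chosen threshold. The values and the helper `deepest_confident` are made up for illustration.

```r
lv <- c("Kingdom", "Phylum", "Class")
ids <- c("seq1", "seq2")

# Hypothetical list shaped like the outputBootstraps = TRUE result.
res <- list(
  taxa = matrix(c("Bacteria", "Firmicutes",     "Bacilli",
                  "Bacteria", "Proteobacteria", "Gammaproteobacteria"),
                nrow = 2, byrow = TRUE, dimnames = list(ids, lv)),
  boot = matrix(c(100L, 92L, 61L,
                  100L, 74L, 38L),
                nrow = 2, byrow = TRUE, dimnames = list(ids, lv))
)

# Deepest level at which each sequence reaches the given bootstrap threshold.
deepest_confident <- function(res, threshold = 80) {
  apply(res$boot >= threshold, 1, function(ok) {
    idx <- which(ok)
    if (length(idx) == 0) NA_character_ else colnames(res$boot)[max(idx)]
  })
}

d80 <- deepest_confident(res, 80)   # seq1: "Phylum", seq2: "Kingdom"
```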

taxLevels

(Optional). Default is c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"). The taxonomic levels being assigned. Truncated if deeper levels are not present in the training fasta.
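A sketch of the truncation, assuming it keeps the first n level names, where n is the depth of the deepest taxonomy in the training fasta. The id lines are illustrative examples that stop at genus, so "Species" is dropped.

```r
taxLevels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Example training-fasta id lines that stop at genus (six levels).
headers <- c(
  ">Bacteria;Firmicutes;Bacilli;Lactobacillales;Streptococcaceae;Streptococcus;",
  ">Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia;"
)

# Depth of the deepest taxonomy; strsplit() drops the trailing empty field.
depth <- max(lengths(strsplit(sub("^>", "", headers), ";", fixed = TRUE)))

# Only the first `depth` level names label the output columns.
used_levels <- head(taxLevels, depth)
```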

multithread

(Optional). Default is FALSE. If TRUE, multithreading is enabled and the number of available threads is automatically determined. If an integer is provided, the number of threads to use is set by passing the argument on to setThreadOptions.
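The three accepted forms (FALSE, TRUE, an integer) can be summarized by a small resolver. This is a hypothetical helper, not dada2 code: it uses `parallel::detectCores()` as an assumed stand-in for "automatically determined", whereas the documented integer path hands the value to `setThreadOptions` (from RcppParallel).

```r
# Hypothetical helper: map a multithread argument to a thread count.
resolve_threads <- function(multithread) {
  if (isTRUE(multithread)) {
    n <- parallel::detectCores()      # may be NA on some platforms
    if (is.na(n)) 1L else as.integer(n)
  } else if (is.numeric(multithread) && length(multithread) == 1 && multithread >= 1) {
    as.integer(multithread)           # explicit thread count
  } else {
    1L                                # FALSE (the default): single-threaded
  }
}
```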

verbose

(Optional). Default FALSE. If TRUE, print status to standard output.

Value

A character matrix of assigned taxonomies exceeding the minBoot level of bootstrapping confidence. Rows correspond to the provided sequences, columns to the taxonomic levels. NA indicates that the sequence was not consistently classified at that level at the minBoot threshold.

If outputBootstraps is TRUE, a named list containing the assigned taxonomies (named "taxa") and the bootstrap values (named "boot") will be returned.
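Because unclassified levels are NA, the returned matrix can be summarized with base R. The matrix below is hypothetical and only mimics the documented shape (rows are sequences, columns are levels).

```r
# Hypothetical assignTaxonomy-style output with NA for unconfident levels.
taxa <- matrix(c("Bacteria", "Firmicutes",     "Bacilli",
                 "Bacteria", "Proteobacteria", NA,
                 "Bacteria", NA,               NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("seq1", "seq2", "seq3"),
                               c("Kingdom", "Phylum", "Class")))

# Number of sequences assigned at each taxonomic level.
assigned_per_level <- colSums(!is.na(taxa))   # Kingdom 3, Phylum 2, Class 1

# Fraction of sequences left unassigned at the finest level.
unassigned_frac <- mean(is.na(taxa[, ncol(taxa)]))
```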

Examples

seqs <- getSequences(system.file("extdata", "example_seqs.fa", package="dada2"))
training_fasta <- system.file("extdata", "example_train_set.fa.gz", package="dada2")
taxa <- assignTaxonomy(seqs, training_fasta)
taxa80 <- assignTaxonomy(seqs, training_fasta, minBoot=80, multithread=2)

dada2 documentation built on Nov. 8, 2020, 6:48 p.m.