Classifies sequences according to a training set by assigning a confidence to taxonomic labels for each taxonomic level.
test | An
trainingSet | An object of class
type | Character string indicating the type of output desired. This should be (an abbreviation of) one of
strand | Character string indicating the orientation of the
threshold | Numeric specifying the confidence at which to truncate the output taxonomic classifications. Lower values of
bootstraps | Integer giving the maximum number of bootstrap replicates to perform for each sequence. The number of bootstrap replicates is set automatically such that (on average) 99% of k-mers are sampled in each
samples | A function or call written as a function of ‘L’, which will evaluate to a numeric vector the same length as ‘L’. Typically of the form “
minDescend | Numeric giving the minimum fraction of
fullLength | Numeric specifying the fold-difference in sequence lengths between sequences in
processors | The number of processors to use, or
verbose | Logical indicating whether to display progress.
Sequences in test are each assigned a taxonomic classification based on the trainingSet created with LearnTaxa. Each taxonomic level is given a confidence between 0% and 100%, and the taxonomy is truncated where confidence drops below threshold. If the taxonomic classification was truncated, the last group is labeled with “unclassified_” followed by the final taxon's name. Note that the reported confidence is not a p-value but does directly relate to a given classification's probability of being wrong. The default threshold of 60% is intended to minimize the rate of incorrect classifications. Lower values of threshold (e.g., 50%) may be preferred to increase the taxonomic depth of classifications.
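The truncation rule described above can be sketched in base R. This is an illustration only, not DECIPHER's internal code: `truncate_taxonomy` is a hypothetical helper, the input values are invented, and two details are assumptions (that the first level is always retained, and that the "unclassified_" placeholder reuses the confidence of the last retained level).

```r
# Hypothetical helper, not part of DECIPHER: truncate one classification
# at the first level whose confidence falls below `threshold`.
truncate_taxonomy <- function(taxon, confidence, threshold = 60) {
  stopifnot(length(taxon) == length(confidence), length(taxon) >= 1)
  below <- which(confidence < threshold)
  if (length(below) == 0) {
    # Every level meets the threshold, so nothing is truncated.
    return(list(taxon = taxon, confidence = confidence))
  }
  # Assumption: the first level (e.g., "Root") is always retained.
  n_keep <- max(below[1] - 1, 1)
  kept <- seq_len(n_keep)
  list(
    taxon = c(taxon[kept], paste0("unclassified_", taxon[n_keep])),
    # Assumption: the placeholder reuses the last retained confidence.
    confidence = c(confidence[kept], confidence[n_keep])
  )
}

# Invented example values, for illustration only.
example <- truncate_taxonomy(
  taxon = c("Root", "Bacteria", "Firmicutes", "Bacilli"),
  confidence = c(95, 90, 72, 41),
  threshold = 60
)
example$taxon
```

With these invented inputs, lowering `threshold` to 40 would leave all four levels in place, which mirrors the trade-off between taxonomic depth and error rate noted above.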
If type is "extended" (the default) then an object of class Taxa and subclass Test is returned. This is stored as a list with elements corresponding to their respective sequence in test. Each list element contains components:
taxon | A character vector containing the taxa to which the sequence was assigned.
confidence | A numeric vector giving the corresponding percent confidence for each taxon.
rank | If the classifier was trained with a set of
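Because each list element carries parallel taxon and confidence vectors, the deepest assignment per sequence can be pulled out with base R. The sketch below uses a hand-built list (`ids_like`, with invented values) as a stand-in for the real return value, so it runs without DECIPHER; the same indexing should apply to an actual result, assuming it behaves as the list described above.

```r
# Hand-built stand-in for the list returned with type = "extended";
# the values are invented for illustration.
ids_like <- list(
  list(taxon = c("Root", "Bacteria", "Firmicutes"),
       confidence = c(92, 92, 71)),
  list(taxon = c("Root", "unclassified_Root"),
       confidence = c(38, 38))
)

# Take the last (deepest) taxon and its confidence from each element.
deepest <- data.frame(
  taxon = vapply(ids_like, function(x) x$taxon[length(x$taxon)],
                 character(1)),
  confidence = vapply(ids_like, function(x) x$confidence[length(x$confidence)],
                      numeric(1)),
  stringsAsFactors = FALSE
)
deepest
```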
If type is "collapsed" then a character vector is returned with the taxonomic assignment for each sequence. This takes the repeating form “Taxon name [rank, confidence%]; ...” if ranks were supplied during training, or “Taxon name [confidence%]; ...” otherwise.
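The two repeating forms can be reproduced with a short base R sketch. `collapse_assignment` is a hypothetical helper written for illustration, not DECIPHER's own formatter; rounding the confidence to a whole percent is an assumption, and the input values are invented.

```r
# Hypothetical helper, not part of DECIPHER: build one collapsed string
# from parallel taxon / confidence (and optional rank) vectors.
collapse_assignment <- function(taxon, confidence, rank = NULL) {
  stopifnot(length(taxon) == length(confidence))
  # Assumption: confidences are shown rounded to whole percentages.
  conf <- paste0(round(confidence), "%")
  inside <- if (is.null(rank)) conf else paste(rank, conf, sep = ", ")
  paste0(taxon, " [", inside, "]", collapse = "; ")
}

# Invented example values, for illustration only.
without_rank <- collapse_assignment(c("Root", "Bacteria"), c(95.2, 88.7))
with_rank <- collapse_assignment(c("Root", "Bacteria"), c(95.2, 88.7),
                                 rank = c("rootrank", "domain"))
without_rank  # "Root [95%]; Bacteria [89%]"
with_rank     # "Root [rootrank, 95%]; Bacteria [domain, 89%]"
```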
Erik Wright eswright@pitt.edu
Murali, A., et al. (2018). IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome, 6, 140. https://doi.org/10.1186/s40168-018-0521-5
# load the package and the example training set
library(DECIPHER)
data("TrainingSet_16S")
# import test sequences
fas <- system.file("extdata", "Bacteria_175seqs.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
# remove any gaps in the sequences
dna <- RemoveGaps(dna)
# classify the test sequences
ids <- IdTaxa(dna, TrainingSet_16S, strand="top")
ids
# view the results
plot(ids, TrainingSet_16S)
================================================================================
Time difference of 11.31 secs
A test set of class 'Taxa' with length 175
confidence name taxon
[1] 63% uncultured bacter... Root; Bacteria; Firmicutes; Bacilli; Ba...
[2] 68% uncultured bacter... Root; Bacteria; Firmicutes; Bacilli; Ba...
[3] 62% uncultured bacter... Root; Bacteria; Firmicutes; Bacilli; Ba...
[4] 92% uncultured bacter... Root; Bacteria; Firmicutes; Bacilli; La...
[5] 62% uncultured bacter... Root; Bacteria; Firmicutes; Clostridia;...
... ... ... ...
[171] 38% uncultured bacter... Root; unclassified_Root
[172] 49% uncultured bacter... Root; unclassified_Root
[173] 31% uncultured bacter... Root; unclassified_Root
[174] 48% uncultured bacter... Root; unclassified_Root
[175] 51% uncultured bacter... Root; unclassified_Root