rdpClassify: Classifying with the RDP classifier
In microclass: Methods for Taxonomic Classification of Prokaryotes

Classifying sequences by a trained presence/absence K-mer model.

rdpClassify(sequence, trained.model, post.prob = FALSE, prior = FALSE)

`sequence`	Character vector of sequences to classify.
`trained.model`	A list with a trained model, see `rdpTrain`.
`post.prob`	Logical indicating if posterior log-probabilities should be returned.
`prior`	Logical indicating whether classification should use flat priors (the default, prior = FALSE) or empirical priors (prior = TRUE).

The classification step of the presence/absence method known as the RDP classifier (Wang et al. 2007) looks for K-mers in each sequence and computes the posterior probability of each taxon, using a trained model and a naive Bayes assumption. For each sequence, the predicted taxon is the one with the maximum posterior probability.
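To make the idea concrete, the following is a minimal toy sketch of a presence/absence K-mer naive Bayes classifier in base R. It is not the implementation used by `rdpClassify`. The function names (`kmer_presence`, `toy_train`, `toy_classify`), the pseudo-count formula, and the handling of unseen K-mers are simplifying assumptions made only for illustration.

```r
# Toy presence/absence K-mer naive Bayes, for illustration only.
# All function names below are hypothetical and NOT part of microclass.

# Unique overlapping K-mers found in a single sequence.
kmer_presence <- function(s, K) {
  starts <- seq_len(max(0, nchar(s) - K + 1))
  unique(substring(s, starts, starts + K - 1))
}

# Estimate log P(K-mer present | taxon) with a simple pseudo-count
# (an assumed smoothing, not necessarily the one used by the package).
toy_train <- function(sequences, taxa, K = 3) {
  present <- lapply(sequences, kmer_presence, K = K)
  vocab <- sort(unique(unlist(present)))
  classes <- sort(unique(taxa))
  log_p <- sapply(classes, function(cl) {
    idx <- which(taxa == cl)
    counts <- table(factor(unlist(present[idx]), levels = vocab))
    log((as.numeric(counts) + 0.5) / (length(idx) + 1))
  })
  rownames(log_p) <- vocab  # rows: K-mers, columns: taxa
  # Empirical prior: fraction of training sequences in each taxon.
  prior <- sapply(classes, function(cl) mean(taxa == cl))
  list(K = K, log_p = log_p, prior = prior)
}

# Sum the log-probabilities of the K-mers present in each query sequence
# (the naive Bayes assumption) and pick the taxon with the highest score.
# K-mers never seen during training are ignored in this sketch.
toy_classify <- function(sequence, model, prior = FALSE) {
  vapply(sequence, function(s) {
    km <- intersect(kmer_presence(s, model$K), rownames(model$log_p))
    score <- colSums(model$log_p[km, , drop = FALSE])
    if (prior) score <- score + log(model$prior)  # empirical priors
    names(which.max(score))
  }, character(1), USE.NAMES = FALSE)
}

# Made-up training data with two artificial taxa.
train_seq <- c("AAAAACAAAA", "AAAACAAAAA", "GGGGGTGGGG", "GGGGTGGGGG")
train_tax <- c("TaxonA", "TaxonA", "TaxonG", "TaxonG")
model <- toy_train(train_seq, train_tax, K = 3)
predicted <- toy_classify(c("AAACAAA", "GGTGGGG"), model)
print(predicted)
```

With flat priors, every taxon starts from the same score, so only the K-mer evidence decides the prediction. Setting `prior = TRUE` in this sketch adds the log of each taxon's share of the training data, which favours taxa that are well represented there.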

The classification is parallelized through RcppParallel, which employs Intel TBB and TinyThread. By default, all available processing cores are used; this can be changed with the function `setParallel`.

A character vector with the predicted taxa, one for each sequence.

Kristian Hovde Liland and Lars Snipen.

Wang, Q, Garrity, GM, Tiedje, JM, Cole, JR (2007). Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Applied and Environmental Microbiology, 73: 5261-5267.

rdpTrain.

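library(microclass)

# Load the example data; use the second token of each header as the taxon label
data("small.16S")
seq <- small.16S$Sequence
tax <- sapply(strsplit(small.16S$Header, split = " "), function(x) x[2])

## Not run: 
# Train the model on the full sequences
trn <- rdpTrain(seq, tax)

# Extract the amplicons defined by the two primers
primer.515f <- "GTGYCAGCMGCCGCGGTAA"
primer.806rB <- "GGACTACNVGGGTWTCTAAT"
reads <- amplicon(seq, primer.515f, primer.806rB)

# Classify the non-empty amplicons
predicted <- rdpClassify(unlist(reads[nchar(reads) > 0]), trn)
print(predicted)

## End(Not run)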