rdp: Ribosomal Database Project (RDP) Classifier for 16S rRNA

rdp    R Documentation

Description

Use the RDP classifier (Wang et al., 2007) to classify 16S rRNA sequences. This package currently contains RDP version 2.14, released in August 2023. The associated data package rRDPData contains models trained on the bacterial and archaeal taxonomy training set No. 19 (see Wang and Cole, 2024).

Usage

rdp(dir = NULL)

## S3 method for class 'RDPClassifier'
predict(object, newdata, confidence = 0.8, rdp_args = "", verbose = FALSE, ...)

trainRDP(x, dir = "classifier", rank = "genus", verbose = FALSE)

removeRDP(object)

Arguments

dir

directory where the classifier information is stored.

object

an RDPClassifier object.

newdata

new data to be classified as a Biostrings::DNAStringSet.

confidence

numeric; minimum confidence level for classification. Results with lower confidence are replaced by NAs. Set to 0 to disable.

rdp_args

additional RDP arguments for classification (e.g., "-minWords 5" to set the minimum number of words for each bootstrap trial). See the RDP documentation.

verbose

logical; print additional information.

...

additional arguments (currently unused).

x

an object of class Biostrings::DNAStringSet with the 16S rRNA sequences for training.

rank

taxonomic rank at which the classification is learned.

Details

RDP is a naive Bayes classifier using 8-mers as features.
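
To make the feature definition concrete, the following base R sketch enumerates the overlapping 8-mers ("words") of a single sequence. It only illustrates what an 8-mer feature is; it is not how the RDP classifier itself is implemented, and extract_kmers() is a hypothetical helper that is not part of rRDP.

```r
# Illustrative only: enumerate the overlapping k-mers of one sequence.
# extract_kmers() is a hypothetical helper, not a function from rRDP.
extract_kmers <- function(sequence, k = 8L) {
    n <- nchar(sequence)
    if (n < k) {
        # Sequence is too short to contain a single word of length k.
        return(character(0))
    }
    starts <- seq_len(n - k + 1L)
    substring(sequence, starts, starts + k - 1L)
}

toy_seq <- "ACGTACGTACGT"
words <- extract_kmers(toy_seq)
words

# Word counts of this kind are what a naive Bayes model is built on.
table(words)
```

A sequence of length n yields n - 7 overlapping 8-mers, so the 12-base toy sequence produces five words.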

rdp() creates a default classifier trained with the data shipped with RDP. Alternatively, a directory with the data for an existing classifier (created with trainRDP()) can be supplied.

trainRDP() creates a new classifier for the data in x and stores the classifier information in dir. The sequence names in x must contain annotations in the following format:

"<ID> <Kingdom>;<Phylum>;<Class>;<Order>;<Family>;<Genus>"

A created classifier can be removed with removeRDP(). This will remove the directory which stores the classifier information.
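
The annotation format that trainRDP() expects can be assembled from a taxonomy table with base R. The sketch below assumes a hypothetical data.frame named taxonomy with one column per rank; the IDs and lineages are purely illustrative. It also assumes that the annotations are carried as the names of the DNAStringSet.

```r
# Hypothetical taxonomy table; IDs and lineages are for illustration only.
taxonomy <- data.frame(
    ID = c("seq1", "seq2"),
    Kingdom = c("Bacteria", "Bacteria"),
    Phylum = c("Proteobacteria", "Firmicutes"),
    Class = c("Gammaproteobacteria", "Bacilli"),
    Order = c("Enterobacteriales", "Lactobacillales"),
    Family = c("Enterobacteriaceae", "Streptococcaceae"),
    Genus = c("Escherichia", "Streptococcus"),
    stringsAsFactors = FALSE
)

ranks <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus")

# Join the ranks with ";" and prepend the ID, separated by a single space.
lineage <- do.call(paste, c(taxonomy[ranks], sep = ";"))
annotations <- paste(taxonomy$ID, lineage)
annotations

# Assumption: with a DNAStringSet x in the same order, the annotations
# would be attached as sequence names, e.g. names(x) <- annotations
```

Keeping the ranks in a fixed vector makes the order of the lineage explicit, so the result does not depend on the column order of the table.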

The data for the default 16S rRNA classifier can be found in package rRDPData.

Value

rdp() and trainRDP() return an RDPClassifier object.

predict() returns a data.frame containing the classification results for each sequence (rows). The data.frame has an attribute called "confidence" with a matrix containing the confidence values.
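
The "confidence" attribute makes it possible to apply a different cutoff after classification, for example after calling predict() with confidence = 0. The sketch below uses a small hand-made data.frame as a stand-in for real predict() output. It assumes that the confidence matrix has the same dimensions and row/column names as the data.frame; the rank columns and values shown are made up.

```r
# Toy stand-in for the result of predict(); real output comes from rRDP.
pred <- data.frame(
    phylum = c("Firmicutes", "Proteobacteria"),
    genus = c("Bacillus", "Escherichia"),
    row.names = c("seq1", "seq2"),
    stringsAsFactors = FALSE
)
# Assumption: the confidence matrix is aligned with the data.frame.
attr(pred, "confidence") <- matrix(
    c(1.00, 0.99, 0.85, 0.95),
    nrow = 2,
    dimnames = list(rownames(pred), colnames(pred))
)

# Apply a stricter cutoff after the fact: mask calls below 0.9 with NA.
conf <- attr(pred, "confidence")
strict <- pred
strict[conf < 0.9] <- NA
strict
```

Here only the genus call for seq1 (confidence 0.85) falls below the cutoff and is replaced by NA; all other assignments are kept.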

References

Hahsler M, Nagar A (2020). "rRDP: Interface to the RDP Classifier." R Package, Bioconductor. doi:10.18129/B9.bioc.rRDP.

RDP classifier software: https://sourceforge.net/projects/rdp-classifier/

Qiong Wang, George M. Garrity, James M. Tiedje and James R. Cole. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy, Appl. Environ. Microbiol. August 2007 vol. 73 no. 16 5261-5267. doi:10.1128/AEM.00062-07

Wang Q. and Cole J.R. Updated RDP taxonomy and RDP Classifier for more accurate taxonomic classification, Microbiology Resource Announcements, 4 March 2024. doi:10.1128/mra.01063-23

Examples

## load the packages used in the examples
library(Biostrings)
library(rRDP)

### Use the default classifier
seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
    package = "rRDP"
))

## shorten names
names(seq) <- sapply(strsplit(names(seq), " "), "[", 1)
seq

## use rdp for classification (this needs package rRDPData installed)
## > BiocManager::install("rRDPData")

cl_16S <- rdp()
cl_16S

pred <- predict(cl_16S, seq)
pred

attr(pred, "confidence")

### Train a custom RDP classifier on new data
trainingSequences <- readDNAStringSet(
    system.file("examples/trainingSequences.fasta", package = "rRDP")
)

customRDP <- trainRDP(trainingSequences)
customRDP

testSequences <- readDNAStringSet(
    system.file("examples/testSequences.fasta", package = "rRDP")
)
predict(customRDP, testSequences)

## clean up
removeRDP(customRDP)

mhahsler/rRDP documentation built on April 29, 2024, 9:11 a.m.