Ribosomal Database Project (RDP) Classifier for 16S rRNA

Description

Use the RDP classifier to classify 16S rRNA sequences. This package currently contains RDP version 2.9.

Usage

rdp(dir = NULL)
## S3 method for class 'RDPClassifier'
predict(object, newdata,
  confidence=.8, rdp_args="", java_args="-Xmx1g", ...)
trainRDP(x, dir="classifier", rank="genus", java_args="-Xmx1g")
removeRDP(object)

Arguments

dir

directory where the classifier information is stored.

object

an RDPClassifier object.

newdata

a DNAStringSet containing the new sequences to be classified.

confidence

numeric; minimum confidence level for classification. Results with lower confidence are replaced by NAs. Set to 0 to disable.

rdp_args

additional RDP arguments for classification (e.g., "-minWords 5" to set the minimum number of words for each bootstrap trial). See the RDP documentation.

java_args

additional arguments passed to Java (the default, "-Xmx1g", sets the maximum heap size to 1 GB).

x

an object of class DNAStringSet with the 16S rRNA sequences for training.

rank

taxonomic rank at which the classification is learned.

...

additional arguments (currently unused).

Details

RDP is a naive Bayes classifier that uses overlapping 8-mers (words of eight bases) as features.
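To make the feature set concrete, the following base R sketch lists the overlapping words of length k in a single sequence string. It is an illustration only, not rRDP's internal code; the helper name kmer_words is hypothetical.

```r
# Hypothetical helper (illustration only, not part of rRDP):
# return all overlapping substrings of length k, the kind of
# "words" a naive Bayes classifier such as RDP uses as features.
kmer_words <- function(s, k = 8) {
  n <- nchar(s)
  if (n < k) return(character(0))   # sequence shorter than one word
  starts <- seq_len(n - k + 1)      # one word starts at each position
  substring(s, starts, starts + k - 1)
}

words <- kmer_words("ACGTACGTACGT")
length(words)   # 12 - 8 + 1 = 5 overlapping 8-mers
unique(words)   # repeated words collapse to a smaller feature set
```

For whole DNAStringSet objects, Biostrings::oligonucleotideFrequency(x, width = 8) counts the same overlapping words per sequence.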

rdp() creates a default classifier trained with the data shipped with RDP. Alternatively, a directory with the data for an existing classifier (created with trainRDP()) can be supplied.

trainRDP() creates a new classifier for the data in x and stores the classifier information in dir. The sequence names in x must contain annotations in the following format:

"<ID> <Kingdom>;<Phylum>;<Class>;<Order>;<Family>;<Genus>"
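As a sketch of the annotation format above, the base R helper below (hypothetical, not part of rRDP) builds one name from a sequence ID and a six-level taxonomy. The example taxonomy is only for illustration.

```r
# Hypothetical helper (not part of rRDP): build a sequence name in the
# "<ID> <Kingdom>;<Phylum>;<Class>;<Order>;<Family>;<Genus>" format.
make_rdp_name <- function(id, taxonomy) {
  stopifnot(length(taxonomy) == 6)          # one entry per rank
  paste(id, paste(taxonomy, collapse = ";")) # ID, a space, then ranks
}

nm <- make_rdp_name("seq1", c("Bacteria", "Firmicutes", "Bacilli",
  "Lactobacillales", "Lactobacillaceae", "Lactobacillus"))
nm
# For a training set x (a DNAStringSet), one such name per sequence
# would be assigned with: names(x) <- <vector of names>
```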

A created classifier can be removed with removeRDP(). This will remove the directory which stores the classifier information.

The data for the default 16S rRNA classifier can be found in the package rRDPData.

Value

rdp() and trainRDP() return an RDPClassifier object.

predict() returns a data.frame containing the classification results for each sequence (rows). The data.frame has an attribute called "confidence" containing a matrix of confidence values.
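The sketch below shows how the "confidence" attribute can be read back and used to mask low-confidence assignments, mirroring what the confidence argument of predict() is documented to do. The data.frame and its values are hypothetical toy stand-ins, not real rRDP output, and mask_low_confidence is a hypothetical helper.

```r
# Toy stand-in (hypothetical values) for a predict() result with its
# "confidence" attribute; real results have one column per rank.
pred <- data.frame(genus = c("Lactobacillus", "Bacillus"),
                   stringsAsFactors = FALSE)
attr(pred, "confidence") <- matrix(c(0.95, 0.42), ncol = 1,
                                   dimnames = list(NULL, "genus"))

# Hypothetical helper: replace assignments whose confidence is below
# the threshold with NA (a logical matrix indexes the data.frame).
mask_low_confidence <- function(pred, threshold = 0.8) {
  conf <- attr(pred, "confidence")
  pred[conf < threshold] <- NA
  pred
}

masked <- mask_low_confidence(pred, 0.8)
masked$genus   # c("Lactobacillus", NA)
```

Applying a stricter threshold this way after calling predict() with confidence = 0 avoids reclassifying the sequences.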

References

RDP Classifier http://sourceforge.net/projects/rdp-classifier/

Qiong Wang, George M. Garrity, James M. Tiedje and James R. Cole. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy, Appl. Environ. Microbiol. August 2007 vol. 73 no. 16 5261-5267.

Examples

library(rRDP)

### Use the default classifier
seq <- readRNAStringSet(system.file("examples/RNA_example.fasta",
	package="rRDP"))

## shorten names (keep only the sequence ID)
names(seq) <- sapply(strsplit(names(seq), " "), "[", 1)
seq

## use rdp for classification (this needs package rRDPData)
pred <- predict(rdp(), seq)
pred

## confidence values for each assignment
attr(pred, "confidence")

### Train a custom RDP classifier on new data
trainingSequences <- readDNAStringSet(
    system.file("examples/trainingSequences.fasta", package="rRDP"))

customRDP <- trainRDP(trainingSequences)
customRDP

testSequences <- readDNAStringSet(
    system.file("examples/testSequences.fasta", package="rRDP"))
predict(customRDP, testSequences)

## clean up
removeRDP(customRDP)