import_RDP_cluster: Import RDP cluster file and return otu_table (abundance...


View source: R/IO-methods.R

Description

The RDP cluster pipeline (specifically, the output of the complete linkage clustering step) has no formal documentation for the ".clust" file or its apparent sequence naming convention.

Usage

import_RDP_cluster(RDP_cluster_file)

Arguments

RDP_cluster_file

A character string. The name of the ".clust" file produced by the complete linkage clustering step of the RDP pipeline.

Details

http://pyro.cme.msu.edu/index.jsp

The cluster file itself contains the names of all sequences contained in the input alignment. If the upstream barcode and alignment processing steps are also done with the RDP pipeline, then the sequence names follow a predictable convention in which each sequence is named by its sample and sequence ID, separated by a "_" delimiter:

"sampleName_sequenceIDnumber"

This import function assumes that the sequence names in the cluster file follow this convention, and that the sample name does not contain any "_". It is unlikely to work if this is not the case. It is likely to work if you used the upstream steps in the RDP pipeline to process your raw (barcoded, untrimmed) fasta/fastq data.
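To illustrate the naming assumption, here is a minimal sketch of splitting sequence names on the first "_". The sequence names and the variable names are hypothetical and are not taken from a real ".clust" file; this is not the package's internal code.

```r
# Hypothetical sequence names following "sampleName_sequenceIDnumber"
seq_names <- c("soilA_000123", "soilA_000456", "soilB_000789")

# Everything before the first "_" is treated as the sample name
sample_names <- sub("_.*$", "", seq_names)

# Everything after the first "_" is treated as the sequence ID
seq_ids <- sub("^[^_]*_", "", seq_names)

# A sample name that itself contains "_" is cut short,
# which is why the convention forbids it
bad_sample <- sub("_.*$", "", "soil_A_000123")
```

In this sketch `bad_sample` ends up as "soil" rather than "soil_A", so reads from that sample would be attributed to the wrong name.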

This function makes two passes over the ".clust" file. The first pass collects all of the sample names that appear. The second pass visits each OTU ("cluster"; each row of the cluster file) and counts the number of sequences (reads) from each sample. The resulting OTU-by-sample abundance table is coerced to an otu_table object and returned.
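The two-pass counting described above can be sketched on in-memory data. This is an illustration under stated assumptions, not the function's actual source: the `clusters` list, its OTU labels, and the helper `sample_of` are hypothetical, and the sketch skips reading and parsing the ".clust" file itself.

```r
# Hypothetical clusters: each element holds the sequence names in one OTU
clusters <- list(
  OTU1 = c("soilA_1", "soilA_2", "soilB_3"),
  OTU2 = c("soilB_4"),
  OTU3 = c("soilA_5", "soilC_6", "soilC_7")
)

# Hypothetical helper: sample name is the text before the first "_"
sample_of <- function(x) sub("_.*$", "", x)

# Pass 1: collect every sample name that appears in any cluster
samples <- sort(unique(sample_of(unlist(clusters, use.names = FALSE))))

# Pass 2: for each OTU, count its reads per sample
abund <- t(vapply(clusters, function(otu) {
  as.integer(table(factor(sample_of(otu), levels = samples)))
}, integer(length(samples))))
colnames(abund) <- samples

# Coerce to an otu_table if phyloseq is installed (OTUs are rows here)
if (requireNamespace("phyloseq", quietly = TRUE)) {
  otu <- phyloseq::otu_table(abund, taxa_are_rows = TRUE)
}
```

Using a factor with fixed levels keeps a zero count for samples that are absent from an OTU, so every row has the same columns.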

Value

An otu_table object parsed from the ".clust" file.
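A typical call might look like the sketch below. The file path is a placeholder (hypothetical), and the guards only ensure the snippet does nothing if phyloseq or the file is unavailable.

```r
# Placeholder path to a ".clust" file; replace with your own (hypothetical)
clust_file <- "path/to/your_data.clust"

if (requireNamespace("phyloseq", quietly = TRUE) && file.exists(clust_file)) {
  otu <- phyloseq::import_RDP_cluster(clust_file)
  # Inspect the dimensions of the returned otu_table
  print(phyloseq::ntaxa(otu))
  print(phyloseq::nsamples(otu))
}
```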

References

http://pyro.cme.msu.edu/index.jsp


phyloseq documentation built on Nov. 8, 2020, 6:41 p.m.