assoc_repseq_IDs_with_otus_by_fasta: Associate Representative Sequences with OTUs from Fasta...

Description Usage Arguments Details Value Note Author(s) See Also Examples

View source: R/assoc_repseq_IDs_with_otus_by_fasta.R

Description

This function parses representative sequence IDs and makes a table associating the representative sequence machine names with the OTU names as given by RDP's cluster file formatter with options "R" or "biom," or function clstr2otu in this package.

Usage

1
assoc_repseq_IDs_with_otus_by_fasta(repseq_file="all_seq_complete.clust_rep_seqs.fasta", otu_format="R")

Arguments

repseq_file

The name of the fasta file containing representative sequences.

otu_format

When equal to "R" (default) OTU names have the form "OTUxxxnn." When equal to "biom", OTU names have the form "cluster_nn."

Details

Representative sequences from clusters for a given distance may be obtained with either the web-based representative sequence tool currently on the rdpipeline page (http://pyro.cme.msu.edu/), or with the RDPTools' cluster function using a command similar to:

java -Xmx2g -jar $Clustering rep-seqs –one-rep-per-otu all_seq_complete.clust 0.03 merged_aligned.fasta

In these cases the fasta headers contain information on the cluster number and the size of the cluster. This function parses this information into a table that can be used as input to function rename_fasta, which renames the representative sequences with their corresponding OTU names.

Value

This function returns a data frame with 4 columns: the machine name of the representative sequence, the corresponding OTU name as given by RDP's cluster file formatter with options "R" or "biom" and by function clstr2otu in this package, the cluster number, and the total number of sequences in the cluster (cluster size).

Note

The representative sequence tool on the FunGene pipeline page (http://fungene.cme.msu.edu/FunGenePipeline/) returns one representative sequences per sample, a format which is not compatible with this function.

This function expects the representative sequence IDs to be formatted in one of these ways:

>HC9DO0P01BCTC4 prefered=false,cluster=0,clustsize=2

>HC9DO0P01BCTC4 cluster_id=1,size=2

If the representative sequence IDs are not formatted as in these examples, or do not contain information on cluster number and size, a similar association table may be made using function assoc_repseq_IDs_with_otus_by_clstr.

Author(s)

John Quensen

See Also

assoc_repseq_IDs_with_otus_by_clstr, clstr2otu, rename_fasta

Examples

1
2
3
repseq.file <- system.file("extdata", "all_seq_complete.clust_rep_seqs.fasta", package="RDPutils")
assoc.table <- assoc_repseq_IDs_with_otus_by_fasta(repseq_file=repseq.file)
head(assoc.table)

jfq3/RDPutils documentation built on Nov. 8, 2019, 1:05 p.m.