View source: R/assoc_repseq_IDs_with_otus_by_fasta.R
This function parses representative sequence IDs and makes a table associating the machine names of the representative sequences with the OTU names as given by RDP's cluster file formatter with option "R" or "biom", or by function clstr2otu in this package.
assoc_repseq_IDs_with_otus_by_fasta(repseq_file="all_seq_complete.clust_rep_seqs.fasta", otu_format="R")
repseq_file
The name of the fasta file containing representative sequences.
otu_format
When equal to "R" (default), OTU names have the form "OTUxxxnn". When equal to "biom", OTU names have the form "cluster_nn".
Representative sequences from clusters for a given distance may be obtained with either the web-based representative sequence tool currently on the rdpipeline page (http://pyro.cme.msu.edu/), or with the RDPTools' cluster function using a command similar to:
java -Xmx2g -jar $Clustering rep-seqs --one-rep-per-otu all_seq_complete.clust 0.03 merged_aligned.fasta
In these cases the fasta headers contain information on the cluster number and the size of the cluster. This function parses this information into a table that can be used as input to function rename_fasta, which renames the representative sequences with their corresponding OTU names.
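As a rough illustration of where this information lives, the sketch below pulls the header lines out of a small fasta file using base R only. The file contents, the second sequence ID, and the variable names are made up for the example; this is not the code used inside the package.

```r
# Write a tiny example fasta file (made-up sequences) to a temporary location.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(
  ">HC9DO0P01BCTC4 prefered=false,cluster=0,clustsize=2",
  "ACGTACGT",
  ">HC9DO0P01A1B2C prefered=false,cluster=1,clustsize=5",  # hypothetical ID
  "TTGACCGA"
), fasta_path)

# Header lines are the ones that start with ">".
fasta_lines <- readLines(fasta_path)
header_lines <- grep("^>", fasta_lines, value = TRUE)
print(header_lines)
```

Each header carries the machine name as its first token, followed by the cluster number and cluster size as key=value pairs.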
This function returns a data frame with 4 columns: the machine name of the representative sequence, the corresponding OTU name as given by RDP's cluster file formatter with options "R" or "biom" and by function clstr2otu in this package, the cluster number, and the total number of sequences in the cluster (cluster size).
The representative sequence tool on the FunGene pipeline page (http://fungene.cme.msu.edu/FunGenePipeline/) returns one representative sequence per sample, a format that is not compatible with this function.
This function expects the representative sequence IDs to be formatted in one of these ways:
>HC9DO0P01BCTC4 prefered=false,cluster=0,clustsize=2
>HC9DO0P01BCTC4 cluster_id=1,size=2
If the representative sequence IDs are not formatted as in these examples, or do not contain information on cluster number and size, a similar association table may be made using function assoc_repseq_IDs_with_otus_by_clstr.
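The two header formats above can be parsed with a pair of regular expressions. The following is a minimal sketch in base R, assuming headers exactly like the examples shown; the helper parse_repseq_headers and its column names are hypothetical and not part of this package, and only the "biom"-style name ("cluster_nn") is built because the exact padding of the "R"-style names is not specified here.

```r
# Hypothetical helper, not part of RDPutils: extract the machine name,
# cluster number, and cluster size from representative sequence headers.
parse_repseq_headers <- function(headers) {
  # Drop the leading ">" and take the first token as the machine name.
  ids <- sub("^>", "", headers)
  machine_name <- sub("\\s.*$", "", ids)

  # Cluster number appears as "cluster=0" or "cluster_id=1".
  cluster <- as.integer(
    sub(".*cluster(?:_id)?=([0-9]+).*", "\\1", ids, perl = TRUE)
  )
  # Cluster size appears as "clustsize=2" or "size=2".
  size <- as.integer(sub(".*size=([0-9]+).*", "\\1", ids))

  data.frame(
    machine_name = machine_name,
    otu_name = paste0("cluster_", cluster),  # assumed "biom"-style naming
    cluster = cluster,
    size = size,
    stringsAsFactors = FALSE
  )
}

headers <- c(
  ">HC9DO0P01BCTC4 prefered=false,cluster=0,clustsize=2",
  ">HC9DO0P01BCTC4 cluster_id=1,size=2"
)
parsed <- parse_repseq_headers(headers)
print(parsed)
```

Headers that match neither pattern would yield NA for the cluster number and size in this sketch, which is the situation where assoc_repseq_IDs_with_otus_by_clstr is the better choice.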
John Quensen
assoc_repseq_IDs_with_otus_by_clstr, clstr2otu, rename_fasta
repseq.file <- system.file("extdata", "all_seq_complete.clust_rep_seqs.fasta", package="RDPutils")
assoc.table <- assoc_repseq_IDs_with_otus_by_fasta(repseq_file=repseq.file)
head(assoc.table)