Import usearch table format (.uc) to OTU table


UPARSE is an algorithm for OTU clustering implemented within usearch. At last check, the UPARSE algorithm was accessed via the -cluster_otus option flag. For details about installing and running usearch, please refer to the usearch website. For details about the output format, please refer to the uc format definition. This importer is intended to read one particular table format generated by usearch, its so-called “cluster format”, which is often given the .uc extension in the usearch documentation.


import_usearch_uc(ucfile, colRead = 9, colOTU = 10, readDelimiter = "_",
  verbose = TRUE)



ucfile
(Required). A file location character string or connection corresponding to the file that contains the usearch output table. This is passed directly to read.table. Please see its file argument documentation for further links and details.


colRead
(Optional). Numeric. The column index in the uc-table file that holds the read IDs. The default column index is 9.


colOTU
(Optional). Numeric. The column index in the uc-table file that holds the OTU IDs. The default column index is 10.


readDelimiter
(Optional). An R regular expression, given as a character string. It should match the delimiter that separates the sample ID from the original read ID in the demultiplexed read IDs of your sequence file. The default is a plain underscore, which in this regex context is "_".
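To illustrate what the delimiter is used for, the sketch below recovers the sample ID from demultiplexed read IDs with base R. The helper name sample_from_read and the read IDs are hypothetical, invented for this example; this is not necessarily how the importer is implemented internally.

```r
# Hypothetical helper: keep the part of each read ID that precedes the
# first match of the delimiter regex.
sample_from_read <- function(read_ids, readDelimiter = "_") {
  sub(paste0(readDelimiter, ".*$"), "", read_ids)
}

# Invented read IDs of the form <sampleID><delimiter><originalReadID>
underscore_ids <- c("sampleA_read001", "sampleA_read002", "sampleB_read001")
sample_from_read(underscore_ids)

# Because the delimiter is a regex, a literal dot must be escaped.
# An unescaped "." would match any character, not just a dot.
dot_ids <- c("sampleA.read001", "sampleB.read001")
sample_from_read(dot_ids, readDelimiter = "\\.")
```

Note that with this approach a sample ID that itself contains the delimiter would be cut at its first occurrence, so it is worth checking a few read IDs by hand before importing.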


verbose
(Optional). A logical. Default is TRUE. Should progress messages be printed to standard output (via cat)?


Because usearch is an external (non-R) application, there is no direct way to continuously check that these suggested arguments and file formats will remain in their current state. If there is a problem, please verify your version of usearch, create a small reproducible example of the problem, and post it as an issue on the phyloseq issues tracker. This import function was developed against usearch version 7.0.109. Hopefully later versions of usearch keep this functionality and format, but the phyloseq team has no way to guarantee this, so any feedback will help maintain future functionality.

For instance, it is currently assumed that the 9th and 10th columns of the .uc table hold the read label and OTU ID, respectively, and that the delimiter between the sample name and the read in the read-name entries is a single "_". If this is not true, you may have to update these parameters, or even modify the current implementation of this function.
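The assumptions above can be checked by hand on a small file. The following is a minimal sketch in base R, not the package's implementation: it reads a tiny, made-up tab-delimited table laid out like a .uc file, takes the read label and OTU ID from columns 9 and 10, strips the read part of each label, and tabulates reads per sample and OTU. All record values and variable names are invented for illustration.

```r
# Made-up records in a .uc-like layout (10 tab-separated columns);
# column 9 is the read label, column 10 is the OTU ID.
uc_lines <- c(
  "H\t0\t250\t99.0\t+\t0\t0\t250M\tsampleA_read1\tOTU_1",
  "H\t0\t250\t98.4\t+\t0\t0\t250M\tsampleA_read2\tOTU_1",
  "H\t1\t250\t97.6\t+\t0\t0\t250M\tsampleB_read1\tOTU_2",
  "H\t0\t250\t99.2\t+\t0\t0\t250M\tsampleB_read2\tOTU_1"
)

# Read the text as a headerless, tab-delimited table
uc <- read.table(text = uc_lines, sep = "\t", header = FALSE,
                 stringsAsFactors = FALSE, comment.char = "")

colRead <- 9
colOTU <- 10
readDelimiter <- "_"

# Sample ID = read label with the delimiter and everything after it removed
sample_ids <- sub(paste0(readDelimiter, ".*$"), "", uc[[colRead]])

# Cross-tabulate read counts: one row per sample, one column per OTU
counts <- table(sample = sample_ids, OTU = uc[[colOTU]])
counts
```

If the resulting table does not have the samples and OTUs you expect, that is a hint that colRead, colOTU, or readDelimiter needs adjusting for your version of usearch.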

Also note that there is now a UPARSE-specific output file format, uparseout, and it might make more sense to create and import that file for use in phyloseq. If so, you'll want to import using the import_uparse() function.

See Also





usearchfile <- system.file("extdata", "usearch.uc", package="phyloseq")
import_usearch_uc(usearchfile)
