readUc: Convert .uc Files to Dataframe

Description

Reads .uc files (USEARCH Cluster Format) generated by the VSEARCH clustering and alignment algorithms.

Usage

readUc(file, output = "cluster")

Arguments

file

The file path of the .uc file.

output

The type of analysis that was carried out to produce the .uc file.

  • If output is specified as "cluster", VSEARCH clustering was carried out.

  • If output is specified as "alignment", VSEARCH pairwise global alignment was carried out.

Note that clustering produces one "H" record for each sequence and one "C" record for each cluster, while an alignment produces one "H" record for each alignment (see Details).

Details

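USEARCH cluster format is a tab-separated text file that contains clustering and/or alignment information for a set of sequences. Each record begins with a record type ("H", "C" or "N") that describes the type of "hit" it represents. These refer to:

  • "H": hit, a query-target alignment; for clustering, it assigns the query sequence to a cluster.

  • "C": cluster record, one per cluster.

  • "N": no hit; the query did not match any target.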
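As a rough illustration of the tab-separated layout, the sketch below reads two made-up .uc-style records with base R. The records, the ten column names and the object names (ucLines, ucCols, uc) are assumptions chosen for this example; they are not taken from real VSEARCH output and are not necessarily the names readUc returns.

```r
# Two made-up .uc-style records (tab separated, 10 fields each).
# These are illustrative only and are not real VSEARCH output.
ucLines <- c(
    "H\t0\t120\t98.3\t+\t0\t0\t120M\tseqB\tseqA",
    "C\t0\t2\t*\t*\t*\t*\t*\tseqA\t*"
)

# Column names are assumptions for this sketch, not necessarily
# the field names used by readUc().
ucCols <- c(
    "type", "cluster", "width", "identity", "strand",
    "unused1", "unused2", "alignment", "query", "target"
)

# Read the records from memory; quoting and comment handling are
# disabled so that sequence labels are read verbatim.
uc <- read.table(
    text = ucLines,
    sep = "\t",
    col.names = ucCols,
    stringsAsFactors = FALSE,
    quote = "",
    comment.char = ""
)

# Split the records by type: "H" rows are hits, "C" rows describe clusters.
hits <- uc[uc$type == "H", ]
clusters <- uc[uc$type == "C", ]
```

Fields that do not apply to a record type hold "*", so columns such as identity are read as character here and would need converting before any numeric use.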

Additionally, for each record a "compressed alignment" is generated. This is the alignment represented in a compact format using the letters "M", "D" and "I", each preceded by the number of consecutive alignment columns of that type. The letter types are as follows:

  • "M": match.

  • "D": deletion, a gap in the query.

  • "I": insertion, a gap in the target.

An example of this would be "13M", referring to 13 consecutive matches between the query and target sequence.
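A compressed alignment can be unpacked with a short regular expression. The sketch below is a minimal example, not part of packFinder: parseCompressedAlignment is a hypothetical helper, and it assumes that a letter with no preceding number stands for a single column.

```r
# Hypothetical helper (not part of packFinder): split a compressed
# alignment such as "13M2D5M" into run lengths and letter types.
parseCompressedAlignment <- function(alignment) {
    # Each run is an optional count followed by one of M, D or I.
    runs <- regmatches(alignment, gregexpr("[0-9]*[MDI]", alignment))[[1]]
    counts <- sub("[MDI]$", "", runs)
    # Assumption: a missing count means a single column.
    counts[counts == ""] <- "1"
    data.frame(
        count = as.integer(counts),
        type = sub("^[0-9]*", "", runs),
        stringsAsFactors = FALSE
    )
}

parsed <- parseCompressedAlignment("13M2D5M")

# Total number of alignment columns for each letter type.
totals <- tapply(parsed$count, parsed$type, sum)
```

Here parsed holds one row per run, and totals sums the run lengths by letter type, which may be convenient when comparing alignments of different lengths.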

Value

A dataframe containing the converted .uc file. The fields contained within are as follows:

Author(s)

Jack Gisby

References

VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

See Also

tirClust, packAlign, readBlast, packClust

Examples

readUc(system.file(
    "extdata", 
    "packMatches.uc", 
    package = "packFinder"
))

packFinder documentation built on Nov. 8, 2020, 5:24 p.m.