readUc (R Documentation)
Reads .uc files (USEARCH Cluster Format) generated by the VSEARCH clustering and alignment algorithms.
readUc(file, output = "cluster")
file - The file path of the .uc file.
output - The type of analysis that was carried out to produce the .uc file. Note that clustering produces one "H" record for each sequence and one "C" record for each cluster, while an alignment produces an "H" record for each alignment (see details).
USEARCH cluster format is a tab-separated text file format that contains clustering and/or alignment information for a set of sequences. Each record has a record type, "H", "C" or "N", which describes the type of "hit" it represents in the dataframe. These refer to:
H - Hit - for alignments, indicates an identified alignment of two supplied sequences. For clustering, indicates the cluster assignment for a query.
C - Cluster record - a record for each cluster generated.
N - No hit - indicates that no cluster was assigned or no alignment was found with a target sequence. For clustering, a query with no hits becomes the centroid of a new cluster.
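As a rough illustration of the record types above, the sketch below reads a small tab-separated file with base R and separates the records by type. The three lines of toy data are invented for illustration, and the ten-field layout and the default column names (V1, V9 and so on) are assumptions of this sketch, not the output of readUc.

```r
# Toy .uc content, invented for illustration only (tab-separated fields).
# Assumed layout: type, cluster, length/size, identity, strand,
# two unused fields, compressed alignment, query ID, target ID.
ucLines <- c(
  "H\t0\t120\t98.3\t+\t0\t0\t120M\tseqB\tseqA",
  "N\t*\t*\t*\t.\t*\t*\t*\tseqC\t*",
  "C\t0\t2\t*\t*\t*\t*\t*\tseqA\t*"
)

# Write the toy records to a temporary file so they can be read back in
ucFile <- tempfile(fileext = ".uc")
writeLines(ucLines, ucFile)

# Read the file without a header; columns are named V1..V10 by default
raw <- read.delim(ucFile, header = FALSE, stringsAsFactors = FALSE)

# Count the records of each type, then keep only the hits
recordCounts <- table(raw$V1)
hits <- raw[raw$V1 == "H", ]
print(recordCounts)
print(hits)
```

Subsetting on the first column in this way is one simple option for separating hits from cluster records before further analysis.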
Additionally, a "compressed alignment" is generated for each record. This is the alignment represented in a compact format using the letters "M", "D", and "I". Each letter is preceded by the number of consecutive columns of that letter type. The letter types are as follows:
"M" - Match - Identical bases between the query and target sequence
"D" - Deletion - A gap in the target sequence
"I" - Insertion - A gap in the query sequence
An example of this would be "13M", referring to 13 consecutive matches between the query and target sequence.
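The compressed alignment described above can be split into its runs with a regular expression. The following is a minimal sketch: parseCompressedAlignment is a hypothetical helper, not a packFinder function, and treating a letter without a preceding number as a run of length 1 is an assumption of this sketch.

```r
# Hypothetical helper (not part of packFinder): splits a compressed
# alignment such as "5M2D8M" into one row per run of M, D or I columns.
parseCompressedAlignment <- function(aln) {
  # Each run is an optional count followed by one of the letters M, D or I
  runs <- regmatches(aln, gregexpr("[0-9]*[MDI]", aln))[[1]]

  # Strip the trailing letter to leave the count
  counts <- sub("[MDI]$", "", runs)
  counts[counts == ""] <- "1"  # assumption: a missing count means length 1

  data.frame(
    type = sub("^[0-9]*", "", runs),  # strip the count to leave the letter
    length = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# The example from the text: 13 consecutive matches
print(parseCompressedAlignment("13M"))

# A made-up alignment containing a deletion and an insertion
print(parseCompressedAlignment("5M2D8M1I3M"))
```

Summing the length column for the "M" rows gives the number of match columns in an alignment, which may be useful when comparing hits.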
A dataframe containing the converted .uc file. The fields contained within are as follows:
Record type - "H, C or N", see details for further information.
Cluster designation (output = "cluster" only)
Sequence length, or cluster size
Percent identity to target
The nucleotide strand (output = "cluster" only)
A compressed alignment - see details for further information.
ID of query sequence
ID of target sequence ("H" records only)
Jack Gisby
VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.
tirClust, packAlign, readBlast, packClust
readUc(system.file( "extdata", "packMatches.uc", package = "packFinder" ))