readUc: Convert .uc Files to Dataframe
In packFinder: de novo Annotation of Pack-TYPE Transposable Elements

Reads .uc files (USEARCH Cluster Format) generated by the VSEARCH clustering and alignment algorithms.

readUc(file, output = "cluster")

file

The file path of the .uc file.

output

The type of analysis that was carried out to produce the .uc file.

If output is specified as "cluster", VSEARCH clustering was carried out.
If output is specified as "alignment", VSEARCH pairwise global alignment was carried out.

Note that clustering produces one "H" record for each sequence, and one "C" record for each cluster, while an alignment produces an "H" record for each alignment (see details).

USEARCH cluster format is a tab-separated text format that contains clustering and/or alignment information for a set of sequences. Each record begins with a record type ("H", "C" or "N") that indicates the type of "hit" the record describes. The record types are:

H - Hit - for alignments, indicates an identified alignment of two supplied sequences. For clustering, indicates the cluster assignment for a query.
C - Cluster record - a record for each cluster generated.
N - No hit - indicates that no cluster was assigned or no alignment was found with a target sequence. For clustering, a query with no hits becomes the centroid of a new cluster.
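As a rough illustration of the record types above, the following base R sketch parses a few .uc-style lines and counts each type. It does not use readUc. The three records are made up for illustration, the column names are hypothetical (they are not necessarily the names readUc returns), and the layout assumes the ten tab-separated columns of the USEARCH cluster format, where columns 6 and 7 are unused.

```r
# Made-up .uc records for illustration only (not real VSEARCH output).
uc_text <- paste(
  "H\t0\t120\t98.3\t+\t0\t0\t118M2D\tseq2\tseq1",
  "N\t*\t*\t*\t.\t*\t*\t*\tseq3\t*",
  "C\t0\t2\t*\t*\t*\t*\t*\tseq1\t*",
  sep = "\n"
)

# Read every column as character, because "*" marks an empty field.
uc <- read.table(
  text = uc_text, sep = "\t", header = FALSE,
  colClasses = "character", quote = "", comment.char = ""
)

# Hypothetical column names, chosen for this sketch.
names(uc) <- c("type", "cluster", "size", "identity", "strand",
               "unused1", "unused2", "alignment", "query", "target")

# Count the records of each type.
type_counts <- table(uc$type)
print(type_counts)

# Keep only the hit records and convert identity to numeric.
hits <- uc[uc$type == "H", ]
hits$identity <- as.numeric(hits$identity)
```

Reading all columns as character first avoids conversion problems caused by the "*" placeholders; numeric fields can then be converted once the relevant record type has been selected.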

Additionally, each record includes a "compressed alignment": the alignment written in a compact format using the letters "M", "D" and "I". Each letter is preceded by the number of consecutive alignment columns of that type. The letter types are as follows:

"M" - Match - An aligned (non-gap) column; the query and target bases are paired but not necessarily identical
"D" - Deletion - A gap in the query sequence
"I" - Insertion - A gap in the target sequence

An example of this would be "13M", referring to 13 consecutive aligned columns between the query and target sequence.
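The compressed alignment described above can be split into its runs with a short regular expression. This is a minimal sketch, not part of packFinder: the function name parse_compressed_alignment is hypothetical, and it assumes that a letter with no preceding number stands for a single column.

```r
# Hypothetical helper: split a compressed alignment such as "3M2I3MD"
# into one row per run, giving the letter and the number of columns.
parse_compressed_alignment <- function(aln) {
  # Each token is an optional count followed by one of M, D or I.
  tokens <- regmatches(aln, gregexpr("[0-9]*[MDI]", aln))[[1]]

  # Strip the trailing letter to get the count, and the leading
  # digits to get the letter.
  counts <- as.integer(sub("[MDI]$", "", tokens))
  counts[is.na(counts)] <- 1L  # assumption: a missing count means 1

  data.frame(
    op = sub("^[0-9]*", "", tokens),
    n = counts,
    stringsAsFactors = FALSE
  )
}

runs <- parse_compressed_alignment("3M2I3MD")
print(runs)

# Total number of alignment columns covered by the string.
sum(runs$n)
```

Fields that carry no alignment (for example "*") contain none of the three letters, so the helper returns a dataframe with zero rows for them.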

A dataframe containing the converted .uc file. The fields contained within are as follows:

Record type - "H, C or N", see details for further information.
Cluster designation (output = "cluster" only)
Sequence length, or cluster size
Percent identity to target
The nucleotide strand (output = "cluster" only)
A compressed alignment - see details for further information.
ID of query sequence
ID of target sequence ("H" records only)

Jack Gisby

VSEARCH may be downloaded from https://github.com/torognes/vsearch. See https://www.ncbi.nlm.nih.gov/pubmed/27781170 for further information.

tirClust, packAlign, readBlast, packClust

readUc(system.file(
    "extdata", 
    "packMatches.uc", 
    package = "packFinder"
))
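Building on the example above, the returned dataframe can be inspected before any column is relied on. This is a sketch under two assumptions: that packFinder is installed (the code skips the call otherwise), and that the record type is the first column, following the field order listed under the return value. The actual column names are not assumed here, which is why str() is used to display them.

```r
# Run only if packFinder is available; otherwise leave the result as NULL.
if (requireNamespace("packFinder", quietly = TRUE)) {
  uc_file <- system.file("extdata", "packMatches.uc", package = "packFinder")
  uc <- packFinder::readUc(uc_file)

  # Show the column names and types that readUc actually returns.
  str(uc)

  # Assumption: the first column holds the record type ("H", "C" or "N").
  print(table(uc[[1]]))
} else {
  uc <- NULL
  message("packFinder is not installed; skipping the readUc example.")
}
```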

packFinder documentation built on Nov. 8, 2020, 5:24 p.m.
