read.uc: Read file in USEARCH cluster format

View source: R/read.uc.R

read.uc    R Documentation

Read file in USEARCH cluster format

Description

Read a file in USEARCH cluster format generated by either USEARCH or VSEARCH.

Usage

read.uc(uc.file)

Arguments

uc.file

path to file in USEARCH cluster format (*.uc file extension).

Value

A data frame with the following columns:

  • Type: Record type 'S', 'H', 'C', or 'N'.

  • Cluster: Cluster number (0-based).

  • Size: Sequence length (for 'S', 'N', and 'H' records) or cluster size (for 'C' records).

  • Perc_Ident: For 'H' records, percent identity with target.

  • Strand: For 'H' records, the strand: '+' or '-' for nucleotides; '.' for proteins.

  • Query: Query id.

  • Target: Target id.
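
To make the column layout more concrete, the following is a minimal sketch of how a raw *.uc file could be mapped onto these seven columns with base R. It is not the implementation of read.uc. It assumes the standard ten-field, tab-separated USEARCH cluster layout in which fields 6 to 8 are not needed here, and it assumes that '*' placeholders should become NA. The three records are made up for illustration.

```r
# Illustrative sketch only: NOT the read.uc() implementation.
# Three made-up records in the ten-field, tab-separated UC layout.
lines <- c(
  paste("S", 0, 500, "*", "*", "*", "*", "*", "seqA", "*", sep = "\t"),
  paste("H", 0, 498, 98.5, "+", 0, 0, "498M", "seqB", "seqA", sep = "\t"),
  paste("C", 0, 2, "*", "*", "*", "*", "*", "seqA", "*", sep = "\t")
)

# Parse the tab-separated text; treating '*' as NA is an assumption
raw <- read.delim(text = lines, header = FALSE,
                  stringsAsFactors = FALSE, na.strings = "*")

# Keep fields 1-5, 9 and 10 and name them as in the Value section
uc <- raw[, c(1:5, 9, 10)]
names(uc) <- c("Type", "Cluster", "Size", "Perc_Ident",
               "Strand", "Query", "Target")

uc
```

With a real file, the `text = lines` argument would be replaced by a file path, for example `read.delim("clusters.uc", ...)`, where `clusters.uc` is a hypothetical file name.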

Details

Record type:

  • Type 'H' : Hit. Represents an alignment between the query sequence and the target sequence. In clustering, an 'H' record indicates the cluster assignment of the query.

  • Type 'S' : Centroid (clustering only). There is exactly one 'S' record for each cluster; it gives the label of the centroid (representative) sequence in the Query column.

  • Type 'C' : Cluster record (clustering only). The Size column gives the cluster size, and the Query column gives the query id that corresponds to this cluster.

  • Type 'N' : No hit (database search without clustering only). Indicates that no hits for the query were found in the target database. In clustering, a query without hits becomes the centroid of a new cluster and generates an 'S' record instead of an 'N' record.
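
As a rough illustration of how these record types might be used downstream, the sketch below builds a small hypothetical data frame with the documented columns and separates centroids, hits, and cluster records. All values are invented and are not real read.uc output. Showing empty fields as NA is an assumption about how they are represented.

```r
# Hypothetical data frame mimicking the documented columns;
# all values are made up, and NA for empty fields is an assumption.
uc <- data.frame(
  Type       = c("S", "H", "H", "S", "C", "C"),
  Cluster    = c(0L, 0L, 0L, 1L, 0L, 1L),
  Size       = c(500L, 498L, 502L, 320L, 3L, 1L),
  Perc_Ident = c(NA, 98.5, 97.2, NA, NA, NA),
  Strand     = c(NA, "+", "-", NA, NA, NA),
  Query      = c("seqA", "seqB", "seqC", "seqD", "seqA", "seqD"),
  Target     = c(NA, "seqA", "seqA", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Centroids: one 'S' record per cluster
centroids <- uc[uc$Type == "S", c("Cluster", "Query")]

# Hits: queries assigned to the cluster of their target
hits <- uc[uc$Type == "H", c("Cluster", "Query", "Target", "Perc_Ident")]

# Cluster sizes as reported by the 'C' records
cluster_sizes <- uc[uc$Type == "C", c("Cluster", "Size")]

# Cross-check: count members per cluster from the 'S' and 'H' records
member_counts <- table(uc$Cluster[uc$Type %in% c("S", "H")])
```

In this toy input, the member counts derived from the 'S' and 'H' records agree with the sizes in the 'C' records, which can serve as a simple sanity check on a clustering result.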

Author(s)

Hajk-Georg Drost

Examples

# read example *.uc file
test.uc <- read.uc(system.file("test.uc", package = "LTRpred"))

# look at the format in R
head(test.uc)

HajkD/LTRpred documentation built on April 22, 2022, 4:35 p.m.