seq_cluster: Cluster sequences by similarity

seq_cluster    R Documentation

Cluster sequences by similarity

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Arguments

x

a DNA, RNA or AA vector of sequences to be clustered.

threshold

Threshold value (range in [0, 1]).

method

the clustering method (see details).

Details

The function uses the dist.dna and dist.aa functions from the ape package to compute pairwise distances among sequences, and hclust to cluster them.

Computing a full pairwise distance matrix can be computationally expensive, because the number of pairs grows quadratically with the number of sequences. It is recommended to use this function only for moderately sized datasets.
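
The two steps described above can be sketched in base R. This is only an illustration and not the package's implementation: p_distance is a hypothetical helper that computes an uncorrected proportion of differing sites in place of the ape distance functions, and cutting the tree at a height equal to the threshold (cutree with h) is an assumption about how the threshold is applied.

```r
# Hypothetical helper (not part of bioseq or ape): proportion of differing
# sites between every pair of equal-length sequences.
p_distance <- function(seqs) {
  chars <- strsplit(seqs, "")
  n <- length(seqs)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(chars[[i]] != chars[[j]])
    }
  }
  as.dist(d)
}

seqs <- c("ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC", "TTTTACGGGG")

d <- p_distance(seqs)                   # stores n * (n - 1) / 2 distances
tree <- hclust(d, method = "complete")  # agglomerative clustering
clusters <- cutree(tree, h = 0.15)      # assumed: cut the tree at the threshold
clusters
```

The first three sequences differ by at most one site in ten and fall in one group, while the fourth is far from all of them and forms its own group. The distance object holds one value per pair, which is why memory and run time grow quickly with the number of sequences.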

Supported methods are:

  • "single" (= Nearest Neighbour Clustering)

  • "complete" (= Farthest Neighbour Clustering)

  • "average" (= UPGMA)

  • "mcquitty" (= WPGMA)
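
The choice of method can change the groups obtained for the same distances and the same cut height. The following sketch uses base R only, with four made-up points on a line instead of sequences; the cut height of 1.5 is arbitrary and chosen for illustration.

```r
# Toy data: four points on a line (illustrative values, not sequence data)
pts <- c(0, 1, 2.2, 3.6)
d <- dist(pts)

methods <- c("single", "complete", "average", "mcquitty")

# One column of group memberships per linkage method, all cut at height 1.5
memberships <- sapply(methods, function(m) {
  cutree(hclust(d, method = m), h = 1.5)
})
memberships
```

With single linkage, neighbouring points chain together and all four end up in one group. With the other three methods the distance between groups grows faster, so the points split into two groups of two.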

Value

An integer vector with group memberships.
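
A membership vector of this kind has one element per input sequence and can be used with base R grouping functions. In the sketch below, the labels and the membership values are hypothetical and are not output of seq_cluster.

```r
# Hypothetical labels and memberships, for illustration only
labels <- c("s1", "s2", "s3", "s4", "s5", "s6", "s7")
clusters <- c(1L, 2L, 2L, 3L, 2L, 1L, 3L)

sizes <- table(clusters)           # number of sequences in each group
groups <- split(labels, clusters)  # list with one element per group
groups
```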

See Also

Function seq_consensus to compute consensus and representative sequences for clusters.

Other aggregation operations: seq_consensus()

Examples


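library(bioseq)

# Seven DNA sequences of equal length; the first starts with gap characters
x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
# Convert the character vector to a DNA vector
x <- dna(x)
# Cluster with the defaults (threshold = 0.05, method = "complete")
seq_cluster(x)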


bioseq documentation built on Sept. 6, 2022, 5:07 p.m.