seq_cluster: Cluster sequences by similarity
In bioseq: A Toolbox for Manipulating Biological Sequences

View source: R/seq_summarize_operations.R

Description

Cluster sequences by similarity

Usage

seq_cluster(x, threshold = 0.05, method = "complete")

Arguments

`x`	a DNA, RNA or AA vector of sequences to be clustered.
`threshold`	the threshold value, in the range [0, 1].
`method`	the clustering method (see Details).

Details

The function uses the dist.dna and dist.aa functions from the ape package to compute pairwise distances among sequences, and hclust for clustering.
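
The distance-then-cluster workflow described above can be sketched in base R. This is only an illustration, not the package's implementation: `p_distance` is a hypothetical helper that stands in for the ape distance functions, the four toy sequences are invented, and cutting the tree at a height equal to the threshold with `cutree` is an assumption about how the threshold is applied.

```r
# Hypothetical helper (not part of bioseq or ape): proportion of sites
# that differ between two sequences of equal length.
p_distance <- function(a, b) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  mean(a != b)
}

# Invented toy sequences, all of length 10.
seqs <- c(s1 = "ACGTACGTAC",
          s2 = "ACGTACGTAT",
          s3 = "TTGTACCTAC",
          s4 = "TTGTACCTAT")

# Full pairwise distance matrix (4 x 4), then a "dist" object for hclust().
d <- sapply(seqs, function(a) sapply(seqs, function(b) p_distance(a, b)))
hc <- hclust(as.dist(d), method = "complete")

# Assumption: clusters are obtained by cutting the tree at a given height.
groups <- cutree(hc, h = 0.15)
groups
```

Here s1 and s2 differ at one site out of ten, as do s3 and s4, so cutting at 0.15 puts them into two groups of two.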

Computing a full pairwise distance matrix can be computationally expensive. It is recommended to use this function only for datasets of moderate size.
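
To give a rough sense of the cost, the number of pairwise distances grows quadratically with the number of sequences. The short calculation below counts the pairs for a few dataset sizes. The sizes are arbitrary, and the memory estimate assumes each distance is stored as an 8-byte double, so treat it as an order of magnitude only.

```r
# Number of sequences in three hypothetical datasets.
n_seq <- c(100, 1000, 10000)

# Number of distinct pairs: n * (n - 1) / 2.
n_pairs <- choose(n_seq, 2)
n_pairs
# 4950 499500 49995000

# Approximate size of the distances alone, in megabytes
# (assuming 8 bytes per stored value).
n_pairs * 8 / 1e6
```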

Supported methods are:

"single" (= Nearest Neighbour Clustering)
"complete" (= Farthest Neighbour Clustering)
"average" (= UPGMA)
"mcquitty" (= WPGMA)
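
The choice of method can change the result for the same distances. The sketch below calls `hclust` directly on a small invented distance matrix in which A is close to B and B is close to C, but A and C are further apart. Cutting the tree at a height of 0.05 with `cutree` is an assumption made for the illustration, not a statement about the internals of seq_cluster.

```r
# Invented distances between three items: A-B = 0.03, B-C = 0.04, A-C = 0.07.
labels <- c("A", "B", "C")
d <- as.dist(matrix(c(0,    0.03, 0.07,
                      0.03, 0,    0.04,
                      0.07, 0.04, 0),
                    nrow = 3, dimnames = list(labels, labels)))

methods <- c("single", "complete", "average", "mcquitty")

# Number of clusters obtained with each method when cutting at height 0.05.
n_clusters <- sapply(methods, function(m) {
  length(unique(cutree(hclust(d, method = m), h = 0.05)))
})
n_clusters
```

With "single", C joins the A-B group at its nearest distance (0.04), so everything falls into one cluster. With "complete" the groups merge at the farthest distance (0.07), and with "average" and "mcquitty" at 0.055, so C stays separate at this cut height.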

Value

An integer vector with group memberships.

Examples


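# Seven DNA sequences of equal length (gaps are written as "-")
x <- c("-----TACGCAGTAAAAGCTACTGATG",
       "CGTCATACGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CTTCATATGCAGTAAAAACTACTGATG",
       "CTTCATACGCAGTAAAAACTACTGATG",
       "CGTCATACGCAGTAAAAGCTACTGATG",
       "CTTCATATGCAGTAAAAGCTACTGACG")
# Convert the character vector to a DNA vector
x <- dna(x)
# Cluster using the default threshold (0.05) and method ("complete")
seq_cluster(x)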

bioseq documentation built on Sept. 6, 2022, 5:07 p.m.