Takes a set of k-mer sequences and returns a list partitioning the input k-mers into clusters of similar k-mers. Average-linkage hierarchical clustering is performed using the Jaccard distance, treating each k-mer as the set of all sub-k-mers of length k (tetramers by default) that occur as substrings within it.
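The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: `subKmers` and `jaccardDist` are hypothetical helper names, and the number of clusters is fixed at 2 purely for the demonstration.

```r
# Hypothetical helper: the set of all distinct length-k substrings of one sequence.
subKmers <- function(s, k = 4) {
  n <- nchar(s)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# Hypothetical helper: Jaccard distance between two sets,
# i.e. 1 - (size of intersection) / (size of union).
jaccardDist <- function(a, b) {
  u <- length(union(a, b))
  if (u == 0) return(0)
  1 - length(intersect(a, b)) / u
}

kmers <- c("CAGCCTGG", "CCTGGAA", "CAGCCTG", "CCTGGAAC", "ACCTGC", "CACCTGC")
sets <- lapply(kmers, subKmers, k = 4)

# Pairwise Jaccard distance matrix over the tetramer sets.
n <- length(kmers)
d <- matrix(0, n, n, dimnames = list(kmers, kmers))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- jaccardDist(sets[[i]], sets[[j]])
  }
}

# Average-linkage hierarchical clustering on those distances.
tree <- hclust(as.dist(d), method = "average")

# Cut the tree into 2 clusters (an arbitrary choice for this sketch) and
# return one character vector per cluster.
labels <- cutree(tree, k = 2)
clusters <- unname(split(kmers, labels))
```

Here `clusters` has the same shape as the documented return value: a list whose i-th element holds the k-mers assigned to cluster i.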
clusterKmers(kmers, k = 4, nClusters = NULL, maxClusters = NULL,
             directional = TRUE)
kmers |
character vector or XStringSet of k-mers to partition into clusters |
k |
length of sub-k-mers (default k=4 to use tetramers) with which to calculate Jaccard distances for clustering |
nClusters |
number of clusters to partition kmers into; if set to NULL (default value), selects number of clusters to maximize the average silhouette score (https://en.wikipedia.org/wiki/Silhouette_(clustering)). |
maxClusters |
if nClusters is not specified, optionally sets the maximum number of clusters considered in the silhouette score optimization. |
directional |
logical value: if FALSE, treats each k-mer as equivalent to its reverse complement. Only meaningful for DNA sequences. |
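To make the roles of nClusters and maxClusters more concrete, the following base-R sketch picks the number of clusters that maximizes the average silhouette score. It is a hedged illustration rather than the package's code: `meanSilhouette` and `chooseNClusters` are hypothetical names, the silhouette is computed by hand, and a toy numeric distance matrix stands in for the Jaccard distances between k-mers.

```r
# Hypothetical helper: average silhouette width of a labelling,
# given a full symmetric distance matrix d with zero diagonal.
meanSilhouette <- function(d, labels) {
  widths <- vapply(seq_along(labels), function(i) {
    own <- labels == labels[i]
    if (sum(own) == 1) return(0)  # singleton clusters score 0 by convention
    # a: mean distance to the other members of the same cluster (d[i, i] is 0)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    others <- setdiff(unique(labels), labels[i])
    b <- min(vapply(others, function(cl) mean(d[i, labels == cl]), numeric(1)))
    if (max(a, b) == 0) return(0)
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Hypothetical helper: try 2..maxClusters clusters and keep the best-scoring count.
chooseNClusters <- function(d, tree, maxClusters = nrow(d) - 1) {
  stopifnot(nrow(d) >= 3, maxClusters >= 2)
  candidates <- 2:min(maxClusters, nrow(d) - 1)
  scores <- vapply(candidates,
                   function(k) meanSilhouette(d, cutree(tree, k = k)),
                   numeric(1))
  candidates[which.max(scores)]
}

# Toy data: two well-separated groups of points on a line.
x <- c(0, 0.1, 0.2, 5, 5.1, 5.2)
d <- as.matrix(dist(x))
tree <- hclust(as.dist(d), method = "average")
best <- chooseNClusters(d, tree)
```

In this sketch, passing a smaller `maxClusters` simply shortens the list of candidate cluster counts that are scored.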
list of character vectors (or XStringSet objects, matching the class of the kmers argument) partitioning kmers into clusters: the i-th element of the output list contains the elements of kmers assigned to cluster i.
kmers <- c(
  'CAGCCTGG', 'CCTGGAA', 'CAGCCTG', 'CCTGGAAC', 'CTGGAACT',
  'ACCTGC', 'CACCTGC', 'TGGCCTG', 'CACCTG', 'TCCAGC',
  'CTGGAAC', 'CACCTGG', 'CTGGTCTA', 'GTCCTG', 'CTGGAAG', 'TTCCAGC'
)
clusterKmers(kmers, directional=FALSE)
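The example above sets directional = FALSE, so each k-mer is treated as equivalent to its reverse complement. How the package implements this internally is not stated here; the base-R sketch below only shows what that equivalence means, using the hypothetical helpers `reverseComplement` and `canonical`.

```r
# Hypothetical helper: reverse complement of DNA strings (A<->T, C<->G).
reverseComplement <- function(s) {
  complemented <- chartr("ACGT", "TGCA", s)
  vapply(strsplit(complemented, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

# Hypothetical helper: one possible strand-independent representation,
# keeping whichever of a k-mer and its reverse complement sorts first.
canonical <- function(s) {
  rc <- reverseComplement(s)
  ifelse(s <= rc, s, rc)
}

rc <- reverseComplement(c("TCCAGC", "TTCCAGC"))
same <- canonical("TCCAGC") == canonical("GCTGGA")
```

Under this view, 'TTCCAGC' from the example reads as 'GCTGGAA' on the opposite strand, which shares the tetramers CTGG, TGGA and GGAA with 'CCTGGAA'. That overlap is invisible when only the forward strand is considered.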