simRank: Compute the SimRank Similarity between Sets of Sequences

View source: R/simRank.R

simRank {rMSA}  R Documentation


Description

Computes the SimRank similarity between sequences: the number of unique k-mers two sequences share, divided by the smaller of their two numbers of unique k-mers.
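The definition above can be sketched in base R. This is a minimal illustration, not the rMSA implementation: it assumes the sequences are plain character strings rather than a DNAStringSet, the helpers unique_kmers() and simrank_sketch() are hypothetical, and setting a pair to NA when a sequence is shorter than k is an assumption of this sketch.

```r
# Hypothetical helper (not part of rMSA): the set of unique k-mers
# (substrings of length k) in a plain character string.
unique_kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))  # assumption: too-short sequences have no k-mers
  starts <- seq_len(n - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# Hypothetical sketch of the SimRank similarity matrix:
# shared unique k-mers / smaller number of unique k-mers.
simrank_sketch <- function(seqs, k = 7) {
  kmers <- lapply(seqs, unique_kmers, k = k)
  m <- length(seqs)
  sim <- matrix(NA_real_, m, m, dimnames = list(names(seqs), names(seqs)))
  for (i in seq_len(m)) {
    for (j in seq_len(m)) {
      shared <- length(intersect(kmers[[i]], kmers[[j]]))
      denom <- min(length(kmers[[i]]), length(kmers[[j]]))
      # assumption: NA when either sequence has no k-mers
      sim[i, j] <- if (denom > 0) shared / denom else NA_real_
    }
  }
  sim
}

# Made-up example sequences
seqs <- c(a = "ACGTACGTAC", b = "ACGTACGGGG", c = "TTTTTTTTTT")
S <- simrank_sketch(seqs, k = 3)
round(S, 3)
```

With k = 3, sequence a has 4 unique 3-mers (ACG, CGT, GTA, TAC), all of which also occur in b, so the similarity of a and b is 1 even though b has 6 unique 3-mers. Dividing by the smaller set size means the measure reaches 1 whenever the k-mers of one sequence are contained in those of the other.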

Usage

simRank(x, k = 7)

Arguments

x

an object of class DNAStringSet (package Biostrings) containing the sequences.

k

the length of the k-mers used (default: 7).
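To illustrate how the choice of k influences the result, the following base-R sketch compares two made-up strings that differ by a single substitution. The helpers unique_kmers() and simrank_pair() are hypothetical and not part of rMSA; they only mirror the definition given in the Description.

```r
# Hypothetical helper (not part of rMSA): unique k-mers of a plain string.
unique_kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  starts <- seq_len(n - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# Hypothetical SimRank-style similarity of two strings for a given k.
# Returns NaN if either string is shorter than k (0 / 0).
simrank_pair <- function(x, y, k) {
  kx <- unique_kmers(x, k)
  ky <- unique_kmers(y, k)
  length(intersect(kx, ky)) / min(length(kx), length(ky))
}

x <- "ACGTTGCAACGT"
y <- "ACGTTGCTACGT"  # differs from x only at position 8 (A -> T)

# Similarity for several k-mer lengths
sims <- sapply(c(k2 = 2, k3 = 3, k4 = 4, k6 = 6),
               function(k) simrank_pair(x, y, k))
round(sims, 3)
```

The similarity falls from 0.75 (k = 2) to about 0.286 (k = 6): a single substitution alters every k-mer window that covers it, so longer k-mers make the measure stricter.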

Details

distSimRank() computes the corresponding distance, 1 - simRank().
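The relationship between similarity and distance can be sketched with base R on a small, made-up similarity matrix: subtracting it from 1 and converting with stats::as.dist() yields a distance object that hclust() accepts. The matrix values are illustrative only, and this sketch does not call rMSA.

```r
# Made-up symmetric similarity matrix (values in [0, 1], 1 on the diagonal)
S <- matrix(c(1.0, 0.9, 0.2,
              0.9, 1.0, 0.3,
              0.2, 0.3, 1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("s1", "s2", "s3"), c("s1", "s2", "s3")))

# Distance = 1 - similarity; as.dist() keeps the lower triangle
D <- as.dist(1 - S)

# Hierarchical clustering with the default (complete) linkage
hc <- hclust(D)
groups <- cutree(hc, k = 2)
groups
```

Here s1 and s2 (distance 0.1) are merged first and s3 forms its own group when the tree is cut into two clusters.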

Value

simRank() returns a similarity object of class "simil" (see package proxy). distSimRank() returns a distance object of class "dist".

Author(s)

Michael Hahsler

References

DeSantis TZ, et al. Simrank: Rapid and sensitive general-purpose k-mer search tool. BMC Ecology 2011, 11:11.

Examples

### load packages and sequences
library(rMSA)
library(Biostrings)  # provides readDNAStringSet()
sequences <- readDNAStringSet(system.file("examples/DNA_example.fasta",
	package="rMSA"))
sequences

### compute similarity
simil <- simRank(sequences)

### use hierarchical clustering
hc <- hclust(distSimRank(sequences))
plot(hc)

mhahsler/rMSA documentation built on May 24, 2024, 3:36 p.m.