clarans: K-medoids clustering of SNPs using randomized search


View source: R/snpCluster.R

Description

Partitions (clusters) SNPs into k clusters "around medoids" using a randomized search. The distance between two SNPs is 1 - abs(cor), where cor is their correlation.
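The distance can be illustrated with a minimal base-R sketch. It assumes the genotypes are held in a plain numeric matrix (individuals in rows, SNPs in columns, e.g. coded 0/1/2) rather than in a qtcat snpMatrix object, and the helper name `snpDist` and the toy data are hypothetical, not part of qtcat.

```r
# Hypothetical helper (not part of qtcat): distance between SNPs as 1 - |cor|.
# 'geno' is assumed to be a numeric matrix with individuals in rows and SNPs in columns.
snpDist <- function(geno) {
  r <- cor(geno, use = "pairwise.complete.obs")  # SNP-by-SNP correlation matrix
  1 - abs(r)                                      # perfectly correlated or anti-correlated SNPs get distance 0
}

# toy genotypes for 4 individuals and 3 SNPs
geno <- cbind(snp1 = c(0, 1, 2, 2),
              snp2 = c(2, 1, 0, 0),   # snp2 = 2 - snp1, so cor = -1 and the distance is 0
              snp3 = c(0, 2, 0, 2))
round(snpDist(geno), 3)
```

Taking the absolute value means that SNPs in perfect negative linkage disequilibrium are treated as identical, which is why snp1 and snp2 above end up at distance 0.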

Usage

clarans(snp, k, maxNeigbours = 100, nLocal = 10, mc.cores = 1)

Arguments

snp

an object of class snpMatrix.

k

a positive integer specifying the number of clusters; it must be greater than one and less than the number of SNPs.

maxNeigbours

a positive integer specifying the maximum number of randomized searches (randomly chosen candidate medoids tried) in each optimisation run.

nLocal

a positive integer specifying the number of independent optimisation runs.

mc.cores

a positive integer giving the number of cores used for parallel computing. See mclapply for details.

Details

K-medoids clustering is implemented with the CLARANS (Clustering Large Applications based upon RANdomized Search) algorithm (Ng and Han 2002). CLARANS is a modification of the partitioning around medoids (PAM) algorithm implemented in pam. Whereas PAM evaluates the distances between all SNPs and their respective medoids, CLARANS searches only a random subset of the SNPs. This search is repeated independently several times, and the result with the smallest average distance between SNPs and their medoids is reported. This produces results close to those of PAM (Ng and Han 2002), although the number of runs (nLocal) and the size of the random subset (maxNeigbours) have to be chosen arbitrarily by the user. The algorithm has two advantages: (i) the number of distance comparisons is dramatically reduced, and (ii) the independent runs are straightforward to parallelize.
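To make the procedure concrete, here is a minimal base-R sketch of the CLARANS idea. It assumes a precomputed SNP distance matrix `D` (for example 1 - abs(cor)) and is not qtcat's implementation: the helper names `avgCost`, `claransRun` and `claransSketch` are hypothetical, and their arguments only mirror those of clarans() for readability. Each run starts from random medoids and tries random single-medoid swaps. A swap is accepted when it lowers the average distance to the closest medoid, and the run stops after `maxNeigbours` consecutive failed swaps. The `nLocal` independent runs are distributed with parallel::mclapply, and the cheapest one is kept.

```r
# Hypothetical sketch of CLARANS (Ng and Han 2002) on a precomputed distance matrix D.
# NOT qtcat's implementation; argument names only mirror clarans() for readability.
library(parallel)

# Average distance of every object to its closest medoid (the quantity being minimised).
avgCost <- function(D, medoids) {
  mean(apply(D[, medoids, drop = FALSE], 1, min))
}

# One local search: start from random medoids and try random swaps ("neighbours").
claransRun <- function(D, k, maxNeigbours) {
  n <- nrow(D)
  current <- sample.int(n, k)
  cost <- avgCost(D, current)
  tried <- 0
  while (tried < maxNeigbours) {
    nonMed <- setdiff(seq_len(n), current)
    neighbour <- current
    # replace one random medoid by one random non-medoid
    neighbour[sample.int(k, 1)] <- nonMed[sample.int(length(nonMed), 1)]
    newCost <- avgCost(D, neighbour)
    if (newCost < cost) {          # move to the better neighbour and restart the counter
      current <- neighbour
      cost <- newCost
      tried <- 0
    } else {
      tried <- tried + 1
    }
  }
  list(medoids = current, cost = cost)
}

# nLocal independent runs (in parallel via mclapply), keeping the one with the lowest cost.
claransSketch <- function(D, k, maxNeigbours = 100, nLocal = 10, mc.cores = 1) {
  runs <- mclapply(seq_len(nLocal), function(r) claransRun(D, k, maxNeigbours),
                   mc.cores = mc.cores)
  best <- runs[[which.min(vapply(runs, `[[`, numeric(1), "cost"))]]
  best$cluster <- apply(D[, best$medoids, drop = FALSE], 1, which.min)  # closest medoid per SNP
  best
}

# toy data: three groups of 10 SNPs, each group a noisy copy of one random "founder" SNP
set.seed(1)
founders <- matrix(rbinom(50 * 3, 2, 0.5), nrow = 50)
geno <- do.call(cbind, lapply(1:3, function(g)
  founders[, g] + matrix(rnorm(50 * 10, sd = 0.3), nrow = 50)))
D <- 1 - abs(cor(geno))
fit <- claransSketch(D, k = 3)
table(fit$cluster, rep(1:3, each = 10))
```

For simplicity the sketch recomputes the full cost for every candidate swap and stores a complete distance matrix, which would be impractical for large SNP sets. Because the runs share no state, they parallelize without any communication between workers. Note that mclapply relies on forking, so mc.cores > 1 is not supported on Windows.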

References

R. T. Ng and J. Han (2002). CLARANS: A method for clustering objects for spatial data mining. IEEE Transactions on Knowledge and Data Engineering. http://dx.doi.org/10.1109/TKDE.2002.1033770

Examples

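# path to the example SNP data file shipped with qtcat
gfile <- system.file("extdata/snpdata.csv", package = "qtcat")
# read the comma-separated SNP data
snp <- read.snpData(gfile, sep = ",")

# cluster the SNPs into k = 3 clusters around medoids
clust <- clarans(snp, 3)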

QTCAT/qtcat documentation built on April 20, 2021, 11:20 p.m.