CLARA: CLARA clustering
In talegari/clusterfit: An interface for clustering in R


Implements the CLARA clustering algorithm using `pam`.

CLARA(x, k, nSamples = 5, sampleFrac = 0.1, swap = FALSE, pamonce = 0)

`x`	(numeric matrix or dist) data
`k`	(positive integer) Number of clusters
`nSamples`	(positive integer, default: 5) Number of random samples
`sampleFrac`	(positive fraction, default: 0.1) Fraction of observations in a sample
`swap`	(flag, default: FALSE) Whether PAM should run the swap phase
`pamonce`	(one of 0, 1, 2, default: 0) See the pamonce argument of `pam`

CLARA implementation:

PAM clustering is computed on multiple random samples of the observations.
For a given set of medoids, the cost is defined as the average dissimilarity/distance between each observation in the entire dataset (not just the sample) and its nearest medoid.
The set of medoids with the minimum cost, together with the clustering it induces, is chosen.

The PAM fits on the individual samples are parallelized with the future package.
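
The procedure described above can be sketched in a few lines of R. This is only an illustrative sketch, not the package's implementation: the names `costOfMedoids` and `claraSketch` are hypothetical, the rule used to turn `sampleFrac` into a sample size is an assumption, the samples are processed sequentially rather than with future, and the `swap` and `pamonce` arguments are left out. It relies on `pam` from the cluster package.

```r
library(cluster)  # provides pam()

# Hypothetical helper (not part of clusterfit): average distance from every
# observation in the full dataset to its nearest medoid.
costOfMedoids <- function(distMat, medoids) {
  mean(apply(distMat[, medoids, drop = FALSE], 1, min))
}

# Hypothetical sketch of the CLARA idea for a 'dist' input.
claraSketch <- function(d, k, nSamples = 5, sampleFrac = 0.1) {
  distMat <- as.matrix(d)
  n <- nrow(distMat)
  # Assumed sizing rule: at least k + 1 observations, at most n.
  sampleSize <- min(n, max(k + 1, ceiling(sampleFrac * n)))
  best <- NULL
  for (i in seq_len(nSamples)) {
    # Draw a random sample and run PAM on its sub-distance matrix.
    idx <- sort(sample.int(n, sampleSize))
    fit <- pam(as.dist(distMat[idx, idx]), k = k, diss = TRUE)
    # Map medoid positions within the sample back to the full dataset.
    medoids <- idx[fit$id.med]
    cost <- costOfMedoids(distMat, medoids)
    # Keep the medoids with the lowest cost over the entire dataset.
    if (is.null(best) || cost < best$cost) {
      best <- list(
        clustering = unname(apply(distMat[, medoids, drop = FALSE], 1, which.min)),
        medoidsIndex = medoids,
        cost = cost
      )
    }
  }
  best
}

set.seed(1)
res <- claraSketch(dist(mtcars), k = 4, nSamples = 10, sampleFrac = 0.4)
```

Because the cost is always evaluated on the full dataset, medoids found on different samples are directly comparable, which is what makes choosing the minimum meaningful.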

A list with three components:

clustering: An integer vector of cluster numbers, with length equal to the number of observations
medoidsIndex: An integer vector of the indices of the medoids
cost: The average dissimilarity/distance between each observation in the entire dataset and its nearest medoid
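
To illustrate how the three components relate to one another, the following toy example builds them by hand for six points on a line. The data and the choice of medoids are made up for illustration and are not output of `CLARA`; the assumption is that each observation is assigned to its nearest medoid, as the definition of cost suggests.

```r
# Toy data: two well-separated groups on a line (illustrative only).
x <- c(1, 2, 3, 10, 11, 12)
distMat <- as.matrix(dist(x))

# Hand-picked medoids: observation 2 (value 2) and observation 5 (value 11).
medoidsIndex <- c(2L, 5L)

# Assign each observation to its nearest medoid.
clustering <- unname(apply(distMat[, medoidsIndex, drop = FALSE], 1, which.min))

# Cost: average distance between each observation and its assigned medoid.
cost <- mean(distMat[cbind(seq_along(x), medoidsIndex[clustering])])
```

Here `clustering` labels the first three points 1 and the last three 2, and `cost` is the mean of the distances 1, 0, 1, 1, 0, 1.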

Srikanth Komala Sheshachala (sri.teach@gmail.com)

set.seed(1)
CLARA(dist(mtcars), k = 4, sampleFrac = 0.4, nSamples = 10)
set.seed(2)
CLARA(stats::dist(mtcars, method = "maximum"), k = 4, sampleFrac = 0.4, nSamples = 10)