03-findClusteNumber: Using SillyPutty to find the number of clusters

findClusterNumber R Documentation

Using SillyPutty to find the number of clusters

Description

A function designed to approximate the true number, K, of clusters in a dataset. The findClusterNumber function calls RandomSillyPutty for each value of K in the range from start to end, performing N random starts each time.

NOTE: start must be > 1. The function can be slow, depending on the complexity of the dataset and on the number of iterations, N.
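The scan over K described above can be sketched with base R tools. This is only an illustration of the general idea (try each K, score the resulting partition by its silhouette width), not the package's implementation: the helper name scanK is hypothetical, and hierarchical clustering with cutree stands in for the RandomSillyPutty call. It assumes the cluster package is installed. It also shows why start must exceed 1, since a silhouette width is not defined for a single cluster.

```r
library(cluster)  # provides silhouette()

# Hypothetical helper, NOT part of SillyPutty: score each K in start:end
# by the mean silhouette width of a hierarchical clustering cut at K.
scanK <- function(distobj, start, end) {
  stopifnot(start > 1, end >= start)  # silhouettes need at least 2 clusters
  hc <- hclust(distobj, method = "average")
  ks <- start:end
  widths <- sapply(ks, function(k) {
    labels <- cutree(hc, k = k)          # stand-in for RandomSillyPutty
    sil <- silhouette(labels, distobj)   # per-item silhouette widths
    mean(sil[, "sil_width"])
  })
  names(widths) <- ks                    # label each score with its K
  widths
}

# Toy data: three well-separated groups of 20 points each
set.seed(1)
pts <- rbind(matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
             matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2))
w <- scanK(dist(pts), start = 2, end = 6)
print(round(w, 3))
```

With groups this well separated, the largest score should fall at the K that matches the number of groups in the toy data.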

Usage

  findClusterNumber(distobj, start, end, N = 100,
                    method = c("SillyPutty", "HCSP"), ...)

Arguments

distobj

An object of class dist representing a distance matrix.

start

The minimum number of clusters in the range to search; must be greater than 1.

end

The maximum number of clusters in the range to search.

N

The number of iterations (random starts) to perform for each value of K.

method

Whether to use the full RandomSillyPutty algorithm ("SillyPutty") or the hybrid method of hierarchical clustering followed by SillyPutty ("HCSP").

...

Extra arguments to the SillyPutty function.
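The distobj argument must be an object of class dist. As a minimal sketch, assuming the data start out as a numeric matrix with one row per item, such an object can be built with the base R functions dist or as.dist. The variable names here are illustrative only.

```r
# Five items described by two numeric features (toy values)
x <- matrix(c(0, 0,
              1, 0,
              0, 1,
              5, 5,
              6, 5), ncol = 2, byrow = TRUE)

# Euclidean distances between rows, returned as a "dist" object
d <- dist(x, method = "euclidean")

# A precomputed square, symmetric matrix can be converted with as.dist()
m <- as.matrix(d)
d2 <- as.dist(m)

class(d)          # "dist"
attr(d, "Size")   # number of items
```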

Details

The findClusterNumber function processes one distance matrix at a time, running N iterations for each value of K. It returns a list of the maximum silhouette width values obtained from those N iterations, each paired with its associated cluster number.

Value

A list containing the maximum silhouette width obtained for each K in the range of possible cluster numbers.
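One way to turn such a result into a single estimate of K is to take the K with the largest silhouette width. The sketch below assumes the result is named by K (as the plotting example on this page suggests); the helper bestK is hypothetical, and the numbers are placeholders made up for illustration, not output from the package.

```r
# Hypothetical helper, NOT part of SillyPutty: return the K whose
# silhouette width is largest. Assumes the names of y are the K values.
bestK <- function(y) {
  v <- unlist(y)                      # works for a named list or vector
  as.integer(names(v)[which.max(v)])
}

# Placeholder values for illustration only
toy <- list("3" = 0.41, "4" = 0.55, "5" = 0.48)
bestK(toy)
```

Taking the maximum is a simple rule; when several values of K score nearly the same, inspecting the whole curve may be more informative than the single best value.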

Author(s)

Kevin R. Coombes krc@silicovore.com, Dwayne G. Tally dtally110@hotmail.com

References

Pending.

Examples

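# Load the example distance matrix
data(eucdist)
set.seed(12)  # fix the random seed for reproducibility
# Scan K = 3..7 with the hybrid (hierarchical clustering + SillyPutty) method
y <- findClusterNumber(eucdist, start = 3, end = 7, method = "HCSP")
# Plot the silhouette width against K; names(y) holds the K values
plot(as.numeric(names(y)), y, xlab = "K", ylab = "Mean Silhouette Width",
     type = "b", lwd = 2, pch = 16)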

SillyPutty documentation built on Feb. 8, 2024, 3 a.m.