OptimClusts: Optimal Cluster Calculator


Description

Given the average silhouette widths obtained using the partitioning around medoids (PAM) method, this function determines the optimal number of clusters from the maxima of the average silhouette width. The number of clusters with the absolute maximum silhouette width is not necessarily representative of the optimal number of clusters. OptimClusts instead selects the smallest value at which the silhouette width is a local maximum and lies within a neighbourhood of the global maximum.
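The selection rule described above can be sketched in a few lines of R. This is only an illustration, not the package's implementation: the function name optim_clusts_sketch is hypothetical, the neighbourhood is assumed to be relative to the global maximum (all widths within a fraction Eps of it), and endpoints are assumed to count as local maxima. The package may define these details differently, so results need not match OptimClusts.

```r
# Hypothetical sketch of the rule; NOT the GrammR implementation.
optim_clusts_sketch <- function(P, Eps) {
  stopifnot(is.numeric(P), length(P) >= 1, Eps >= 0, Eps <= 1)
  K <- length(P)

  # Neighbours of each entry; -Inf padding lets endpoints be local maxima.
  left  <- c(-Inf, P[-K])
  right <- c(P[-1], -Inf)
  is_local_max <- P >= left & P >= right

  # Assumed neighbourhood: within a fraction Eps of the global maximum.
  max_p <- max(P)
  near_global <- P >= max_p - Eps * abs(max_p)

  # Smallest index that satisfies both conditions.
  min(which(is_local_max & near_global))
}

# Made-up silhouette widths, for illustration only.
P <- c(0.30, 0.55, 0.50, 0.52, 0.58, 0.40)
optim_clusts_sketch(P, 0.10)  # index 2: local maximum 0.55 is within 10% of 0.58
optim_clusts_sketch(P, 0.02)  # index 5: only the global maximum qualifies
```

A larger Eps favours an earlier, smaller local maximum; a smaller Eps pushes the choice towards the global maximum.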

Usage

OptimClusts(P, Eps)

Arguments

P

Numeric vector of average silhouette widths, one entry for each candidate number of clusters.

Eps

A numerical value between 0 and 1 which determines the neighbourhood of the global maximum within which to search for a local maximum. Values smaller than 0.1 (10%) are advised.

Details

The function OptimClusts uses the mPAM (modified PAM) algorithm described in the first reference below. For a data set with N samples (or taxa/OTUs when clustering taxa/OTUs), the value of K to be used to avoid overestimation of clusters is $\left[ 2\sqrt{N} \right]$, where $\left[ x \right]$ is the largest integer smaller than x.
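As a rough illustration of the bound on K, the sketch below computes it for a simulated data set and then builds a vector of average silhouette widths with pam() from the cluster package. The data, the variable names, and the use of floor() for the bracket notation are assumptions made for this example (floor() differs from the stated definition only when 2√N is an integer); how GrammR itself prepares the input is not shown here.

```r
# Assumes the 'cluster' package is installed (it ships with standard R distributions).
library(cluster)

set.seed(1)
N <- 30                                  # hypothetical number of samples
dat <- matrix(rnorm(2 * N), ncol = 2)    # simulated data, for illustration only

# Upper bound on the number of clusters; floor() stands in for the bracket notation.
K <- floor(2 * sqrt(N))

# Average silhouette width for each candidate number of clusters 2, ..., K.
ks <- 2:K
P <- sapply(ks, function(k) pam(dat, k)$silinfo$avg.width)
```

The resulting vector P has one entry per candidate in ks and is the kind of input that the P argument expects.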

Value

An integer value between 1 and K, where K is the length of the silhouette vector P. If the minimum and maximum number of clusters specified are m and M respectively, the value is the index of the optimal number of clusters in the sequence m, m + 1, ..., M. See Details for information on the maximum number of clusters.
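Because the returned value is an index rather than a cluster count, it has to be mapped back onto the candidate range. A minimal sketch, with hypothetical values for m, M, and the returned index:

```r
m <- 2                      # hypothetical minimum number of clusters
M <- 10                     # hypothetical maximum number of clusters
candidates <- m:M           # candidate cluster counts, one per entry of P

idx <- 5                    # hypothetical value returned for this P
n_clusters <- candidates[idx]
n_clusters                  # 6
```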

Author(s)

Shili Lin <shili@stat.osu.edu>

References

Ayyala, D. N., Lin, S. (2015) GrammR: graphical representation and modeling of count data with application in metagenomics, Bioinformatics, 31(10).

Rousseeuw, P. J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20.

Examples

x <- c(0.5, 0.1, 0.6, 0.7, 0.8, 0.75, 0.77, 0.79, 0.81, 0.9)
## Not run: plot(seq_along(x), x)
OptimClusts(x, 0.1) ## The optimal number selected is 6.
OptimClusts(x, 0.05) ## The optimal number selected is 10.

GrammR documentation built on May 1, 2019, 8:46 p.m.