SH.IDX: Silhouette index

SH.IDX R Documentation

Silhouette index

Description

Computes the SH (silhouette) index (Rousseeuw, 1987; Kaufman and Rousseeuw, 2009) for the result of either k-means or hierarchical clustering, for each number of clusters k from a user-specified kmin to kmax.

Usage

SH.IDX(x, kmax, kmin = 2, method = "kmeans", nstart = 100)

Arguments

x

a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.

kmax

the maximum number of clusters to be considered.

kmin

the minimum number of clusters to be considered. The default is 2.

method

a character string indicating which clustering method to use ("kmeans", "hclust_complete", "hclust_average", or "hclust_single"). The default is "kmeans".

nstart

the number of random initial sets of centers for kmeans, used only when method = "kmeans". The default is 100.
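To illustrate what the method and nstart arguments select, the sketch below maps each method string to a base-R clustering call: kmeans, or hclust on Euclidean distances followed by cutree. This mapping is an assumption made for exposition, not the UniversalCVI source, and cluster_labels is a hypothetical helper.

```r
# Hypothetical mapping from `method` to base-R clustering calls (an assumption,
# not the UniversalCVI source). Returns one integer cluster label per row of x.
cluster_labels <- function(x, k, method = "kmeans", nstart = 100) {
  if (method == "kmeans") {
    # nstart random initial center sets; kmeans keeps the run with the
    # lowest total within-cluster sum of squares
    return(kmeans(x, centers = k, nstart = nstart)$cluster)
  }
  linkage <- switch(method,
                    hclust_complete = "complete",
                    hclust_average  = "average",
                    hclust_single   = "single",
                    stop("unknown method: ", method))
  # Agglomerative clustering on Euclidean distances, cut into k groups
  cutree(hclust(dist(x), method = linkage), k = k)
}

x <- matrix(c(0, 0, 0, 1, 10, 0, 10, 1), ncol = 2, byrow = TRUE)
cluster_labels(x, k = 2, method = "hclust_average")   # labels 1, 1, 2, 2
```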

Details

For i \in [n], l \in [k], and x_i \in C_l, let

a(i) = \dfrac{1}{|C_l|-1}\sum_{y \in C_l} \left\|x_i-y\right\| and

b(i) = \min_{r \neq l} \dfrac{1}{|C_r|} \sum_{y \in C_r} \left\|x_i-y\right\|.
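Both averages can be read off a pairwise distance matrix. The following minimal base-R sketch assumes the norm is Euclidean and that there are at least two clusters; sil_ab is a hypothetical helper name, not a UniversalCVI function.

```r
# Sketch: a(i) and b(i) as defined above, assuming Euclidean distance and k >= 2.
# `sil_ab` is a hypothetical helper, not part of UniversalCVI.
sil_ab <- function(x, labels) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances ||x_i - y||
  n <- nrow(d)
  a <- numeric(n)
  b <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    # The self-distance is 0, so summing over all of C_l and dividing by
    # |C_l| - 1 matches a(i). For a singleton a(i) is undefined; 0 is stored.
    a[i] <- if (sum(own) > 1) sum(d[i, own]) / (sum(own) - 1) else 0
    # Smallest mean distance from x_i to any other cluster C_r, r != l
    others <- setdiff(unique(labels), labels[i])
    b[i] <- min(sapply(others, function(r) mean(d[i, labels == r])))
  }
  list(a = a, b = b)
}

x <- matrix(c(0, 0, 0, 1, 10, 0, 10, 1), ncol = 2, byrow = TRUE)
sil_ab(x, labels = c(1, 1, 2, 2))
# a = 1 for every point; b = (10 + sqrt(101)) / 2 for every point
```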

The silhouette value of a data point x_i \in C_l is defined as

s(i) = \begin{cases} \dfrac{b(i) - a(i)}{\max\{a(i), b(i)\}} & \text{if } |C_l| > 1, \\ 0 & \text{if } |C_l| = 1. \end{cases}
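Given a(i), b(i) and the size of the cluster containing each point, the silhouette value is an elementwise ratio, with singletons set to 0; each s(i) lies in [-1, 1]. The sketch below is a minimal illustration; sil_s is a hypothetical helper, not a UniversalCVI function.

```r
# Sketch of s(i) from precomputed a(i), b(i) and |C_l| (size of x_i's cluster).
# `sil_s` is a hypothetical helper, not part of UniversalCVI.
sil_s <- function(a, b, size) {
  s <- (b - a) / pmax(a, b)  # elementwise (b(i) - a(i)) / max{a(i), b(i)}
  s[size == 1] <- 0          # singleton clusters: s(i) = 0 by definition
  s
}

sil_s(a = c(1, 2, 0), b = c(3, 1, 5), size = c(2, 2, 1))
# returns c(2/3, -0.5, 0)
```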

The silhouette index is defined as

SH(k) = \dfrac{1}{n} \sum_{i = 1}^n s(i).

The k at which SH(k) attains its largest value is taken as the optimal number of clusters.
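The selection rule can be sketched by evaluating SH(k) for each k from kmin to kmax and taking the maximizer. The code below assumes Euclidean distance and base-R kmeans; it illustrates the formulas above and is not the UniversalCVI implementation. The names sh_index and sh_range are hypothetical, and set.seed is used because kmeans starts from random centers.

```r
# Sketch only: SH(k) for k = kmin..kmax using base-R kmeans and Euclidean
# distance. Not the UniversalCVI implementation; names are hypothetical.
sh_index <- function(x, labels) {
  labels <- as.integer(factor(labels))   # drop unused levels, if any
  d <- as.matrix(dist(x))                # pairwise ||x_i - y||
  s <- numeric(nrow(d))                  # s(i) stays 0 for singletons
  for (i in seq_len(nrow(d))) {
    own <- labels == labels[i]
    if (sum(own) == 1) next
    a <- sum(d[i, own]) / (sum(own) - 1)              # a(i)
    b <- min(tapply(d[i, !own], labels[!own], mean))  # b(i)
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)                                # SH(k) = (1/n) * sum_i s(i)
}

sh_range <- function(x, kmin = 2, kmax = 10, nstart = 100) {
  ks <- kmin:kmax
  sh <- sapply(ks, function(k) {
    sh_index(x, kmeans(x, centers = k, nstart = nstart)$cluster)
  })
  data.frame(k = ks, SH = sh)            # columns k and SH, one row per k
}

# Two well-separated synthetic blobs (illustrative data only)
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))
res <- sh_range(x, kmin = 2, kmax = 5, nstart = 10)
res[which.max(res$SH), ]                 # row with the largest SH(k)
```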

Value

SH

the SH index for each k from kmin to kmax, returned as a data frame whose first and second columns are k and the SH index, respectively.

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.

Kaufman, L. and Rousseeuw, P.J., 2009. Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

See Also

Hvalid, Wvalid, DI.IDX, FzzyCVIs, R1_data

Examples


library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]

# ---- Hierarchical ----

# Average linkage

# Compute the SH index
H.SH = SH.IDX(scale(x), kmax = 10, kmin = 2, method = "hclust_average", nstart = 1)
print(H.SH)

# The optimal number of clusters
H.SH[which.max(H.SH$SH),]

UniversalCVI documentation built on April 3, 2025, 7:50 p.m.