SH.IDX: Silhouette index
In UniversalCVI: Hard and Soft Cluster Validity Indices

Description

Computes the silhouette (SH) index (Rousseeuw, 1987; Kaufman and Rousseeuw, 2009) for the result of either k-means or hierarchical clustering, for every number of clusters from a user-specified kmin to kmax.

Usage

SH.IDX(x, kmax, kmin = 2, method = "kmeans", nstart = 100)

Arguments

`x`	a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.
`kmax`	the maximum number of clusters to be considered.
`kmin`	the minimum number of clusters to be considered. The default is `2`.
`method`	a character string indicating which clustering method is to be used (`"kmeans"`, `"hclust_complete"`, `"hclust_average"`, `"hclust_single"`). The default is `"kmeans"`.
`nstart`	the number of random initial sets for kmeans, used only when `method = "kmeans"`. The default is `100`.

Details

For i \in [n], l \in [k], and x_i \in C_l, let

a(i) = \dfrac{1}{|C_l|-1}\sum_{y \in C_l} \left\|x_i-y\right\| and

b(i) = \min_{r \neq l} \dfrac{1}{|C_r|} \sum_{y \in C_r} \left\|x_i-y\right\|.

The silhouette value of a data point x_i \in C_l is defined as

s(i) = \begin{cases} \dfrac{b(i) - a(i)}{\max\{a(i), b(i)\}} & \text{if } |C_l| > 1, \\ 0 & \text{if } |C_l| = 1. \end{cases}
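To make the definitions concrete, the following minimal base-R sketch evaluates a(i), b(i) and s(i) for single points of a toy one-dimensional data set. It assumes Euclidean distance for \left\|\cdot\right\|, and the helper `silhouette_point` is hypothetical, not a function exported by UniversalCVI. For x_1 = 0 in the cluster {0, 1}, a(1) = 1, b(1) = (5 + 6 + 7)/3 = 6, and therefore s(1) = 5/6.

```r
# Toy 1-D data: cluster 1 = {0, 1}, cluster 2 = {5, 6, 7}
x  <- c(0, 1, 5, 6, 7)
cl <- c(1, 1, 2, 2, 2)
D  <- as.matrix(dist(x))  # Euclidean distances between all pairs of points

# Hypothetical helper: silhouette value s(i) of point i
silhouette_point <- function(i, D, cl) {
  l    <- cl[i]
  same <- cl == l
  n_l  <- sum(same)
  if (n_l == 1) return(0)  # singleton cluster: s(i) = 0
  # a(i): distances to C_l (the self-distance is 0) divided by |C_l| - 1
  a <- sum(D[i, same]) / (n_l - 1)
  # b(i): smallest mean distance from x_i to any other cluster C_r, r != l
  others <- setdiff(unique(cl), l)
  b <- min(vapply(others, function(r) mean(D[i, cl == r]), numeric(1)))
  (b - a) / max(a, b)
}

silhouette_point(1, D, cl)  # a = 1, b = 6, so s = 5/6
```

Dividing by |C_l| - 1 while summing over all of C_l is equivalent to averaging over the other members of the cluster, because the point's distance to itself contributes zero.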

The silhouette index is defined as

SH(k) = \dfrac{1}{n} \sum_{i = 1}^n s(i).
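Since a(i) and b(i) are non-negative, each s(i) lies in [-1, 1], and so does their average SH(k). Assuming \max\{a(i), b(i)\} > 0 for a point in a non-singleton cluster, the two cases work out as follows:

```latex
s(i) = \begin{cases}
  1 - \dfrac{a(i)}{b(i)} \in [0, 1] & \text{if } a(i) \le b(i), \\
  \dfrac{b(i)}{a(i)} - 1 \in [-1, 0) & \text{if } a(i) > b(i),
\end{cases}
\qquad \Longrightarrow \qquad -1 \le SH(k) \le 1.
```

Values close to 1 mean that points lie much closer to their own cluster than to the nearest other cluster. Negative values mean that points would, on average, fit a neighbouring cluster better.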

The value of k with the largest SH(k) is taken as the optimal number of clusters.
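The whole procedure, computing SH(k) for each k and picking the k with the largest value, can be sketched in base R as shown below. This is an illustration under stated assumptions, not the package's code. `sh_value` and `sh_curve` are hypothetical helpers. Distances are assumed Euclidean. The hierarchical options use `hclust`'s own method names (`"average"`, `"complete"`, `"single"`), which are assumed to correspond to the package's `"hclust_*"` choices. The result uses the two-column layout described under Value. The `silhouette()` function in the cluster package computes the same per-point widths and could serve as a cross-check.

```r
# Mean silhouette SH(k) of one partition; D is a full distance matrix
sh_value <- function(D, cl) {
  labs  <- sort(unique(cl))
  own   <- match(cl, labs)                     # column of each point's cluster
  sizes <- tabulate(own, nbins = length(labs)) # |C_r| for every cluster r
  # S[i, r]: sum of distances from x_i to the points of cluster r
  S   <- sapply(labs, function(r) rowSums(D[, cl == r, drop = FALSE]))
  idx <- cbind(seq_along(cl), own)
  a   <- S[idx] / (sizes[own] - 1)             # NaN for singletons (replaced below)
  M   <- sweep(S, 2, sizes, "/")               # mean distance to each cluster
  M[idx] <- Inf                                # exclude the point's own cluster
  b   <- apply(M, 1, min)
  s   <- ifelse(sizes[own] == 1, 0, (b - a) / pmax(a, b))
  mean(s)
}

# SH(k) for k = kmin..kmax, returned as a data frame with columns k and SH
sh_curve <- function(x, kmin = 2, kmax = 6, method = "kmeans", nstart = 25) {
  D  <- as.matrix(dist(x))
  ks <- kmin:kmax
  if (method != "kmeans") tree <- hclust(dist(x), method = method)
  sh <- vapply(ks, function(k) {
    cl <- if (method == "kmeans") {
      kmeans(x, centers = k, nstart = nstart)$cluster
    } else {
      cutree(tree, k = k)
    }
    sh_value(D, cl)
  }, numeric(1))
  data.frame(k = ks, SH = sh)
}

# Two well-separated groups of three points each
x <- rbind(c(0, 0), c(0, 1), c(1, 0), c(10, 10), c(10, 11), c(11, 10))
res <- sh_curve(x, kmin = 2, kmax = 4, method = "average")
res$k[which.max(res$SH)]  # returns 2 for these two groups
```

Cutting a single `hclust` tree at each k avoids refitting the hierarchy for every number of clusters. k-means, in contrast, has to be rerun for each k.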

Value

`SH`	a data frame giving the SH index for each `k` from `kmin` to `kmax`; the first column is `k` and the second column is the SH index.

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.

Kaufman, L. and Rousseeuw, P.J., 2009. Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.

Examples


library(UniversalCVI)

# The data is from Wiroonsri (2024).
x = R1_data[,1:2]

# ---- Hierarchical ----

# Average linkage

# Compute the SH index (nstart is only used when method = "kmeans")
H.SH = SH.IDX(scale(x), kmax = 10, kmin = 2, method = "hclust_average", nstart = 1)
print(H.SH)

# The optimal number of clusters
H.SH[which.max(H.SH$SH),]

UniversalCVI documentation built on April 3, 2025, 7:50 p.m.