SH.IDX | R Documentation |
Computes the SH (Rousseeuw, 1987; Kaufman and Rousseeuw, 2009) index for the result of either kmeans or hierarchical clustering, for each number of clusters from a user-specified kmin to kmax.
SH.IDX(x, kmax, kmin = 2, method = "kmeans", nstart = 100)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
kmin |
a minimum number of clusters to be considered. The default is 2. |
method |
a character string indicating which clustering method to be used ( |
nstart |
a maximum number of initial random sets for kmeans for |
For i \in [n], l \in [k], and x_i \in C_l, let
a(i) = \dfrac{1}{|C_l|-1}\sum_{y \in C_l} \left\|x_i-y\right\| and
b(i) = \min_{r \neq l} \dfrac{1}{|C_r|} \sum_{y \in C_r} \left\|x_i-y\right\|.
The silhouette value of a data point x_j \in C_l is defined as:
s(j) =
\begin{cases}
\dfrac{b(j) - a(j)}{\max\{a(j),b(j)\}} &\text{ \ \ if \ } |C_l| > 1 \\
0 &\text{ \ \ if \ } |C_l| = 1
\end{cases}.
The silhouette index is defined as
SH(k) = \dfrac{1}{n} \sum_{i = 1}^n s(i).
The largest value of SH(k) indicates a valid optimal partition.
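The definitions above can be sketched directly in base R. This is a minimal illustration, not the package's implementation: the helper name `silhouette_index` and the toy data are hypothetical, Euclidean distance is assumed for the norm, and at least two clusters are assumed.

```r
# Hypothetical helper (not part of UniversalCVI): computes SH(k) for one
# given partition straight from the definitions of a(i), b(i) and s(i).
# Assumes Euclidean distance and at least two distinct cluster labels.
silhouette_index <- function(x, cluster) {
  d <- as.matrix(dist(x))            # pairwise Euclidean distances
  n <- nrow(d)
  labels <- unique(cluster)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) {             # singleton cluster: s(i) = 0
      s[i] <- 0
      next
    }
    # a(i): d[i, i] is 0, so dividing by |C_l| - 1 averages over the
    # other members of the point's own cluster
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b(i): smallest mean distance to the members of any other cluster
    others <- setdiff(labels, cluster[i])
    b <- min(sapply(others, function(r) mean(d[i, cluster == r])))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)                            # SH = average of s(i) over all points
}

# Toy data: two well-separated groups on a line
x <- matrix(c(0, 1, 10, 11), ncol = 1)
sh_good <- silhouette_index(x, c(1, 1, 2, 2))  # natural grouping
sh_bad  <- silhouette_index(x, c(1, 2, 1, 2))  # groups mixed together
```

On this toy data the natural grouping scores higher than the mixed one, which matches the rule that larger SH values indicate a better partition.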
SH |
the SH index for |
Nathakhun Wiroonsri and Onthada Preedasawakul
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65.
Kaufman, L. and Rousseeuw, P.J., 2009. Finding groups in data: an introduction to cluster analysis. John Wiley & Sons.
Hvalid, Wvalid, DI.IDX, FzzyCVIs, R1_data
library(UniversalCVI)
# The data is from Wiroonsri (2024).
x = R1_data[,1:2]
# ---- Hierarchical ----
# Average linkage
# Compute the SH index
H.SH = SH.IDX(scale(x), kmax = 10, kmin = 2, method = "hclust_average", nstart = 1)
print(H.SH)
# The optimal number of clusters
H.SH[which.max(H.SH$SH),]