si | R Documentation |
Computes the hard and fuzzy Silhouette Index (Rousseeuw, 1987; Campello & Hruschka, 2006) in order to validate the result of a cluster analysis.
si(x, u, v, m, t = NULL, eta, av = 1, tidx = "f")
x |
an object of class ‘ppclust’ containing the clustering results from a fuzzy clustering algorithm in the package ppclust. Alternatively, a numeric data frame or matrix containing the data set. |
u |
a numeric data frame or matrix containing the fuzzy membership values. It should be specified if |
v |
a numeric data frame or matrix containing the cluster prototypes. It should be specified if |
t |
a numeric data frame or matrix containing the typicality values. It should be specified if |
m |
a number specifying the fuzzy exponent. It should be specified if |
eta |
a number specifying the typicality exponent. It should be specified if |
av |
a number specifying the exponent α which is a user-defined value. The default is 1. |
tidx |
a character specifying the type of index. The default is ‘f’ for fuzzy index. The other options are ‘e’ for extended and ‘g’ for generalized index. |
The Silhouette Index (SI) values are the estimates of average silhouette widths. Silhouette width for each object is calculated as follows:
s_i = (b_i-a_i)/max(b_i, a_i)
a_i is the average distance between object i and the other objects in its own cluster. d(i, C_j) is the average distance from object i to the objects located in another cluster C_j, and b_i is the smallest of these distances over all other clusters.
Silhouette width values lie between -1 and 1. Well-clustered objects, which are close to the center of their cluster, have higher s_i values. In contrast, objects with smaller s_i lie between clusters. A negative s_i means that the object is assigned to the wrong cluster.
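The silhouette width defined above can be sketched in base R. This is a minimal illustration, not the implementation used by si: the helper name `silhouette_widths` is hypothetical, the iris species are used as stand-in hard labels, at least two clusters are assumed, and objects in singleton clusters are given s_i = 0 by convention.

```r
# Hypothetical helper: silhouette widths from a distance object and hard labels.
silhouette_widths <- function(d, labels) {
  d <- as.matrix(d)
  labels <- as.vector(labels)
  clusters <- unique(labels)
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- labels == labels[i]
    own[i] <- FALSE                       # exclude the object itself
    if (!any(own)) {                      # singleton cluster: s_i = 0 (assumed convention)
      s[i] <- 0
      next
    }
    a_i <- mean(d[i, own])                # average distance within own cluster
    others <- setdiff(clusters, labels[i])
    # b_i: smallest average distance to the objects of any other cluster
    b_i <- min(vapply(others, function(cl) mean(d[i, labels == cl]), numeric(1)))
    s[i] <- (b_i - a_i) / max(a_i, b_i)
  }
  s
}

x <- as.matrix(iris[, 1:4])
labels <- as.integer(iris$Species)        # stand-in labels; any hard partition works
s <- silhouette_widths(dist(x), labels)
range(s)                                  # all values fall within [-1, 1]
```

Objects with s_i near 1 sit well inside their cluster, while negative values flag objects that are on average closer to another cluster.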
The average of the silhouette widths of the objects in a cluster is called the average cluster silhouette width and is obtained as follows:
\bar{s}_j = \frac{1}{n_j} \sum_{i=1}^{n_j} s_i
After calculation of average silhouette widths of the clusters, the total average of these is calculated as follows and used as the Silhouette index.
I_{SI} = \frac{1}{k} \sum_{j=1}^{k} \bar{s}_j
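As a small worked example of the two averaging steps, the sketch below uses made-up silhouette widths and labels (both hypothetical) and computes the cluster averages and their unweighted mean with base R.

```r
# Hypothetical silhouette widths and hard cluster labels for six objects
s      <- c(0.8, 0.6, 0.7, 0.2, 0.4, -0.1)
labels <- c(1, 1, 1, 2, 2, 3)

s_bar <- tapply(s, labels, mean)   # average silhouette width of each cluster
I_SI  <- mean(s_bar)               # unweighted mean over the k clusters

s_bar
I_SI
mean(s)                            # mean over objects, shown for comparison
```

Because each cluster contributes equally to I_SI, the result can differ from the plain mean over all objects when cluster sizes are unequal, as in this toy data.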
The fuzzy version of the silhouette index is calculated as follows:
I_{SI} = \frac{\sum_{i=1}^{n} (u_{ij}-u_{lj})^{\alpha} \, s_i}{\sum_{i=1}^{n} (u_{ij}-u_{lj})^{\alpha}}
where s_i is the silhouette of object i, u_{ij} and u_{lj} are the first and second largest elements of the j-th column of the fuzzy membership matrix, and α ≥ 0 is a weighting exponent. As α approaches zero, the fuzzy index approaches the hard index (Campello & Hruschka, 2006). For the extended and generalized values of the index, the function si is a modified and combined version of the functions SIL and SIL.F of the package ‘fclust’ (Ferraro & Giordani, 2015).
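The weighting in the fuzzy index can be illustrated with a short base R sketch. It assumes a membership matrix with one row per object and one column per cluster, and takes the silhouette widths as given. The helper name `fuzzy_silhouette` and the toy values are hypothetical, and this is not the code used inside si.

```r
# Hypothetical helper: weighted average of silhouette widths, where each weight is
# (largest membership - second largest membership)^alpha for that object.
fuzzy_silhouette <- function(s, u, alpha = 1) {
  u <- as.matrix(u)                       # assumed layout: objects in rows, clusters in columns
  gap <- apply(u, 1, function(row) {
    top <- sort(row, decreasing = TRUE)[1:2]
    top[1] - top[2]
  })
  w <- gap^alpha
  sum(w * s) / sum(w)
}

s <- c(0.8, 0.6, -0.1, 0.5)               # made-up silhouette widths
u <- rbind(c(0.90, 0.10),
           c(0.70, 0.30),
           c(0.55, 0.45),
           c(0.20, 0.80))                 # made-up fuzzy memberships

fuzzy_silhouette(s, u, alpha = 1)
fuzzy_silhouette(s, u, alpha = 0)         # all weights equal 1, so this is mean(s)
```

Objects with nearly equal top memberships, such as the third one here, receive small weights, so ambiguous assignments influence the index less as α grows.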
si.obj |
silhouette widths of the objects |
sih |
hard SI value |
sif |
fuzzy SI value |
Zeynel Cebeci
Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. <doi:10.1016/0377-0427(87)90125-7>
Campello, R.J.G.B. & Hruschka, E.R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858-2875. <doi:10.1016/j.fss.2006.07.006>
Ferraro, M.B. & Giordani, P. (2015). A toolbox for fuzzy clustering using the R programming language. Fuzzy Sets and Systems, 279, 1-16. <doi:10.1016/j.fss.2015.05.001>
allindexes, apd, cl, cs, cwb, fhv, fs, kpbm, kwon, mcd, mpc, pbm, pc, pe, sc, tss, ws, xb
# Load the dataset iris
data(iris)
x <- iris[,1:4]

# Run FCM algorithm in the package ppclust
res.fcm <- ppclust::fcm(x, centers=3)

# Compute the SI using res.fcm, which is a ppclust object
idx <- si(res.fcm)
print(idx)

# Compute the SI using X, U and V matrices
idx <- si(res.fcm$x, res.fcm$u, res.fcm$v)
print(idx)