si: Silhouette Index

si R Documentation

Silhouette Index

Description

Computes the hard and fuzzy Silhouette Index (Rousseeuw, 1987; Campello & Hruschka, 2006) in order to validate the result of a cluster analysis.

Usage

si(x, u, v, m, t = NULL, eta, av = 1, tidx = "f")

Arguments

x

an object of class ‘ppclust’ containing the clustering results from a fuzzy clustering algorithm in the package ppclust. Alternatively, a numeric data frame or matrix containing the data set.

u

a numeric data frame or matrix containing the fuzzy membership values. It should be specified if x is not an object of ‘ppclust’.

v

a numeric data frame or matrix containing the cluster prototypes. It should be specified if x is not an object of ‘ppclust’.

t

a numeric data frame or matrix containing the typicality degrees. It should be specified if x is not an object of ‘ppclust’ and the option ‘e’ or ‘g’ is assigned to tidx.

m

a number specifying the fuzzy exponent. It should be specified if x is not an object of ‘ppclust’.

eta

a number specifying the typicality exponent. It should be specified if x is not an object of ‘ppclust’ and tidx is either e or g.

av

a number specifying the user-defined weighting exponent α used in the fuzzy index. The default is 1.

tidx

a character specifying the type of index. The default is ‘f’ for fuzzy index. The other options are ‘e’ for extended and ‘g’ for generalized index.

Details

The Silhouette Index (SI) values are the estimates of average silhouette widths. Silhouette width for each object is calculated as follows:

s_i = (b_i-a_i)/max(b_i, a_i)

a_i is the average distance between object i and the other objects in its own cluster. d(i, C_j) is the average distance from object i to the objects located in another cluster C_j, and b_i is the smallest of these distances over all other clusters.

Silhouette width values lie between -1 and 1. Well-clustered objects, which lie close to the center of their cluster, have high s_i values. In contrast, objects with small s_i lie between clusters. A negative s_i suggests that the object has been assigned to the wrong cluster.
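The definitions of a_i, b_i and s_i above can be sketched in base R as follows. This is a minimal illustration, not the code used by si: the helper name sil_widths is hypothetical, Euclidean distance is an assumption, and singleton clusters are given a width of 0 by convention.

```r
# Hypothetical helper (not part of fcvalid): silhouette widths from hard labels.
# Assumes Euclidean distances and at least two clusters.
sil_widths <- function(x, labels) {
  d <- as.matrix(dist(x))            # pairwise distances between objects
  clusters <- unique(labels)
  stopifnot(length(clusters) >= 2)
  n <- nrow(d)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    own[i] <- FALSE                  # exclude the object itself from a_i
    if (!any(own)) {                 # singleton cluster: width set to 0
      s[i] <- 0
      next
    }
    a_i <- mean(d[i, own])           # average distance within own cluster
    others <- setdiff(clusters, labels[i])
    # d(i, C_j) for every other cluster; b_i is the smallest of them
    b_i <- min(sapply(others, function(cl) mean(d[i, labels == cl])))
    s[i] <- (b_i - a_i) / max(a_i, b_i)
  }
  s
}

# Two well-separated groups on a line (made-up data)
x <- matrix(c(0, 1, 10, 11), ncol = 1)
labels <- c(1, 1, 2, 2)
s <- sil_widths(x, labels)
print(round(s, 3))
```

All four widths are close to 1 here because each object is far nearer to its own group than to the other one.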

The average of the silhouette widths of the objects in a cluster is called the average cluster silhouette width and is obtained as follows:

\bar{s_j} = \frac{1}{n_j} \sum\limits_{i=1}^{n_j} s_i

After the average silhouette widths of the clusters have been calculated, their overall average is computed as follows and used as the Silhouette Index:

I_{SI} = \frac{1}{k} \sum\limits_{j=1}^k \bar{s_j}
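The two averaging steps can be illustrated with a few lines of base R. The silhouette widths and labels below are made-up values, and the sketch follows the formulas as written here; it does not claim to reproduce how si computes its sih value internally.

```r
# Illustrative silhouette widths and hard cluster labels (made-up values)
s <- c(0.8, 0.6, 0.7, 0.2, 0.4)
labels <- c(1, 1, 1, 2, 2)

# Average silhouette width of each cluster
s_bar <- tapply(s, labels, mean)

# Index as the mean of the k cluster averages
i_si <- mean(s_bar)

# Plain mean over all objects, shown only for comparison
s_all <- mean(s)

print(s_bar)
print(c(i_si = i_si, s_all = s_all))
```

With clusters of unequal size the mean of the cluster averages differs from the plain mean over all objects, because every cluster counts equally regardless of how many objects it holds.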

The fuzzy version of the silhouette index is calculated as follows:

I_{SI} = \frac{\sum\limits_{i=1}^n (u_{ij}-u_{lj})^α \; s_i}{\sum\limits_{i=1}^n (u_{ij}-u_{lj})^α}

where s_i is the silhouette width of object i, u_{ij} and u_{lj} are the first and second largest elements of the j-th column of the fuzzy membership matrix, and α ≥ 0 is a weighting exponent. As α approaches zero, the fuzzy measure I_{SI} approaches its hard counterpart (Campello & Hruschka, 2006). For the extended and generalized values of the index, the function si is a modified and combined version of the SIL and SIL.F functions of the package ‘fclust’ (Ferraro & Giordani, 2015).
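The weighting in the fuzzy index can be sketched as follows. The helper fuzzy_si is hypothetical and is not the fcvalid implementation. It assumes a membership matrix with one row per object and one column per cluster, so the two largest memberships of each object are taken along its row; a matrix stored the other way round would need to be transposed first. The values of u and s are made up.

```r
# Hypothetical helper (not the fcvalid implementation): fuzzy silhouette index.
# u: membership matrix, rows = objects, columns = clusters (assumption)
# s: silhouette widths of the objects
# alpha: weighting exponent, alpha >= 0
fuzzy_si <- function(u, s, alpha = 1) {
  u <- as.matrix(u)
  stopifnot(nrow(u) == length(s), ncol(u) >= 2, alpha >= 0)
  # Gap between the largest and second largest membership of each object
  gap <- apply(u, 1, function(row) {
    top <- sort(row, decreasing = TRUE)[1:2]
    top[1] - top[2]
  })
  w <- gap^alpha                     # weight of each object
  sum(w * s) / sum(w)
}

# Made-up memberships for four objects and two clusters
u <- rbind(c(0.90, 0.10),
           c(0.80, 0.20),
           c(0.55, 0.45),
           c(0.10, 0.90))
s <- c(0.8, 0.7, 0.1, 0.9)

print(fuzzy_si(u, s, alpha = 1))
print(fuzzy_si(u, s, alpha = 0))     # all weights equal 1, so this is mean(s)
```

The third object sits almost halfway between the two clusters, so its low silhouette width receives little weight when alpha = 1 and the index is higher than the unweighted mean obtained with alpha = 0.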

Value

si.obj

silhouette widths of the objects

sih

hard SI value

sif

fuzzy SI value

Author(s)

Zeynel Cebeci

References

Rousseeuw, P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65. <doi:10.1016/0377-0427(87)90125-7>

Campello, R.J.G.B. & Hruschka, E.R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858-2875. <doi:10.1016/j.fss.2006.07.006>

Ferraro, M.B. & Giordani, P. (2015). A toolbox for fuzzy clustering using the R programming language. Fuzzy Sets and Systems, 279, 1-16. <doi:10.1016/j.fss.2015.05.001>

See Also

allindexes, apd, cl, cs, cwb, fhv, fs, kpbm, kwon, mcd, mpc, pbm, pc, pe, sc, tss, ws, xb

Examples

# Load the dataset iris
data(iris)
x <- iris[,1:4]

# Run FCM algorithm in the package ppclust 
res.fcm <- ppclust::fcm(x, centers=3)

# Compute the SI using res.fcm, which is a ppclust object
idx <- si(res.fcm)
print(idx)
 
# Compute the SI using X, U and V matrices
idx <- si(res.fcm$x, res.fcm$u, res.fcm$v)
print(idx)

zcebeci/fcvalid documentation built on Oct. 4, 2022, 9:01 p.m.