To cite this vignette and the package 'fcvalid', please use whichever of the following formats suits your reference list:

Cebeci, Z., Kavlak, A.T., & Yildiz, F. (2017). "Validation of fuzzy and possibilistic clustering results", in Proc. of 2017 IEEE International Artificial Intelligence and Data Processing Symposium (IDAP) pp. 1-7. doi: 10.1109/IDAP.2017.8090183.

or in BibTeX format:

@inproceedings{cebeci2017validation,
  title={Validation of fuzzy and possibilistic clustering results},
  author={Cebeci, Zeynel and Kavlak, Alper Tuna and Yildiz, Figen},
  booktitle={2017 International Artificial Intelligence and Data Processing Symposium (IDAP)},
  pages={1--7},
  year={2017},
  organization={IEEE},
  url={https://doi.org/10.1109/IDAP.2017.8090183},
 }

Data set and Packages

Install and load the package fcvalid

To install the package fcvalid from its GitHub repository, first install the devtools package from CRAN on your local system. You can then install fcvalid with the install_github function of devtools, as shown in the R code chunk below:

if(!require(devtools)) {install.packages('devtools'); library(devtools)}
install_github("zcebeci/fcvalid")

If you would like compiled versions of the package vignettes, install fcvalid using install_github with the build_vignettes argument set to TRUE, as shown below:

if(!require(devtools)) {install.packages('devtools'); library(devtools)}
devtools::install_github("zcebeci/fcvalid", build_vignettes=TRUE)

If rmarkdown and prettydoc are not already installed on your local system, install them before running the install commands above:

```r
install.packages(c('rmarkdown', 'prettydoc'))
```

If you have already installed 'fcvalid', you can load it into the R working environment with the following command:

```r
library(fcvalid)
```

Load the data set

We demonstrate clustering validation with the package 'fcvalid' on the Iris data set (Anderson, 1935), a well-known real data set consisting of four features (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width), with the natural classes of three Iris species in the last column. This four-dimensional data set contains 150 data objects in total, 50 in each class.

data(iris)
head(iris)
tail(iris)
x <- iris[,-5]
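
Before clustering, it can help to confirm that the feature matrix has the shape described above. The following is a minimal sketch using base R only; it assumes nothing beyond the built-in iris data.

```r
# Sanity checks on the built-in iris data before clustering
data(iris)
x <- iris[, -5]              # keep the four numeric features only

dim(x)                       # number of objects (rows) and features (columns)
table(iris$Species)          # class sizes in the dropped fifth column
sapply(x, is.numeric)        # every feature should be numeric
anyNA(x)                     # missing values would need handling first
```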

The following R command displays the scatter plots between the feature pairs.

pairs(x, col=iris[,5])

Partitioning Cluster Analysis

In this vignette we demonstrate the use of internal indexes with two partitioning clustering algorithms from the package 'ppclust' (Cebeci et al., 2017). We therefore first load this package into the R working environment.

library(ppclust)

Fuzzy C-Means Clustering

As a representative fuzzy clustering algorithm for demonstrating cluster validation, we use the basic Fuzzy C-Means (FCM) algorithm proposed by Bezdek (1974). We run the fcm function of the ppclust package for three clusters as follows:

cres <- fcm(x, centers=3, m=2)
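
To make the roles of centers and m more concrete, the textbook FCM iteration can be sketched in base R. This is an illustrative sketch only, not the ppclust implementation: the function name fcm_sketch, its arguments and the random initialization are assumptions made for this example.

```r
# Minimal FCM sketch in base R (illustrative; NOT the ppclust implementation)
fcm_sketch <- function(x, k, m = 2, iter.max = 100, tol = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Assumed initialization: k randomly chosen data points as prototypes
  v <- x[sample(n, k), , drop = FALSE]
  u <- matrix(0, n, k)
  for (it in seq_len(iter.max)) {
    # Squared Euclidean distances: objects in rows, prototypes in columns
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, v[j, ])^2))
    d2 <- pmax(d2, .Machine$double.eps)   # guard against division by zero
    # Membership update: u_ij proportional to d_ij^(-2/(m-1)); rows sum to 1
    w <- d2^(-1 / (m - 1))
    u_new <- w / rowSums(w)
    # Prototype update: weighted means of the data with weights u_ij^m
    um <- u_new^m
    v <- (t(um) %*% x) / colSums(um)
    converged <- max(abs(u_new - u)) < tol
    u <- u_new
    if (converged) break
  }
  list(u = u, v = v, iterations = it)
}

set.seed(1)                        # random start, so fix the seed
fit <- fcm_sketch(iris[, -5], k = 3)
round(head(fit$u), 3)              # fuzzy memberships of the first objects
round(fit$v, 2)                    # cluster prototypes
```

A larger m makes the memberships softer, while values of m close to 1 push them towards a crisp partition.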

In order to summarize and plot the clustering results, the functions 'summary' and 'plotcluster' of the package 'ppclust' can be used as follows:

summary(cres)
plotcluster(cres, trans=TRUE)

Validation of Clustering

To validate the clustering result obtained with FCM in the previous subsection, any of the cluster validity indexes of the package 'fcvalid' can be used. The following two commands demonstrate the use of the Partition Coefficient and Xie-Beni indexes.

pc(cres)
xb(cres)
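
For intuition about what these two indexes measure, their textbook definitions can be written in a few lines of base R. The sketch below is an assumption-laden illustration: pc_sketch and xb_sketch are hypothetical helper names, the toy data are made up, and the fcvalid functions may differ in details such as the exponent or scaling.

```r
# Partition Coefficient: mean of the squared memberships.
# Ranges from 1/k (maximally fuzzy) to 1 (crisp); higher is better.
pc_sketch <- function(u) sum(u^2) / nrow(u)

# Xie-Beni: fuzzy within-cluster compactness divided by n times the
# smallest squared distance between two prototypes; lower is better.
xb_sketch <- function(x, u, v, m = 2) {
  x <- as.matrix(x)
  k <- nrow(v)
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, v[j, ])^2))
  min_sep <- min(dist(v)^2)
  sum(u^m * d2) / (nrow(x) * min_sep)
}

# Hypothetical toy example: 4 objects, 2 well-separated clusters
x_toy <- matrix(c(0, 0,  0, 1,  5, 5,  5, 6), ncol = 2, byrow = TRUE)
u_toy <- matrix(c(0.9, 0.1,  0.9, 0.1,  0.1, 0.9,  0.1, 0.9),
                ncol = 2, byrow = TRUE)
v_toy <- matrix(c(0, 0.5,  5, 5.5), ncol = 2, byrow = TRUE)

pc_sketch(u_toy)                  # 0.82
xb_sketch(x_toy, u_toy, v_toy)    # 0.0141
```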

Although users may evaluate clustering quality with a few internal indexes individually, as shown above, the values of all the indexes can also be obtained together. For this purpose, the function allindexes of the package 'fcvalid' is used as follows:

allindexes(cres)

Possibilistic Fuzzy C-Means Clustering

In this subsection, Possibilistic Fuzzy C-Means (PFCM) (Pal et al, 2005) is run to demonstrate the extended and generalized versions of the internal indexes.

cres2 <- pfcm(x=x, centers=3, m=2, eta=2)

In order to summarize and plot the clustering results, the functions 'summary' and 'plotcluster' of the package 'ppclust' can be used as follows:

summary(cres2)
plotcluster(cres2, trans=TRUE)

The PFCM algorithm produces both fuzzy and possibilistic partitions of a data set. For this reason, the internal fuzzy validation indexes cannot be applied directly to the clustering results of PFCM, or of other algorithms that produce both fuzzy and possibilistic partitions. In such cases, the generalized and extended versions of the fuzzy validity indexes (Cebeci et al., 2017) can be used to validate the clustering results. In the following example, the generalized index values of a PFCM run are shown for the Fukuyama-Sugeno (FS) and Tang, Sun & Sun (TSS) indexes.

fs(cres2, tidx="g")
tss(cres2, tidx="g")

Alternatively, the extended index values can be computed to evaluate the fuzzy and possibilistic clustering results. For algorithms that produce both types of partitions, an extended index value is obtained by summing the fuzzy and possibilistic membership degrees. The example below demonstrates how to obtain the extended values for the XB and PBM indexes:

xb(cres2, tidx="e")
pbm(cres2, tidx="e")
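
The summation idea can be illustrated on toy matrices. The sketch below is only one possible reading of the description above: the names u_toy, t_toy and w, and all values, are hypothetical, and fcvalid may weight, scale or normalize the two partitions differently.

```r
# Hypothetical toy partitions for 4 objects and 2 clusters
u_toy <- matrix(c(0.9, 0.1,  0.8, 0.2,  0.2, 0.8,  0.1, 0.9),
                ncol = 2, byrow = TRUE)   # fuzzy memberships: rows sum to 1
t_toy <- matrix(c(0.7, 0.05,  0.6, 0.1,  0.1, 0.6,  0.05, 0.7),
                ncol = 2, byrow = TRUE)   # typicalities: rows need not sum to 1

# Assumed "extended" weights: element-wise sum of the two partitions
w <- u_toy + t_toy

rowSums(u_toy)   # all equal to 1 (probabilistic constraint)
rowSums(t_toy)   # not constrained to 1
w                # weights that would replace u in an index formula
```

Because the typicalities are not constrained to sum to 1 per object, the combined weights are not a probabilistic partition, which is why index formulas written for fuzzy memberships alone need adapting.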

In order to compute the generalized index values from all internal indexes, the function allindexes can be run as follows:

allindexes(cres2, tidx="g")

Finding Optimal Clustering

To decide on the best clustering result, or to find an optimal number of clusters for a data set, cluster analysis should be repeated over a range of cluster numbers. In the code chunk below, the FCM algorithm is run for five values of k (from 2 to 6). The lower (k1) and upper (k2) bounds can be changed to test different ranges of k.

k1 <- 2
k2 <- 6
indnames <- c("pc","mpc","pe","xb","kwon", "tss", "cl", "fs", 
              "pbm","si","sc","awcd", "fhv", "apd", "cs")
indvals <- matrix(ncol=length(indnames), nrow=k2-k1+1)
i <- 1
for(k in k1:k2){
  cres <- fcm(x=x, centers=k, nstart=3)
  indvals[i,1] <- pc(cres)
  indvals[i,2] <- mpc(cres)
  indvals[i,3] <- pe(cres)
  indvals[i,4] <- xb(cres)
  indvals[i,5] <- kwon(cres)
  indvals[i,6] <- tss(cres)
  indvals[i,7] <- cl(cres)
  indvals[i,8] <- fs(cres)
  indvals[i,9] <- pbm(cres)
  indvals[i,10] <- si(cres)$sif
  indvals[i,11] <- sc(cres)
  indvals[i,12] <- awcd(cres)
  indvals[i,13] <- fhv(cres)
  indvals[i,14] <- apd(cres)
  indvals[i,15] <- cs(cres)
  i <- i+1
}
colnames(indvals) <- indnames
rownames(indvals) <- paste0("k=",k1:k2) 
print(indvals)
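
Reading such a table requires knowing whether each index should be maximized or minimized. The sketch below shows the selection mechanics for three indexes whose conventions are well established (PC is maximized; PE and XB are minimized). The helper best_k and the toy values are hypothetical and are not results from the iris data; the direction of the remaining indexes should be checked in the fcvalid documentation.

```r
# Direction of the optimum for three indexes with well-known conventions
direction <- c(pc = "max", pe = "min", xb = "min")

# For each listed index, return the row name (k) holding the optimal value
best_k <- function(vals, direction) {
  sapply(names(direction), function(nm) {
    pick <- if (direction[[nm]] == "max") {
      which.max(vals[, nm])
    } else {
      which.min(vals[, nm])
    }
    rownames(vals)[pick]
  })
}

# Hypothetical values, used only to show the mechanics
toy <- matrix(c(0.89, 0.20, 0.09,
                0.78, 0.40, 0.05,
                0.68, 0.58, 0.20),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("k=", 2:4), c("pc", "pe", "xb")))

best_k(toy, direction)
```

The same helper could be applied to the pc, pe and xb columns of indvals. When indexes disagree, as XB does with PC and PE in the toy table, the choice of k remains a judgment call.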

The computed index values can be inspected visually using bar plots, line plots or other graphical tools.

# Bar plots: one panel per index
par(mfrow=c(5,3))
for(i in seq_along(indnames))
  barplot(indvals[,i], col="dodgerblue", main=indnames[i])
# Line plots: index value against the number of clusters
par(mfrow=c(5,3))
for(i in seq_along(indnames)){
  plot(0,0, type = "n",  
    cex.lab=0.8, cex.axis=0.8, cex.main=0.8, cex.sub=0.8, 
    xlim = c(1, nrow(indvals)), ylim = c(min(indvals[,i]),max(indvals[,i])), 
    xaxt='n', xlab="number of clusters", ylab="index value",
    main=indnames[i], sub=" ")
  axis(side=1, at=seq(1, nrow(indvals), by=1), 
    labels=paste0("k=",k1:k2), col.axis="black", las=1)
  lines(indvals[,i], type="b", col="blue", lty=1, lwd=1)
}

References

Anderson, E. (1935). The irises of the Gaspe Peninsula. Bull. Amer. Iris Soc., 59: 2-5.

Bezdek, J.C. (1974). Cluster validity with fuzzy sets. J Cybernetics, 3(3):58-72. https://doi.org/10.1080/01969727308546047

Pal, N. R., Pal, K. & Bezdek, J. C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 13 (4): 517-530. https://doi.org/10.1109/TFUZZ.2004.840099

Cebeci, Z., Kavlak, A. T., & Yildiz, F. (2017). Validation of fuzzy and possibilistic clustering results. In 2017 IEEE International Artificial Intelligence and Data Processing Symposium (IDAP). pp. 1-7. https://doi.org/10.1109/IDAP.2017.8090183

Cebeci, Z., Yildiz, F., Kavlak, A.T., Cebeci, C. & Onder, H. (2017). ppclust: Probabilistic and Possibilistic Cluster Analysis. R package version 0.1.0, URL https://CRAN.R-project.org/package=ppclust


