To cite this vignette and the package 'fcvalid', please use whichever of the following formats fits your reference list:
Cebeci, Z., Kavlak, A.T., & Yildiz, F. (2017). "Validation of fuzzy and possibilistic clustering results", in Proc. of 2017 IEEE International Artificial Intelligence and Data Processing Symposium (IDAP) pp. 1-7. doi: 10.1109/IDAP.2017.8090183.
or in BibTeX format:
@inproceedings{cebeci2017validation,
  title={Validation of fuzzy and possibilistic clustering results},
  author={Cebeci, Zeynel and Kavlak, Alper Tuna and Yildiz, Figen},
  booktitle={2017 International Artificial Intelligence and Data Processing Symposium (IDAP)},
  pages={1--7},
  year={2017},
  organization={IEEE},
  url={https://doi.org/10.1109/IDAP.2017.8090183}
}
To install the package fcvalid from the GitHub repository, first install the devtools package from CRAN on your local system. Then install fcvalid with the install_github function of the devtools package, as shown in the R code chunk below:
if(!require(devtools)) {install.packages('devtools'); library(devtools)}
install_github("zcebeci/fcvalid")
If you would like a compiled version of the package vignettes, install fcvalid using install_github with the build_vignettes argument set to TRUE, as shown below:
if(!require(devtools)) {install.packages('devtools'); library(devtools)}
devtools::install_github("zcebeci/fcvalid", build_vignettes=TRUE)
If you have not already installed rmarkdown and prettydoc on your local system, install these packages first, before running the install commands above, as follows:
```r
install.packages('prettydoc')
```
If you have already installed 'fcvalid', you can load it into the R working environment with the following command:
```r
library(fcvalid)
```
We demonstrate cluster validation with the package 'fcvalid' on the Iris data set (Anderson, 1935), a well-known real data set consisting of four features (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width), with the natural classes of three Iris species in the last column. This four-dimensional data set contains 150 data objects in total, 50 in each class.
data(iris)
head(iris)
tail(iris)
x <- iris[,-5]
The following R command displays the scatter plots between the feature pairs.
pairs(x, col=iris[,5])
In this vignette we demonstrate the use of internal indexes with two partitioning clustering algorithms from the package 'ppclust' (Cebeci et al, 2017). We therefore first load this package into the R working environment.
library(ppclust)
As a representative fuzzy clustering algorithm for demonstrating cluster validation, we use the basic Fuzzy C-Means (FCM) clustering algorithm proposed by Bezdek (1974). We run the fcm function of the ppclust package for three clusters as follows:
cres <- fcm(x, centers=3, m=2)
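To make clearer what an FCM run computes, the following base R sketch alternates the two standard FCM updates (memberships, then prototypes) until the memberships stop changing. It is only an illustration under stated assumptions: squared Euclidean distances, prototypes initialized from randomly chosen data points, and a hypothetical function name `fcm_sketch` with hypothetical arguments. It is not the implementation used by ppclust, whose `fcm` offers many more options.

```r
# Minimal Fuzzy C-Means sketch in base R (illustrative; not ppclust's code).
# fcm_sketch and its arguments are hypothetical names chosen for this example.
fcm_sketch <- function(x, k, m = 2, iter.max = 100, tol = 1e-6, seed = 1) {
  x <- as.matrix(x)
  n <- nrow(x)
  set.seed(seed)
  # Initialize the prototypes with k randomly chosen data points
  v <- x[sample(n, k), , drop = FALSE]
  u <- matrix(0, n, k)
  for (it in seq_len(iter.max)) {
    # Squared Euclidean distances: objects in rows, prototypes in columns
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, v[j, ])^2))
    d2 <- pmax(d2, .Machine$double.eps)  # guard against division by zero
    # Membership update: u_ij proportional to d_ij^(-2/(m-1)); rows sum to 1
    w <- d2^(-1 / (m - 1))
    u_new <- w / rowSums(w)
    # Prototype update: feature means weighted by u_ij^m
    um <- u_new^m
    v <- crossprod(um, x) / colSums(um)
    # Stop when the memberships no longer change noticeably
    if (max(abs(u_new - u)) < tol) { u <- u_new; break }
    u <- u_new
  }
  list(u = u, v = v, iterations = it)
}

res <- fcm_sketch(iris[, 1:4], k = 3)
round(head(res$u), 3)   # fuzzy memberships of the first six objects
```

The membership matrix `res$u` plays the same role as the fuzzy partition that validity indexes take as input; the fuzzifier `m = 2` matches the value used in the vignette's call.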
To summarize and plot the clustering results, the functions 'summary' and 'plotcluster' of the package 'ppclust' can be used as follows:
summary(cres)
plotcluster(cres, trans=TRUE)
To validate the clustering result obtained with FCM in the previous subsection, any of the cluster validity indexes of the package 'fcvalid' can be used. The following two commands demonstrate the use of the Partition Coefficient and Xie-Beni indexes.
pc(cres)
xb(cres)
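As a rough guide to what these two indexes measure, the sketch below computes the textbook Partition Coefficient and Xie-Beni index by hand in base R. The data, prototypes and memberships are a made-up toy example, and the variable names are hypothetical; the fcvalid functions may differ in details such as the handling of the fuzzifier, so this is an illustration of the definitions rather than of the package internals.

```r
# Toy inputs (made up for illustration): 4 objects, 2 features, 2 clusters
X <- matrix(c(0, 0,
              0, 1,
              5, 5,
              5, 6), ncol = 2, byrow = TRUE)
V <- matrix(c(0, 0.5,
              5, 5.5), ncol = 2, byrow = TRUE)   # cluster prototypes
U <- matrix(c(0.9, 0.1,
              0.9, 0.1,
              0.1, 0.9,
              0.1, 0.9), ncol = 2, byrow = TRUE) # fuzzy memberships

# Partition Coefficient: squared memberships summed over clusters,
# averaged over objects (closer to 1 means a crisper partition)
pc_by_hand <- sum(U^2) / nrow(U)

# Squared distances between every object (rows) and prototype (columns)
D2 <- sapply(seq_len(nrow(V)), function(j) rowSums(sweep(X, 2, V[j, ])^2))

# Xie-Beni (with m = 2): compactness divided by n times the smallest
# squared distance between two prototypes (smaller is better)
min_sep <- min(dist(V))^2
xb_by_hand <- sum(U^2 * D2) / (nrow(X) * min_sep)

pc_by_hand
xb_by_hand
```

For this toy partition the Partition Coefficient is 0.82 and the Xie-Beni value is 0.0141, reflecting two compact, well-separated clusters.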
Although users may evaluate clustering quality with a few internal indexes individually, as shown above, the values of all the indexes can also be obtained together. For this purpose, the function allindexes of the package 'fcvalid' is used as follows:
allindexes(cres)
In this subsection, Possibilistic Fuzzy C-Means (PFCM) (Pal et al, 2005) is run to demonstrate the extended and generalized versions of the internal indexes.
cres2 <- pfcm(x=x, centers=3, m=2, eta=2)
To summarize and plot the clustering results, the functions 'summary' and 'plotcluster' of the package 'ppclust' can be used as follows:
summary(cres2)
plotcluster(cres2, trans=TRUE)
The PFCM algorithm produces both fuzzy and possibilistic partitions of a data set. For this reason, the internal fuzzy validation indexes cannot be applied directly to the clustering results of PFCM or of other fuzzy and possibilistic clustering algorithms. In such cases, the generalized and extended versions of the fuzzy validity indexes (Cebeci et al, 2017) can be used to validate the possibilistic clustering results. In the following example, the generalized index values of a PFCM run are shown for the Fukuyama-Sugeno (FS) and Tang-Sun & Sun (TSS) indexes.
fs(cres2, tidx="g")
tss(cres2, tidx="g")
Alternatively, the extended index values can be computed to evaluate fuzzy and possibilistic clustering results. For algorithms that produce both types of partitions, an extended index value is obtained by summing the fuzzy and possibilistic membership degrees. The example below demonstrates how to obtain the extended values of the XB and PBM indexes:
xb(cres2, tidx="e")
pbm(cres2, tidx="e")
To compute the generalized values of all the internal indexes, the function allindexes can be run as follows:
allindexes(cres2, tidx="g")
To decide on the best clustering result or to find an optimal number of clusters in a data set, cluster analysis should be repeated for a range of numbers of clusters. In the code chunk below, the FCM algorithm is run for five values of k (from 2 to 6). The lower (k1) and upper (k2) limits can be changed to test different ranges of k.
k1 <- 2
k2 <- 6
indnames <- c("pc","mpc","pe","xb","kwon", "tss", "cl", "fs", "pbm","si","sc","awcd", "fhv", "apd", "cs")
indvals <- matrix(ncol=length(indnames), nrow=k2-k1+1)
i <- 1
for(k in k1:k2){
  cres <- fcm(x=x, centers=k, nstart=3)
  indvals[i,1] <- pc(cres)
  indvals[i,2] <- mpc(cres)
  indvals[i,3] <- pe(cres)
  indvals[i,4] <- xb(cres)
  indvals[i,5] <- kwon(cres)
  indvals[i,6] <- tss(cres)
  indvals[i,7] <- cl(cres)
  indvals[i,8] <- fs(cres)
  indvals[i,9] <- pbm(cres)
  indvals[i,10] <- si(cres)$sif
  indvals[i,11] <- sc(cres)
  indvals[i,12] <- awcd(cres)
  indvals[i,13] <- fhv(cres)
  indvals[i,14] <- apd(cres)
  indvals[i,15] <- cs(cres)
  i <- i+1
}
colnames(indvals) <- indnames
rownames(indvals) <- paste0("k=",k1:k2)
print(indvals)
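Once such a table of index values exists, the preferred number of clusters can be read off per index by taking the maximum or the minimum, depending on the index. The sketch below shows the mechanics for three indexes whose direction is well established (Partition Coefficient is maximized; Partition Entropy and Xie-Beni are minimized). The numbers in `vals` are made up purely for illustration and are not results from the Iris data, and the names `vals`, `maximize` and `best_k` are hypothetical.

```r
# Made-up index values for k = 2..6, used only to show the mechanics
vals <- cbind(pc = c(0.89, 0.78, 0.71, 0.67, 0.62),
              pe = c(0.20, 0.40, 0.56, 0.66, 0.78),
              xb = c(0.07, 0.14, 0.22, 0.25, 0.31))
rownames(vals) <- paste0("k=", 2:6)

# Direction in which each index is optimized
maximize <- c(pc = TRUE, pe = FALSE, xb = FALSE)

# For every index, report the row (number of clusters) with the best value
best_k <- sapply(colnames(vals), function(nm) {
  pick <- if (maximize[[nm]]) which.max(vals[, nm]) else which.min(vals[, nm])
  rownames(vals)[pick]
})
best_k
```

The same pattern extends to the other indexes once their optimization direction has been checked in the package documentation; indexes frequently disagree, so the per-index choices are better treated as votes than as a single answer.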
The computed index values can be inspected visually by using bar plots, line plots or other kinds of graphical tools.
par(mfrow=c(5,3))
for(i in 1:length(indnames))
  barplot(indvals[,i], col="dodgerblue", main=indnames[i])
par(mfrow=c(5,3))
for(i in 1:length(indnames)){
  plot(0,0, type = "n", cex.lab=0.8, cex.axis=0.8, cex.main=0.8, cex.sub=0.8,
    xlim = c(1, nrow(indvals)), ylim = c(min(indvals[,i]),max(indvals[,i])),
    xaxt='n', xlab="number of clusters", ylab="index value",
    main=indnames[i], sub=" ")
  axis(side=1, at=seq(1, nrow(indvals), by=1), labels=paste0("k=",k1:k2), col.axis="black", las=1)
  lines(indvals[,i], type="b", col="blue", lty=1, lwd=1)
}
Anderson, E. (1935). The irises of the Gaspe Peninsula. Bull. Amer. Iris Soc., 59: 2-5.
Bezdek, J.C. (1974). Cluster validity with fuzzy sets. J Cybernetics, 3(3):58-72. https://doi.org/10.1080/01969727308546047
Pal, N. R., Pal, K. & Bezdek, J. C. (2005). A possibilistic fuzzy c-means clustering algorithm. IEEE Trans. Fuzzy Systems, 13 (4): 517-530. https://doi.org/10.1109/TFUZZ.2004.840099
Cebeci, Z., Kavlak, A. T., & Yildiz, F. (2017). Validation of fuzzy and possibilistic clustering results. In 2017 IEEE International Artificial Intelligence and Data Processing Symposium (IDAP). pp. 1-7. https://doi.org/10.1109/IDAP.2017.8090183
Cebeci, Z., Yildiz, F., Kavlak, A.T., Cebeci, C. & Onder, H. (2017). ppclust: Probabilistic and Possibilistic Cluster Analysis. R package version 0.1.0, URL https://CRAN.R-project.org/package=ppclust