FKM.gkb.ent.noise    R Documentation

Gustafson, Kessel and Babuska - like fuzzy k-means with entropy regularization and noise cluster

Description

Performs the Gustafson, Kessel and Babuska - like fuzzy k-means clustering algorithm with entropy regularization and noise cluster.
Unlike fuzzy k-means, it is able to discover non-spherical clusters.
The Babuska et al. variant improves the computation of the fuzzy covariance matrices in the standard Gustafson and Kessel clustering algorithm.
The entropy regularization avoids the artificial fuzziness parameter m, which is replaced by the degree of fuzzy entropy ent, related to the concept of temperature in statistical physics. An interesting property of fuzzy k-means with entropy regularization is that the prototypes are obtained as weighted means with weights equal to the membership degrees (rather than to the membership degrees raised to the power m, as in standard fuzzy k-means).
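The difference between the two weighting schemes can be illustrated with a minimal base R sketch. The toy matrices X and U and the names H_ent and H_fkm are hypothetical and chosen only for illustration; this is not the package's internal code.

```r
# Toy data (hypothetical): 4 objects, 2 variables, 2 clusters
X <- matrix(c(0, 0,
              1, 0,
              4, 4,
              5, 4), ncol = 2, byrow = TRUE)
# Toy membership degrees (rows: objects, columns: clusters)
U <- matrix(c(0.9, 0.1,
              0.8, 0.2,
              0.1, 0.9,
              0.2, 0.8), ncol = 2, byrow = TRUE)

# Entropy regularization: the weights are the membership degrees themselves
H_ent <- crossprod(U, X) / colSums(U)

# Standard fuzzy k-means: the weights are the membership degrees
# raised to the power m (m = 2 is assumed here for illustration)
m <- 2
H_fkm <- crossprod(U^m, X) / colSums(U^m)

print(H_ent)
print(H_fkm)
```

Each row of H_ent and H_fkm is one prototype. Raising the memberships to the power m pulls each prototype closer to the objects with the highest membership in that cluster.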
The noise cluster is an additional cluster (with respect to the k standard clusters) such that objects recognized to be outliers are assigned to it with high membership degrees.
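How a noise cluster absorbs outliers can be sketched in a few lines of base R. The sketch assumes the commonly used entropy-regularized membership update, in which memberships are proportional to exp(-d^2/ent) and the noise cluster sits at a fixed distance delta from every object. The helper noise_memberships is hypothetical, and plain squared distances stand in for the covariance-based distances used by the actual algorithm.

```r
# Hypothetical helper (not part of fclust): membership degrees of one object
# in k regular clusters plus the noise cluster.
# d2    : squared distances of the object from the k prototypes
# delta : noise distance (same for all objects)
# ent   : degree of fuzzy entropy
noise_memberships <- function(d2, delta, ent) {
  w <- exp(-c(d2, delta^2) / ent)   # unnormalized weights, noise term last
  u <- w / sum(w)                   # memberships sum to 1 over k + 1 clusters
  names(u) <- c(paste0("clus", seq_along(d2)), "noise")
  u
}

# An object close to the first prototype
u_close <- noise_memberships(d2 = c(0.2, 3.0), delta = 1, ent = 0.2)

# An object far from both prototypes (an outlier)
u_far <- noise_memberships(d2 = c(9.0, 12.0), delta = 1, ent = 0.2)

print(round(u_close, 3))
print(round(u_far, 3))
```

Under these assumptions the outlier is assigned almost entirely to the noise cluster, because delta is much smaller than its distance from any prototype.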

Usage

FKM.gkb.ent.noise(X, k, ent, vp, delta, gam, mcn, RS, stand, startU, index, alpha, conv, maxit, seed)

Arguments

X

Matrix or data.frame

k

An integer value or vector specifying the number of clusters for which the index is to be calculated (default: 2:6)

ent

Degree of fuzzy entropy (default: 1)

vp

Volume parameter (default: rep(1,max(k))). If k is a vector, for each value of k the first k elements of vp are considered.

delta

Noise distance (default: average Euclidean distance between objects and prototypes from FKM.gk.ent using the same values of k and ent)

gam

Weighting parameter for the fuzzy covariance matrices (default: 0)

mcn

Maximum condition number for the fuzzy covariance matrices (default: 1e+15)

RS

Number of (random) starts (default: 1)

stand

Standardization: if stand=1, the clustering algorithm is run using standardized data (default: no standardization)

startU

Rational start for the membership degree matrix U (default: no rational start)

index

Cluster validity index to select the number of clusters: "PC" (partition coefficient), "PE" (partition entropy), "MPC" (modified partition coefficient), "SIL" (silhouette), "SIL.F" (fuzzy silhouette), "XB" (Xie and Beni) (default: "SIL.F")

alpha

Weighting coefficient for the fuzzy silhouette index SIL.F (default: 1)

conv

Convergence criterion (default: 1e-9)

maxit

Maximum number of iterations (default: 1e+2)

seed

Seed value for random number generation (default: NULL)
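
The role of gam and mcn can be illustrated with a sketch of the two covariance adjustments proposed by Babuska et al. (2002): shrinking each fuzzy covariance matrix towards a scaled identity matrix, and capping its condition number by raising the smallest eigenvalues. The helper regularize_cov and its argument F0 (the covariance matrix of the whole data set) are hypothetical names, and whether fclust implements the adjustments exactly in this form is an assumption.

```r
# Hypothetical helper (not part of fclust): adjust one fuzzy covariance matrix.
# Fg  : fuzzy covariance matrix of a cluster
# F0  : covariance matrix of the whole data set (assumed reference matrix)
# gam : weighting parameter in [0, 1]
# mcn : maximum condition number allowed
regularize_cov <- function(Fg, F0, gam = 0, mcn = 1e15) {
  n <- nrow(Fg)
  # Step 1: weighted combination with a scaled identity matrix
  Fg <- (1 - gam) * Fg + gam * det(F0)^(1 / n) * diag(n)
  # Step 2: raise eigenvalues that would exceed the condition number limit
  e <- eigen(Fg, symmetric = TRUE)
  lam <- e$values
  lam_min <- max(lam) / mcn
  lam[lam < lam_min] <- lam_min
  # Rebuild the matrix from the adjusted eigenvalues
  e$vectors %*% diag(lam, n) %*% t(e$vectors)
}

# Nearly singular toy matrix: condition number 1e20 before the adjustment
Fg <- matrix(c(1, 0, 0, 1e-20), nrow = 2)
Fg_adj <- regularize_cov(Fg, F0 = diag(2), gam = 0, mcn = 100)
ev <- eigen(Fg_adj, symmetric = TRUE)$values
print(max(ev) / min(ev))
```

With gam = 0 only the eigenvalue cap is active. Values of gam closer to 1 make all cluster covariance matrices more similar to the same scaled identity, hence more spherical clusters.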

Details

If startU is given, the argument k is ignored (the number of clusters is ncol(startU)).
If startU is given, the first elements of value, cput and iter refer to the rational start.
If a cluster covariance matrix becomes singular, the algorithm stops and the corresponding element of value is NaN.
The default value for ent is in general not reasonable if FKM.gkb.ent.noise is run using raw data.
The update of the membership degrees requires the computation of exponential functions. In some cases, this may produce NaN values, in which case the algorithm stops. Such a problem is usually solved by running FKM.gkb.ent.noise using standardized data (stand=1).
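
Why the exponential functions can produce NaN on raw data can be seen from a simplified sketch. It assumes memberships proportional to exp(-d^2/ent) and omits the noise term and the covariance-based distance, so update_u is a hypothetical stand-in rather than the package's update step.

```r
# Hypothetical, simplified membership update for one object
update_u <- function(d2, ent) {
  w <- exp(-d2 / ent)   # underflows to 0 when d2 / ent is very large
  w / sum(w)            # 0 / 0 gives NaN if every weight underflows
}

# Squared distances on a raw scale (large values): every weight underflows
u_raw <- update_u(c(900, 1200, 1500), ent = 1)

# Squared distances after standardization (values of order 1)
u_std <- update_u(c(0.9, 1.2, 1.5), ent = 1)

print(u_raw)
print(u_std)
```

Standardizing the data (or choosing a larger ent) keeps d^2/ent in a range where the exponentials remain representable.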

Value

Object of class fclust, which is a list with the following components:

U

Membership degree matrix

H

Prototype matrix

F

Array containing the covariance matrices of all the clusters

clus

Matrix containing the indexes of the clusters where the objects are assigned (column 1) and the associated membership degrees (column 2)

medoid

Vector containing the indexes of the medoid objects (NULL for FKM.gkb.ent.noise)

value

Vector containing the loss function values for the RS starts

criterion

Vector containing the values of the cluster validity index

iter

Vector containing the numbers of iterations for the RS starts

k

Number of clusters

m

Parameter of fuzziness (NULL for FKM.gkb.ent.noise)

ent

Degree of fuzzy entropy

b

Parameter of the polynomial fuzzifier (NULL for FKM.gkb.ent.noise)

vp

Volume parameter

delta

Noise distance

gam

Weighting parameter for the fuzzy covariance matrices

mcn

Maximum condition number for the fuzzy covariance matrices

stand

Standardization (Yes if stand=1, No if stand=0)

Xca

Data used in the clustering algorithm (standardized data if stand=1)

X

Raw data

D

Dissimilarity matrix (NULL for FKM.gkb.ent.noise)

call

Matched call
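
The relation between U and clus can be sketched as follows. The sketch rests on two assumptions that are not stated in this page: that each object is assigned to the cluster with its largest membership degree, and that U holds only the k regular clusters, so the remainder of each row is the membership in the noise cluster. The toy matrix U is hypothetical.

```r
# Toy membership matrix (hypothetical): 3 objects, 2 regular clusters
U <- matrix(c(0.70, 0.20,
              0.10, 0.85,
              0.05, 0.05), ncol = 2, byrow = TRUE)

# Assumed construction of clus: index of the largest membership (column 1)
# and the corresponding membership degree (column 2)
clus <- cbind(cluster = apply(U, 1, which.max),
              membership = apply(U, 1, max))

# Assumed noise membership: what is left after the regular clusters
noise <- 1 - rowSums(U)

print(clus)
print(noise)
```

Under these assumptions the third object has low membership in both regular clusters and a high noise membership, which is the pattern expected for an outlier.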

Author(s)

Paolo Giordani, Maria Brigida Ferraro, Alessio Serafini

References

Babuska R., van der Veen P.J., Kaymak U., 2002. Improved covariance estimation for Gustafson-Kessel clustering. Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 1081-1085.
Davé R.N., 1991. Characterization and detection of noise in clustering. Pattern Recognition Letters, 12, 657-664.
Ferraro M.B., Giordani P., 2013. A new fuzzy clustering algorithm with entropy regularization. Proceedings of the meeting on Classification and Data Analysis (CLADAG).

See Also

FKM.gk.ent.noise, Fclust, Fclust.index, print.fclust, summary.fclust, plot.fclust, unemployment

Examples

## Not run: 
## load the package and the unemployment data
library(fclust)
data(unemployment)
## Gustafson, Kessel and Babuska-like fuzzy k-means with entropy regularization
## and noise cluster, fixing the number of clusters
clust <- FKM.gkb.ent.noise(unemployment, k = 3, ent = 0.2, delta = 1, RS = 10, stand = 1)
## Gustafson, Kessel and Babuska-like fuzzy k-means with entropy regularization
## and noise cluster, selecting the number of clusters
clust <- FKM.gkb.ent.noise(unemployment, k = 2:6, ent = 0.2, delta = 1, RS = 10, stand = 1)

## End(Not run)

fclust documentation built on Nov. 16, 2022, 5:10 p.m.