ekm: K-Means Clustering Using Different Seeding Techniques


Description

The function ekm partitions a numeric data set by using the K-means clustering algorithm. It is a wrapper around the standard kmeans function that adds more initialization (seeding) techniques and returns the output values obtained in each of the multiple starts of the algorithm.
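The multiple-start behaviour described above can be sketched with base R alone. The helper `multi_start_km` below is hypothetical and is not part of ppclust. It calls `stats::kmeans` once per start and records the per-start iteration counts, objective values (total within-cluster sum of squares) and timings, then reports the best start. It is a minimal sketch under these assumptions and does not reproduce ekm's alternative seeding methods.

```r
# Hypothetical helper (not the ppclust implementation): run stats::kmeans
# once per start and keep the per-start results that ekm is described as returning.
multi_start_km <- function(x, k, nstart = 5, iter.max = 1000, seed = 123) {
  set.seed(seed)                          # reproducible random starts
  x <- as.matrix(x)
  runs <- lapply(seq_len(nstart), function(s) {
    # One start of K-means with randomly chosen initial centers, timed
    tm <- system.time(fit <- kmeans(x, centers = k, iter.max = iter.max))
    list(fit = fit, time = tm[["elapsed"]])
  })
  # Objective value of each start: total within-cluster sum of squares
  func.val <- vapply(runs, function(r) r$fit$tot.withinss, numeric(1))
  best <- which.min(func.val)             # start with the smallest objective value
  list(
    best.start = best,
    func.val   = func.val,
    iter       = vapply(runs, function(r) r$fit$iter, numeric(1)),
    comp.time  = vapply(runs, function(r) r$time, numeric(1)),
    cluster    = runs[[best]]$fit$cluster,
    v          = runs[[best]]$fit$centers
  )
}

res <- multi_start_km(iris[, -5], k = 3, nstart = 4)
res$best.start
res$func.val
```

Because each start draws its own random initial centers, the objective values in func.val can differ between starts, which is why keeping the best start is useful.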

Usage

ekm(x, centers, dmetric="euclidean", alginitv="hartiganwong",  
    nstart=1, iter.max=1000, stand=FALSE, numseed)

Arguments

x

a numeric vector, data frame or matrix.

centers

an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers.

dmetric

a string for the distance calculation method. The default is euclidean for the Euclidean distance.

alginitv

a string specifying the initialization (seeding) method for the cluster prototypes matrix. The default is hartiganwong for the Hartigan-Wong seeding method. See get.algorithms for the alternative options.

nstart

an integer for the number of starts of the clustering algorithm. The default is 1.

iter.max

an integer for the maximum number of iterations allowed. The default is 1000.

stand

a logical flag to standardize data. Its default value is FALSE. If its value is TRUE, the data matrix x is standardized.

numseed

an integer used to set the seed of R's random number generator.
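Passing a matrix to centers fixes the initial prototypes, and fixing the random seed (the role numseed plays) makes a random seeding reproducible. The base-R sketch below illustrates both. `forgy_centers` is a hypothetical helper that picks k distinct objects at random as initial prototypes (Forgy-style random sampling). This is only one of many seeding techniques and is not necessarily what any alginitv option does. The z-score standardization via `scale()` is an assumption standing in for stand = TRUE.

```r
# Hypothetical helper (not part of ppclust): pick k distinct objects of x
# at random as initial prototypes, i.e. Forgy-style random sampling.
forgy_centers <- function(x, k, seed) {
  set.seed(seed)                         # fixed seed, the role numseed plays in ekm
  ux <- unique(as.matrix(x))             # avoid duplicate rows as initial centers
  ux[sample.int(nrow(ux), k), , drop = FALSE]
}

# Assumption: z-score standardization with scale(), analogous to stand = TRUE
x  <- scale(iris[, -5])
v0 <- forgy_centers(x, k = 3, seed = 42)

# Given a centers matrix, kmeans starts from exactly these prototypes,
# so repeating the seeding with the same seed reproduces the partition.
fit1 <- kmeans(x, centers = v0, iter.max = 1000)
fit2 <- kmeans(x, centers = forgy_centers(x, k = 3, seed = 42), iter.max = 1000)
identical(fit1$cluster, fit2$cluster)
```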

Details

The K-means (KM) clustering algorithm partitions a data set of n objects into k clusters, where k is pre-defined. It is an iterative relocation algorithm: in each iteration, every object is assigned to the nearest cluster by using the Euclidean distance, and the cluster prototypes are then recomputed. The objective of KM is to minimize the total intra-cluster variance (the sum of squared errors):

J_{KM}(\mathbf{X}; \mathbf{V})=\sum\limits_{i=1}^n d^2(\vec{x}_i, \vec{v}_j)

In the above equation for J_{KM}:

d^2(\vec{x}_i, \vec{v}_j) is the squared distance between the object \vec{x}_i and the prototype \vec{v}_j of the cluster to which \vec{x}_i is assigned. The Euclidean distance metric is usually employed in implementations of K-means.

See kmeans and ppclust-package for more details about the terms of the objective function J_{KM}.
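The objective J_{KM} can be evaluated directly from a partition. The base-R sketch below uses a hypothetical helper, `km_objective`, that sums the squared Euclidean distance from each object to the prototype of its assigned cluster. It then compares the result with the tot.withinss value reported by `stats::kmeans`, which should agree up to floating-point tolerance.

```r
# Hypothetical helper: J_KM = sum over objects of the squared Euclidean
# distance to the prototype of the cluster each object is assigned to.
km_objective <- function(x, cluster, v) {
  x <- as.matrix(x)
  diff <- x - v[cluster, , drop = FALSE]   # row i minus the prototype of its cluster
  sum(diff^2)
}

set.seed(1)
x   <- as.matrix(iris[, -5])
fit <- kmeans(x, centers = 3)
J   <- km_objective(x, fit$cluster, fit$centers)
all.equal(J, fit$tot.withinss)
```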

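The cluster prototypes are updated as the means of the objects assigned to them:

\vec{v}_{j} = \frac{1}{n_j} \sum\limits_{i=1}^{n_j} \vec{x}_{ij} \;;\; 1 \leq j \leq k

where n_j is the number of objects in cluster j and \vec{x}_{ij} is the i-th object assigned to cluster j.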
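Alternating the nearest-prototype assignment with the prototype update gives the basic batch (Lloyd-style) K-means iteration. The sketch below is a plain base-R illustration of that loop; `lloyd_km` is a hypothetical name. It is not the Hartigan-Wong algorithm that `stats::kmeans` uses by default, and it simply stops if a cluster becomes empty instead of repairing it.

```r
# Hypothetical sketch of batch (Lloyd-style) K-means in base R.
lloyd_km <- function(x, v, iter.max = 100) {
  x <- as.matrix(x)
  v <- as.matrix(v)
  k <- nrow(v)
  cluster <- integer(nrow(x))               # 0 = not yet assigned
  for (it in seq_len(iter.max)) {
    # Assignment step: n x k matrix of squared Euclidean distances
    d2 <- sapply(seq_len(k), function(j) colSums((t(x) - v[j, ])^2))
    new_cluster <- max.col(-d2, ties.method = "first")   # nearest prototype
    if (all(new_cluster == cluster)) break   # converged: no object moved
    cluster <- new_cluster
    if (length(unique(cluster)) < k) stop("a cluster became empty")
    # Update step: v_j = (1 / n_j) * sum of the objects in cluster j
    v <- rowsum(x, cluster) / as.vector(table(cluster))
  }
  list(cluster = cluster, v = v, iter = it)
}

x   <- as.matrix(iris[, -5])
v0  <- x[c(1, 51, 101), ]        # one object per species as initial prototypes
res <- lloyd_km(x, v0)
table(res$cluster, iris$Species)
```

At convergence each row of res$v is the mean of the objects assigned to that cluster, which is exactly the update equation above.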

Value

an object of class ‘ppclust’, which is a list consisting of the following items:

x

a numeric matrix containing the processed data set.

v

a numeric matrix containing the final cluster prototypes (centers of clusters).

u

a numeric matrix containing the crisp membership degrees of the data objects.

d

a numeric matrix containing the distances of objects to the final cluster prototypes.

k

an integer for the number of clusters.

cluster

a numeric vector containing the cluster labels found by defuzzifying the membership degrees of the objects.

csize

a numeric vector containing the number of objects in the clusters.

iter

an integer vector for the number of iterations in each start of the algorithm.

best.start

an integer for the index of the start that gives the minimum objective function value.

func.val

a numeric vector for the objective function values in each start of the algorithm.

comp.time

a numeric vector for the execution time in each start of the algorithm.

stand

a logical value; TRUE indicates that the data set x contains the standardized values of the raw data.

wss

a numeric vector for the within-cluster sum of squares of each cluster.

bwss

a number for the between-cluster sum of squares.

tss

a number for the total sum of squares.

twss

a number for the total within-cluster sum of squares.

algorithm

a string for the name of the partitioning algorithm. It is ‘KM’ for this function.

call

a string for the matched function call generating this ‘ppclust’ object.
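For a crisp partition, the total sum of squares equals the total within-cluster sum of squares plus the between-cluster sum of squares, and each row of the crisp membership matrix u contains a single 1, so the cluster label of an object is the column holding that 1. The base-R sketch below checks both properties on a `stats::kmeans` fit. It assumes, without confirmation from this page, that the wss, bwss, tss and twss items of a ‘ppclust’ object are the usual K-means quantities computed here.

```r
# Base-R check on a stats::kmeans fit (assumed analogous to ekm's output).
set.seed(7)
x   <- as.matrix(iris[, -5])
fit <- kmeans(x, centers = 3)

# Total sum of squares: squared distances of all objects to the grand mean
tss <- sum(scale(x, scale = FALSE)^2)
# Between-cluster sum of squares: size-weighted squared distances of the
# prototypes to the grand mean
grand <- colMeans(x)
bwss  <- sum(fit$size * rowSums(sweep(fit$centers, 2, grand)^2))
# Decomposition: total SS = total within-cluster SS + between-cluster SS
all.equal(tss, fit$tot.withinss + bwss)

# Crisp membership matrix: u[i, j] = 1 if object i is in cluster j, else 0
u <- outer(fit$cluster, seq_len(3), `==`) * 1
all(rowSums(u) == 1)                 # every object is in exactly one cluster
labels <- max.col(u, ties.method = "first")
identical(as.vector(labels), as.vector(fit$cluster))
```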

Author(s)

Zeynel Cebeci, Figen Yildiz & Hasan Onder

References

MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, University of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>

See Also

kmeans, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc

Examples

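# Load the ppclust package and the iris data set
library(ppclust)
data(iris)
x <- iris[,-5]   # drop the Species column, keeping the four numeric features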

# Run EKM for 3 clusters
res.ekm <- ekm(x, centers=3)

# Print and plot the clustering result
print(res.ekm$cluster)
plot(x, col=res.ekm$cluster, pch=16)

ppclust documentation built on May 29, 2024, 7:20 a.m.