ekm: K-Means Clustering Using Different Seeding Techniques
In ppclust: Probabilistic and Possibilistic Cluster Analysis

Description

The function ekm partitions a numeric data set by using the K-means clustering algorithm. It is a wrapper for the standard kmeans function that adds more initialization (seeding) techniques and returns the output values obtained in each of the multiple starts of the algorithm.
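The multi-start behaviour described above can be sketched with base R alone. This is a minimal illustration under our own assumptions, not the package's implementation: the hypothetical helper `multi_start_kmeans` calls `stats::kmeans` once per start with `nstart = 1`, records each start's objective value (`tot.withinss`), iteration count and elapsed time, and keeps the start with the smallest objective.

```r
# Hypothetical helper (not ppclust code): run stats::kmeans once per start
# and keep diagnostics for every start, not only for the best one.
multi_start_kmeans <- function(x, k, nstart = 5, iter.max = 1000,
                               algorithm = "Hartigan-Wong", seed = 123) {
  set.seed(seed)                              # reproducible random seeding
  x <- as.matrix(x)
  fits <- vector("list", nstart)
  func.val <- numeric(nstart)
  iter <- integer(nstart)
  comp.time <- numeric(nstart)
  for (s in seq_len(nstart)) {
    t0 <- proc.time()[["elapsed"]]
    fits[[s]] <- kmeans(x, centers = k, iter.max = iter.max,
                        nstart = 1, algorithm = algorithm)
    comp.time[s] <- proc.time()[["elapsed"]] - t0  # elapsed seconds
    func.val[s] <- fits[[s]]$tot.withinss          # objective of this start
    iter[s] <- fits[[s]]$iter                      # iterations of this start
  }
  best <- which.min(func.val)                      # start with minimum objective
  list(best = fits[[best]], best.start = best, func.val = func.val,
       iter = iter, comp.time = comp.time)
}

res <- multi_start_kmeans(iris[, -5], k = 3, nstart = 5)
res$func.val      # objective value of each start
res$best.start    # index of the best start
```

Keeping every fit, rather than relying on `kmeans(..., nstart = 5)`, is what makes per-start values available, since `kmeans` itself reports only the best start.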

Usage

ekm(x, centers, dmetric="euclidean", alginitv="hartiganwong",  
    nstart=1, iter.max=1000, stand=FALSE, numseed)

Arguments

`x`	a numeric vector, data frame or matrix.
`centers`	an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers.
`dmetric`	a string for the distance calculation method. The default is euclidean for the Euclidean distance.
`alginitv`	a string naming the method used to initialize the cluster prototypes matrix. The default is hartiganwong for the Hartigan-Wong seeding method. See `get.algorithms` for the alternative options.
`nstart`	an integer for the number of starts for clustering. The default is 1.
`iter.max`	an integer for the maximum number of iterations allowed. The default is 1000.
`stand`	a logical flag to standardize data. Its default value is `FALSE`. If its value is `TRUE`, the data matrix `x` is standardized.
`numseed`	an integer used to set the seed of R's random number generator.
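The effect of `stand` and `numseed` on the input can be illustrated with a small preprocessing step. This sketch assumes z-score standardization via base R's `scale()` and seeding via `set.seed()`; the preprocessing actually done inside ekm may differ, and the function name `prepare_data` is hypothetical.

```r
# Hypothetical preprocessing step; z-score standardization is an assumption.
prepare_data <- function(x, stand = FALSE, numseed = NULL) {
  if (!is.null(numseed)) set.seed(numseed)   # fix the RNG state
  x <- as.matrix(x)
  if (stand) x <- scale(x, center = TRUE, scale = TRUE)  # mean 0, sd 1 per column
  x
}

xs <- prepare_data(iris[, -5], stand = TRUE, numseed = 123)
colMeans(xs)        # approximately zero for every column
apply(xs, 2, sd)    # one for every column
```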

Details

The K-means (KM) clustering algorithm partitions a data set of n objects into k clusters, where k is fixed in advance. It is an iterative relocation algorithm: in each iteration, every object is assigned to the nearest cluster by using the Euclidean distance, and the cluster prototypes are then recomputed. The objective of KM is to minimize the total intra-cluster variance (the sum of squared errors):

J_{KM}(\mathbf{X}; \mathbf{V})=\sum\limits_{i=1}^n d^2(\vec{x}_i, \vec{v}_j)

In the above equation for J_{KM}, d^2(\vec{x}_i, \vec{v}_j) is the squared distance between the object \vec{x}_i and the prototype \vec{v}_j of the cluster to which \vec{x}_i is assigned. The Euclidean distance metric is usually employed in implementations of K-means.

See kmeans and ppclust-package for more details about the terms of the objective function J_{KM}.
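The objective J_{KM} described above can be evaluated directly for any set of prototypes and cluster labels. The sketch below, with the hypothetical helper `km_objective`, assumes squared Euclidean distances and compares the result with the `tot.withinss` value reported by base R's `kmeans`.

```r
# J_KM: sum over objects of the squared Euclidean distance between each
# object and the prototype of the cluster it is assigned to.
km_objective <- function(x, centers, cluster) {
  x <- as.matrix(x)
  diffs <- x - centers[cluster, , drop = FALSE]  # row i holds x_i - v_j
  sum(diffs^2)
}

set.seed(1)
fit <- kmeans(iris[, -5], centers = 3)
km_objective(iris[, -5], fit$centers, fit$cluster)
fit$tot.withinss    # kmeans reports the same quantity
```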

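The cluster prototypes are updated as the means of their member objects, where n_j is the number of objects in cluster j and \vec{x}_{ij} is the i-th object assigned to cluster j:

\vec{v}_{j} =\frac{1}{n_j} \sum\limits_{i=1}^{n_j} \vec{x}_{ij} \;;\; 1 \leq j \leq k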
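The alternation between the assignment step and the prototype update above can be written out directly. The sketch below is a plain Lloyd-style iteration in base R, assuming Euclidean distance and k randomly chosen objects as initial prototypes; it is not the Hartigan-Wong method that ekm uses by default, and the function name `lloyd_km` is hypothetical. As a choice made only for this sketch, a cluster that becomes empty keeps its previous prototype.

```r
# Hypothetical Lloyd-style K-means (not the Hartigan-Wong method):
# assign objects to the nearest prototype, then move each prototype to the
# mean of its objects, until the assignments stop changing.
lloyd_km <- function(x, k, iter.max = 100, seed = 42) {
  x <- as.matrix(x)
  set.seed(seed)
  v <- x[sample(nrow(x), k), , drop = FALSE]  # initial prototypes: k random objects
  cluster <- integer(nrow(x))
  for (iter in seq_len(iter.max)) {
    # n x k matrix of squared Euclidean distances d^2(x_i, v_j)
    d2 <- vapply(seq_len(k), function(j) colSums((t(x) - v[j, ])^2),
                 numeric(nrow(x)))
    new_cluster <- max.col(-d2, ties.method = "first")  # nearest prototype
    if (all(new_cluster == cluster)) break              # assignments stable
    cluster <- new_cluster
    for (j in seq_len(k)) {
      members <- cluster == j
      # v_j = (1 / n_j) * sum of the n_j objects in cluster j;
      # an empty cluster keeps its previous prototype
      if (any(members)) v[j, ] <- colMeans(x[members, , drop = FALSE])
    }
  }
  list(v = v, cluster = cluster, iter = iter,
       func.val = sum((x - v[cluster, , drop = FALSE])^2))
}

res <- lloyd_km(iris[, -5], k = 3)
table(res$cluster)   # cluster sizes n_j
res$func.val         # value of J_KM after convergence
```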

Value

An object of class ‘ppclust’, which is a list consisting of the following items:

`x`	a numeric matrix containing the processed data set.
`v`	a numeric matrix containing the final cluster prototypes (centers of clusters).
`u`	a numeric matrix containing the crisp membership degrees of the data objects.
`d`	a numeric matrix containing the distances of objects to the final cluster prototypes.
`k`	an integer for the number of clusters.
`cluster`	a numeric vector containing the cluster labels of the objects, obtained from their crisp membership degrees.
`csize`	a numeric vector containing the number of objects in the clusters.
`iter`	an integer vector for the number of iterations in each start of the algorithm.
`best.start`	an integer for the index of the start with the minimum objective function value.
`func.val`	a numeric vector for the objective function values in each start of the algorithm.
`comp.time`	a numeric vector for the execution time in each start of the algorithm.
`stand`	a logical value; `TRUE` indicates that the data set `x` contains the standardized values of the raw data.
`wss`	a numeric vector for the within-cluster sum of squares of each cluster.
`bwss`	a number for the between-cluster sum of squares.
`tss`	a number for the total sum of squares.
`twss`	a number for the total within-cluster sum of squares.
`algorithm`	a string for the name of the partitioning algorithm; it is ‘KM’ for this function.
`call`	a string for the matched function call generating this ‘ppclust’ object.
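The within-cluster, between-cluster and total sums of squares listed above are related by the usual decomposition: total = total within-cluster + between-cluster. The sketch below checks this with base R's `kmeans`, computing each quantity directly, and also builds a crisp membership matrix of the kind described for `u` from the cluster labels. Variable names are illustrative only.

```r
x <- as.matrix(iris[, -5])
k <- 3
set.seed(1)
fit <- kmeans(x, centers = k)

# Crisp membership matrix: u[i, j] is 1 if object i is in cluster j, else 0
u <- outer(fit$cluster, seq_len(k), "==") * 1

grand_mean <- colMeans(x)
total_ss <- sum(sweep(x, 2, grand_mean)^2)              # total sum of squares
within_ss <- vapply(seq_len(k), function(j)
  sum(sweep(x[fit$cluster == j, , drop = FALSE], 2, fit$centers[j, ])^2),
  numeric(1))                                           # one value per cluster
total_within_ss <- sum(within_ss)                       # total within-cluster SS
between_ss <- sum(fit$size *
  rowSums(sweep(fit$centers, 2, grand_mean)^2))         # between-cluster SS

all.equal(total_ss, total_within_ss + between_ss)  # decomposition holds
all.equal(within_ss, fit$withinss)                 # matches kmeans' values
colSums(u)                                         # equals fit$size
```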

Author(s)

Zeynel Cebeci, Figen Yildiz & Hasan Onder

Examples

# Load the ppclust package and the iris data set
library(ppclust)
data(iris)
x <- iris[,-5]

# Run EKM for 3 clusters
res.ekm <- ekm(x, centers=3)

# Print and plot the clustering result
print(res.ekm$cluster)
plot(x, col=res.ekm$cluster, pch=16)

ppclust documentation built on May 29, 2024, 7:20 a.m.