ekm | R Documentation |
The function ekm partitions a numeric data set by using the K-means clustering algorithm. It is a wrapper around the standard kmeans function that adds more initialization (seeding) techniques and returns additional output values obtained over the multiple starts of the algorithm.
ekm(x, centers, dmetric="euclidean", alginitv="hartiganwong",
nstart=1, iter.max=1000, stand=FALSE, numseed)
x |
a numeric vector, data frame or matrix. |
centers |
an integer specifying the number of clusters or a numeric matrix containing the initial cluster centers. |
dmetric |
a string for the distance calculation method. The default is euclidean for the Euclidean distances. |
alginitv |
a string for the initialization method of the cluster prototypes matrix. The default is hartiganwong for the Hartigan-Wong seeding method. See |
nstart |
an integer for the number of starts for clustering. The default is 1. |
iter.max |
an integer for the maximum number of iterations allowed. The default is 1000. |
stand |
a logical flag to standardize data. Its default value is FALSE. |
numseed |
a seeding number to set the seed of R's random number generator. |
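The idea behind nstart and numseed can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: stats::kmeans stands in for ekm so the code runs without ppclust, and the names func_val and best_start are hypothetical variables chosen to mirror the func.val and best.start output items.

```r
# Multi-start sketch using base R only (stats::kmeans stands in for ekm).
x <- as.matrix(iris[, -5])
nstart  <- 5      # number of independent starts
numseed <- 2024   # hypothetical seed value, fixed for reproducibility

set.seed(numseed)
starts <- lapply(seq_len(nstart), function(s) {
  # Each call draws its own random initial prototypes
  kmeans(x, centers = 3, iter.max = 1000)
})

# Objective function value (total within-cluster sum of squares) per start
func_val <- vapply(starts, function(f) f$tot.withinss, numeric(1))

# Keep the start with the smallest objective function value
best_start <- which.min(func_val)
best <- starts[[best_start]]
```

Fixing the seed makes the sequence of random initializations, and therefore the selected start, repeatable across runs.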
The K-means (KM) clustering algorithm partitions a data set of n objects into k clusters, where k is fixed in advance. It is an iterative relocation algorithm: in each iteration, every object is assigned to its nearest cluster prototype by using the Euclidean distances. The objective of KM is to minimize the total intra-cluster variance (or the sum of squared errors):
J_{KM}(\mathbf{X}; \mathbf{V})=\sum\limits_{j=1}^k \sum\limits_{i=1}^{n_j} d^2(\vec{x}_{ij}, \vec{v}_j)
In the above equation for J_{KM}, d^2(\vec{x}_{ij}, \vec{v}_j) is the squared distance between \vec{x}_{ij}, the i-th object assigned to cluster j, and the cluster prototype \vec{v}_j; n_j is the number of objects in cluster j. The Euclidean distance metric is usually employed in implementations of K-means.
See kmeans and ppclust-package for more details about the terms of the objective function J_{KM}.
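As a minimal sketch of how the objective function is evaluated, the following base R code sums the squared Euclidean distances of the objects to the prototypes of their own clusters. It assumes stats::kmeans as a stand-in for ekm, so that it runs without ppclust installed; the variable names are illustrative only.

```r
# Evaluate the K-means objective for a fitted partition (base R only).
x <- as.matrix(iris[, -5])
set.seed(123)
fit <- kmeans(x, centers = 3, nstart = 5)

# Prototype of the cluster each object was assigned to (one row per object)
v_assigned <- fit$centers[fit$cluster, , drop = FALSE]

# Squared Euclidean distance of each object to its own prototype
d2 <- rowSums((x - v_assigned)^2)

# The objective is the sum of these squared distances over all objects
J_KM <- sum(d2)

# Compare with the total within-cluster sum of squares reported by kmeans
all.equal(J_KM, fit$tot.withinss)
```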
The update equation of the cluster prototypes:
\vec{v}_{j} =\frac{1}{n_j} \sum\limits_{i=1}^{n_j} x_{ij} \;;\; 1 \leq j \leq k
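The assignment and update steps can be sketched as a plain Lloyd-style iteration in base R. This is not the Hartigan-Wong procedure that ekm uses by default; it is a simpler variant, shown only to make the prototype update concrete. The random choice of initial prototypes and the stopping tolerance are assumptions of this sketch.

```r
# Lloyd-style K-means iteration (illustrative, base R only).
x <- as.matrix(iris[, -5])
k <- 3
set.seed(1)
v <- x[sample(nrow(x), k), , drop = FALSE]  # random objects as initial prototypes

for (iter in 1:100) {
  # Assignment step: n x k matrix of squared Euclidean distances,
  # then each object goes to its nearest prototype
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, v[j, ])^2))
  cl <- max.col(-d2, ties.method = "first")

  # Update step: each prototype becomes the mean of its assigned objects
  v_new <- v
  for (j in seq_len(k)) {
    if (any(cl == j)) v_new[j, ] <- colMeans(x[cl == j, , drop = FALSE])
  }

  # Stop when the prototypes no longer move
  shift <- max(abs(v_new - v))
  v <- v_new
  if (shift < 1e-10) break
}
```

After the loop, v holds the prototypes and cl the cluster labels; an empty cluster, if one ever occurs, simply keeps its previous prototype in this sketch.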
An object of class ‘ppclust’, which is a list consisting of the following items:
x |
a numeric matrix containing the processed data set. |
v |
a numeric matrix containing the final cluster prototypes (centers of clusters). |
u |
a numeric matrix containing the crisp membership degrees of the data objects. |
d |
a numeric matrix containing the distances of objects to the final cluster prototypes. |
k |
an integer for the number of clusters. |
cluster |
a numeric vector containing the cluster labels found by defuzzifying the membership degrees of the objects. |
csize |
a numeric vector containing the number of objects in the clusters. |
iter |
an integer vector for the number of iterations in each start of the algorithm. |
best.start |
an integer for the index of the start with the minimum objective function value. |
func.val |
a numeric vector for the objective function values in each start of the algorithm. |
comp.time |
a numeric vector for the execution time in each start of the algorithm. |
stand |
a logical value, |
wss |
a numeric vector containing the within-cluster sum of squares for each cluster. |
bwss |
a number for the between-cluster sum of squares. |
tss |
a number for the total sum of squares. |
twss |
a number for the total within-cluster sum of squares. |
algorithm |
a string for the name of partitioning algorithm. It is ‘KM’ with this function. |
call |
a string for the matched function call generating this ‘ppclust’ object. |
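The sum-of-squares items above are related by the usual decomposition: the total sum of squares equals the total within-cluster sum of squares plus the between-cluster sum of squares. The sketch below checks this in base R, assuming stats::kmeans as a stand-in for ekm; the variable names are illustrative and are not part of the ‘ppclust’ object.

```r
# Sum-of-squares decomposition for a K-means partition (base R only).
x <- as.matrix(iris[, -5])
set.seed(42)
fit <- kmeans(x, centers = 3, nstart = 5)
grand_mean <- colMeans(x)

# Total sum of squares around the grand mean
total_ss <- sum(sweep(x, 2, grand_mean)^2)

# Within-cluster sum of squares, one value per cluster
within_ss <- vapply(seq_len(3), function(j) {
  xj <- x[fit$cluster == j, , drop = FALSE]
  sum(sweep(xj, 2, colMeans(xj))^2)
}, numeric(1))

# Between-cluster sum of squares: cluster size times the squared
# distance of each prototype to the grand mean
between_ss <- sum(fit$size * rowSums(sweep(fit$centers, 2, grand_mean)^2))

# Total = total within + between
all.equal(total_ss, sum(within_ss) + between_ss)
```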
Zeynel Cebeci, Figen Yildiz & Hasan Onder
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Proc. of 5th Berkeley Symp. on Mathematical Statistics and Probability, Berkeley, University of California Press, 1: 281-297. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.308.8619&rep=rep1&type=pdf>
kmeans, fcm, fcm2, fpcm, fpppcm, gg, gk, gkpfcm, hcm, pca, pcm, pcmr, pfcm, upfc
# Load the ppclust package and the iris data set
library(ppclust)
data(iris)
x <- iris[,-5]
# Run EKM for 3 clusters
res.ekm <- ekm(x, centers=3)
# Print and plot the clustering result
print(res.ekm$cluster)
plot(x, col=res.ekm$cluster, pch=16)