maximin: Initialization of cluster prototypes using Maximin algorithm

View source: R/inaparc.R

maximinR Documentation

Initialization of cluster prototypes using Maximin algorithm

Description

Initializes the cluster prototypes matrix by using the Maximin algorithm.

Usage

maximin(x, k)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer for the number of clusters.

Details

The main idea of the Maximin algorithm is to isolate the cluster prototypes that are farthest apart (Philpot, 2001). The algorithm randomly samples one data object from the data set and assigns it as the first cluster prototype. The prototype of second cluster is determined as the data object which is farthest from the first prototype. Then, the remaining part of data set is scanned for the data objects whose distances are minimum to the previously selected prototypes. The object having the maximum of minimum distances is assigned the prototype of third cluster. The same procedure is repeated for determining the prototypes of other clusters (Spaeth, 1997; Gonzales, 1985; Duda et al, 2000, Celebi et al, 2013).

The algorithm generally works well with circular shaped clusters whose radius are smaller than the separation between clusters. However, it is very sensitive to the order of object in data sets. Also it is computationally expensive because each time once a new cluster prototype is selected, the distances must be computed for every object from every cluster prototype (Philpot, 2001). In order to contribute to the solutions of this problem, the current implementation of maximin includes a simple control that if an object has the minimum distance of zero, the seeking procedure is no more continued to compute the distances for the remaining objects. This control may speed the algorithm up with the maximin function in this package.

Value

an object of class ‘inaparc’, which is a list consists of the following items:

v

a numeric matrix of the initial cluster prototypes.

ctype

a string representing the type of centroid, which used to build prototype matrix. Its value is ‘obj’ with this function because the cluster prototype matrix contains the objects.

call

a string containing the matched function call that generates the object ‘inaparc’.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Spaeth, H. (1977). Computational experiences with the exchange method: Applied to four commonly used partitioning cluster analysis criteria, European Journal of Operational Research 1 (1): 23-31. doi: 10.1016/S0377-2217(77)81005-9

Gonzalez, T. (1985), Clustering to minimize the maximum intercluster distance, Theoretical Computer Science 38 (2-3): 293-306. url:http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.366.8183&rep=rep1&type=pdf

Duda, R.O., Hart, P.E. & Stork, D.G. (2000). Pattern Classification, Wiley-Interscience. <ISBN:978-0-471-05669-0>

Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf

Philpot, W. (2001). Topic 8: Clustering/Unsupervised Classification in Lecture Notes, CEE 615: Digital Image Processing - Jan 2001, Cornell Univ., url:https://www-users.cse.umn.edu/~kumar/dmbook/ch8.pdf

See Also

aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, mscseek, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp

Examples

data(iris)
res <-maximin(x=iris[,1:4], k=5)
v <- res$v
print(v)

inaparc documentation built on June 16, 2022, 5:09 p.m.