ballhall: Initialization of cluster prototypes using Ball & Hall's...

View source: R/inaparc.R

ballhall R Documentation

Initialization of cluster prototypes using Ball & Hall's algorithm

Description

Initializes the cluster prototypes using the cluster seeding algorithm proposed by Ball & Hall (1967).

Usage

ballhall(x, k, tv)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer specifying the number of clusters.

tv

a number to be used as T, the threshold distance value, when it is supplied directly by the user. Alternatively, T can be computed internally by passing one of the following options as the tv argument:

  • cd1: T is the mean of the differences between consecutive pairs of objects.

  • cd2: T is the minimum of the differences between consecutive pairs of objects.

  • md: T is the mean of the Euclidean distances between consecutive pairs of objects, divided by k. This is the default if tv is not supplied by the user.

  • mm: T is the range (maximum minus minimum) of the Euclidean distances between consecutive pairs of objects, divided by k.
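The two Euclidean-distance options can be illustrated with a short base R sketch. This is not the code used inside ballhall: the helper names consecutive_dist and threshold_sketch are hypothetical, "consecutive pairs" is assumed to mean row i and row i + 1, and the range is assumed to be the maximum minus the minimum, divided by k. The options cd1 and cd2 are left out because the text above does not say how their "differences" are measured.

```r
# Euclidean distances between consecutive rows (row i and row i + 1) of x.
# Assumption: "consecutive pairs of objects" means adjacent rows.
consecutive_dist <- function(x) {
  x <- as.matrix(x)
  sqrt(rowSums((x[-1, , drop = FALSE] - x[-nrow(x), , drop = FALSE])^2))
}

# Hypothetical helper, not part of inaparc: one reading of the md and mm options.
threshold_sketch <- function(x, k, option = c("md", "mm")) {
  option <- match.arg(option)
  d <- consecutive_dist(x)
  switch(option,
    md = mean(d) / k,            # mean consecutive distance divided by k
    mm = (max(d) - min(d)) / k)  # range of consecutive distances divided by k
}

x <- iris[, 1:4]
threshold_sketch(x, k = 5, option = "md")
threshold_sketch(x, k = 5, option = "mm")
```

Because both values depend on the row order of x, shuffling the rows would generally change the resulting T under this reading.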

Details

In Ball and Hall's algorithm (Ball & Hall, 1967), the center of gravity of the data is assigned as the prototype of the first cluster. The algorithm then passes through the data objects in arbitrary order and takes an object as the next prototype if it is at least T units away from all previously selected prototypes. The purpose of T, the distance threshold, is to keep the cluster prototypes at least T units away from each other. Ball & Hall's method may be sensitive to the order of the data, and choosing an appropriate value of T is also difficult (Celebi et al., 2013). To ease the choice of T, the function ballhall in this package computes a T value from distance measures on the data if it is not specified by the user (for details, see the section ‘Arguments’ above).
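The selection rule described above can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the implementation in inaparc: the function name ballhall_sketch and its argument thr (standing for T) are hypothetical, the objects are visited in row order, and Euclidean distance is assumed.

```r
# Hypothetical helper, not the inaparc implementation.
ballhall_sketch <- function(x, k, thr) {
  x <- as.matrix(x)
  # First prototype: the center of gravity (column means) of the data.
  v <- matrix(colMeans(x), nrow = 1)
  for (i in seq_len(nrow(x))) {
    if (nrow(v) >= k) break
    # Euclidean distance from object i to every prototype chosen so far.
    d <- sqrt(rowSums(sweep(v, 2, x[i, ])^2))
    # Accept object i only if it is at least thr away from all of them.
    if (all(d >= thr)) v <- rbind(v, x[i, ])
  }
  v
}

v <- ballhall_sketch(iris[, 1:4], k = 5, thr = 0.6)
print(v)
```

If thr is too large, this sketch simply returns fewer than k prototypes; how ballhall itself handles that case is not stated here.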

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

v

a numeric matrix containing the initial cluster prototypes.

ctype

a string giving the type of centroid used. For this function it is ‘obj’, because the cluster prototype matrix contains the selected objects.

call

a string containing the matched function call that generates this ‘inaparc’ object.

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Ball, G.H. & Hall, D.J. (1967). A clustering technique for summarizing multivariate data, Systems Res. & Behavioral Sci., 12 (2): 153-155.

Celebi, M.E., Kingravi, H.A. & Vela, P.A. (2013). A comparative study of efficient initialization methods for the K-means clustering algorithm, Expert Systems with Applications, 40 (1): 200-210. arXiv:https://arxiv.org/pdf/1209.1960.pdf

See Also

aldaoud, crsamp, firstk, hartiganwong, inofrep, inscsf, insdev, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, spaeth, ssamp, topbottom, uniquek, ursamp

Examples

data(iris)
# Run with a user-specified threshold value
v1 <- ballhall(x=iris[,1:4], k=5, tv=0.6)$v
print(v1)

# Run with the internally computed default threshold value
v2 <- ballhall(x=iris[,1:4], k=5)$v
print(v2)

inaparc documentation built on June 16, 2022, 5:09 p.m.