insdev: Initialization of cluster prototypes using Insdev algorithm

View source: R/inaparc.R

insdev R Documentation


Description

Insdev is a novel algorithm that initializes the cluster prototypes by using the standard deviation of a selected feature. The selected feature is the one with the greatest variation: the coefficients of variation of all features are compared, and the feature with the highest coefficient of variation is selected for the subsequent steps.
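The feature-selection step can be sketched in base R as follows. This is a minimal illustration, not the package's code: it assumes the coefficient of variation is computed as sd/mean for each column, and the helper name select_feature_cv is hypothetical.

```r
# Hypothetical helper (not part of inaparc): return the column index of the
# feature with the highest coefficient of variation, CV = sd / mean.
select_feature_cv <- function(x) {
  x <- as.matrix(x)
  cvs <- apply(x, 2, function(col) sd(col) / mean(col))
  # which.max() returns the first index if several features tie
  unname(which.max(cvs))
}

data(iris)
sfidx <- select_feature_cv(iris[, 1:4])
print(colnames(iris)[sfidx])  # prints [1] "Petal.Width"
```

For features whose mean is close to zero, sd/mean becomes unstable or changes sign; how the package handles that case is not documented here.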

Usage

insdev(x, k, sfidx)

Arguments

x

a numeric vector, data frame or matrix.

k

an integer specifying the number of clusters.

sfidx

an integer specifying the column index of the selected feature. This function uses a feature with high variability as the selected feature because such a feature dominates the clustering results (Khan, 2012). If missing, it is determined internally by comparing the coefficients of variation of all the features in the data set; the feature with the maximum coefficient of variation is used as the selected feature.

Details

The algorithm first computes the mean of the selected feature (\bar{x_{s}}) and takes the object whose distance to \bar{x_{s}} is smallest as the prototype of the first cluster. The prototypes of the remaining clusters are determined using a stepping range R, computed from the standard deviation of the selected feature as R = \frac{\sigma_{x_{s}}}{2k}. The prototype of the second cluster is the object whose distance to \bar{x_{s}} + (i-1)R is smallest, where i is the cluster index. The prototype of the third cluster is the object whose distance to \bar{x_{s}} - iR is smallest, i.e. on the opposite side of the mean from the previous prototype. The prototypes of the remaining clusters are determined cyclically in the same way.
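The steps in the Details can be sketched in base R as below. This is a hedged illustration under stated assumptions, not the package's implementation: insdev_sketch is a hypothetical name, distance is measured on the selected feature only, the offsets (i-1)R for even i and -iR for odd i are applied literally as worded in the Details, and ties or two targets landing on the same object are not treated specially.

```r
# Hypothetical sketch of the Insdev selection rule (not inaparc's source).
# Assumptions: distance = |x[j, sfidx] - target| on the selected feature;
# R = sd / (2 * k); even i targets mean + (i - 1) * R, odd i targets
# mean - i * R; ties are broken by which.min() taking the first row.
insdev_sketch <- function(x, k, sfidx) {
  x <- as.matrix(x)
  if (missing(sfidx)) {
    # Default: feature with the highest coefficient of variation
    cvs <- apply(x, 2, function(col) sd(col) / mean(col))
    sfidx <- unname(which.max(cvs))
  }
  xs <- x[, sfidx]
  xbar <- mean(xs)
  R <- sd(xs) / (2 * k)
  # One target value on the selected feature per cluster
  targets <- numeric(k)
  targets[1] <- xbar
  if (k >= 2) {
    for (i in 2:k) {
      targets[i] <- if (i %% 2 == 0) xbar + (i - 1) * R else xbar - i * R
    }
  }
  # For each target, pick the object whose selected feature is closest to it
  idx <- vapply(targets, function(t) which.min(abs(xs - t)), integer(1))
  v <- x[idx, , drop = FALSE]
  rownames(v) <- paste0("Cl.", seq_len(k))
  list(v = v, sfidx = sfidx, idx = idx)
}

data(iris)
res1 <- insdev_sketch(iris[, 1:4], k = 5)
res2 <- insdev_sketch(iris[, 1:4], k = 5)
identical(res1$v, res2$v)  # TRUE: no randomness is involved
print(res1$v)
```

Running the sketch twice on the same data returns identical prototypes, which illustrates why the procedure is deterministic.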

Because insdev produces the same prototypes on every run, it is a deterministic algorithm. This property makes the initialization procedure replicable.

Value

an object of class ‘inaparc’, which is a list consisting of the following items:

v

a numeric matrix containing the initial cluster prototypes.

sfidx

an integer for the column index of the selected feature, used in the calculations.

ctype

a string representing the type of centroid used to build the prototype matrix. Its value is ‘obj’ for this function because the cluster prototypes are objects selected from the data set.

call

a string containing the matched function call that generates this ‘inaparc’ object.
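The documented return value is an ordinary R list carrying the S3 class ‘inaparc’. The sketch below shows one way such an object could be assembled; toy_inaparc is hypothetical, its prototype rule (the first k rows) is a placeholder rather than Insdev, and storing call as the call object returned by match.call() (instead of a deparsed string) is an assumption.

```r
# Hypothetical constructor shaped like the documented return value.
# The prototype rule (first k rows) is only a placeholder, NOT Insdev.
toy_inaparc <- function(x, k, sfidx = 1L) {
  x <- as.matrix(x)
  v <- x[seq_len(k), , drop = FALSE]  # placeholder prototypes
  structure(
    list(
      v = v,               # numeric matrix of initial prototypes
      sfidx = sfidx,       # column index of the selected feature
      ctype = "obj",       # prototypes are objects taken from the data
      call = match.call()  # matched call that produced this object
    ),
    class = "inaparc"
  )
}

res <- toy_inaparc(iris[, 1:4], k = 3)
class(res)         # prints [1] "inaparc"
names(res)         # prints [1] "v"     "sfidx" "ctype" "call"
deparse(res$call)  # the matched call as a character string
```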

Author(s)

Zeynel Cebeci, Cagatay Cebeci

References

Khan, F. (2012). An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application. Applied Soft Computing, 12(11): 3698-3700. doi:10.1016/j.asoc.2012.07.021

See Also

aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, ssamp, topbottom, uniquek, ursamp

Examples

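library(inaparc)
data(iris)
# Initialize prototypes for 5 clusters from the four numeric iris features;
# sfidx is omitted, so the feature with the highest CV is selected internally
res <- insdev(x=iris[,1:4], k=5)
v <- res$v
print(v)
print(res$sfidx)  # column index of the feature actually used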

inaparc documentation built on June 16, 2022, 5:09 p.m.