insdev | R Documentation
Insdev is a novel algorithm that initializes the cluster prototypes using the standard deviation of a selected feature. The selected feature is the one with the most variation: the coefficients of variation of all features are compared, and the feature with the highest coefficient of variation is selected for the subsequent steps.
insdev(x, k, sfidx)
x | a numeric vector, data frame or matrix.
k | an integer specifying the number of clusters.
sfidx | an integer specifying the column index of the selected feature. This function uses a feature with high variability as the selected feature, because such a feature dominates the clustering results (Khan, 2012). If sfidx is missing, it is determined internally by comparing the coefficients of variation of all features in the data set, and the feature with the maximum coefficient of variation is used as the selected feature.
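A minimal sketch of how the default sfidx can be chosen, assuming the coefficient of variation of each column is computed as sd / mean and that every feature has a positive mean (the ratio is not meaningful otherwise). The helper name select_feature_cv is hypothetical and is not part of inaparc.

```r
# Hypothetical helper (not part of inaparc): returns the column index of the
# feature with the largest coefficient of variation, sd / mean.
# Assumes every column is numeric with a positive mean.
select_feature_cv <- function(x) {
  x <- as.matrix(x)
  cv <- apply(x, 2, function(col) sd(col) / mean(col))  # one CV per column
  unname(which.max(cv))  # first column with the maximum CV
}

data(iris)
sfidx <- select_feature_cv(iris[, 1:4])
print(sfidx)
```

On the four numeric iris columns this picks column 4 (Petal.Width), whose coefficient of variation (about 0.64) is the largest.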
The algorithm first computes the mean of the selected feature, $\bar{x}_s$, and takes the object whose value on the selected feature is closest to $\bar{x}_s$ as the prototype of the first cluster. The prototypes of the remaining clusters are determined using a stepping range $R$, computed from the standard deviation $\sigma_{x_s}$ of the selected feature as $R = \frac{1}{2}\,\sigma_{x_s}/k = \sigma_{x_s}/(2k)$. The prototype of the second cluster is the object closest to $\bar{x}_s + (i-1)R$, where $i$ is the cluster index. The prototype of the third cluster is the object closest to $\bar{x}_s - iR$, that is, in the opposite direction from the previous prototype. The prototypes of the remaining clusters are determined cyclically in the same way.
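The steps above can be sketched in R as follows. insdev_sketch is a hypothetical helper written from this description, not from the inaparc source. It assumes that distances are measured on the selected feature only and applies the offset rule literally (even $i$: $\bar{x}_s + (i-1)R$; odd $i \ge 3$: $\bar{x}_s - iR$), so the package's actual stepping and its handling of repeated objects may differ.

```r
# Hypothetical sketch of the insdev prototype selection, written from the
# description on this page, not from the inaparc source code.
insdev_sketch <- function(x, k, sfidx) {
  x <- as.matrix(x)
  xs <- x[, sfidx]          # values of the selected feature
  xbar <- mean(xs)          # mean of the selected feature
  R <- 0.5 * sd(xs) / k     # stepping range R = sigma / (2k)
  targets <- numeric(k)
  targets[1] <- xbar        # cluster 1 targets the mean itself
  if (k >= 2) {
    for (i in 2:k) {
      # Assumed literal reading of the rule: even i steps up by (i - 1) * R,
      # odd i steps down by i * R, alternating sides of the mean.
      targets[i] <- if (i %% 2 == 0) xbar + (i - 1) * R else xbar - i * R
    }
  }
  # Pick the object whose selected-feature value is nearest to each target;
  # ties go to the first such object, and repeats are not prevented.
  idx <- vapply(targets, function(target) which.min(abs(xs - target)), integer(1))
  list(v = x[idx, , drop = FALSE], idx = idx, R = R)
}

data(iris)
res <- insdev_sketch(iris[, 1:4], k = 5, sfidx = 4)
print(res$v)
```

No random numbers are drawn, so repeated calls with the same arguments return identical prototypes. Nothing in this sketch stops the same object from being chosen for two clusters when $R$ is small relative to the spacing of the data.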
Because it produces the same prototypes on every run, insdev is a deterministic algorithm; this property makes the initialization procedure replicable.
an object of class ‘inaparc’, which is a list consisting of the following items:
v | a numeric matrix containing the initial cluster prototypes.
sfidx | an integer for the column index of the selected feature used in the calculations.
ctype | a string representing the type of centroid used to build the prototype matrix. With this function its value is ‘obj’, because the cluster prototypes are objects sampled from the data set.
call | a string containing the matched function call that generated this ‘inaparc’ object.
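Because v is a plain numeric matrix with one row per cluster, it can be passed as the centers argument of stats::kmeans. In the sketch below, three hand-picked iris rows stand in for res$v so that it runs without inaparc installed; that substitution is an assumption for illustration only.

```r
# Sketch: use a prototype matrix such as insdev()$v as the initial centers
# of k-means. Three iris rows stand in for res$v so inaparc is not required.
data(iris)
x <- as.matrix(iris[, 1:4])
v <- x[c(1, 51, 101), ]          # stand-in prototypes, one row per cluster
km <- kmeans(x, centers = v)     # k is taken from nrow(v)
print(km$size)                   # number of objects in each cluster
```

When centers is a matrix, kmeans draws no random starting points, so a deterministic initializer keeps the whole run replicable.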
Zeynel Cebeci, Cagatay Cebeci
Khan, F. (2012). An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application. Applied Soft Computing, 12(11): 3698-3700. doi: 10.1016/j.asoc.2012.07.021
See also: aldaoud, ballhall, crsamp, firstk, forgy, hartiganwong, inofrep, inscsf, kkz, kmpp, ksegments, ksteps, lastk, lhsmaximin, lhsrandom, maximin, mscseek, rsamp, rsegment, scseek, scseek2, ssamp, topbottom, uniquek, ursamp
# Initialize five cluster prototypes for the four numeric features of iris
data(iris)
res <- insdev(x=iris[,1:4], k=5)
v <- res$v
print(v)