kpeaks-package: Determination of K Using Peak Counts of Features for...
In kpeaks: Determination of K Using Peak Counts of Features for Clustering


The input argument k, which represents the number of clusters, is required to start any partitioning clustering algorithm. In unsupervised learning applications, an optimal value of k is commonly determined by using internal validity indexes. Because these indexes suggest a k value computed from the clustering results of several runs of a clustering algorithm, they are computationally expensive. In contrast, the package 'kpeaks' makes it possible to estimate k before running any clustering algorithm. It is based on a simple, novel technique that uses the descriptive statistics of the peak counts of the features in a dataset.

The package 'kpeaks' contains five functions and one synthetically created dataset for testing purposes. To suggest an estimate of k, the function findk internally calls the functions genpolygon and findpolypeaks, in that order. The frequency polygons can be inspected visually with the function plotpolygon. Using the function rmshoulders is recommended to flatten or remove any shoulder peaks around the main peaks of a frequency polygon.
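The general idea (build a frequency polygon per feature, count its peaks, then summarize the peak counts across features) can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers freq_polygon, count_peaks and estimate_k are hypothetical names introduced here, and the binning rule, the min_count noise filter and the choice of summary statistics are assumptions made for illustration only.

```r
# Base R sketch only. These helpers are hypothetical and are NOT the
# kpeaks functions genpolygon, findpolypeaks or findk.

# Frequency polygon of one feature: bin midpoints and bin counts
# over `nbins` equal-width bins (assumed binning rule).
freq_polygon <- function(x, nbins = 20) {
  breaks <- seq(min(x), max(x), length.out = nbins + 1)
  h <- hist(x, breaks = breaks, plot = FALSE)
  list(mids = h$mids, counts = h$counts)
}

# Count peaks: bins whose count is strictly greater than both neighbours
# (the ends are padded with zeros) and at least `min_count`.
# Flat-topped peaks are missed by this simple rule.
count_peaks <- function(counts, min_count = 1) {
  padded <- c(0, counts, 0)
  n <- length(padded)
  mid <- padded[2:(n - 1)]
  is_peak <- mid > padded[1:(n - 2)] & mid > padded[3:n] & mid >= min_count
  sum(is_peak)
}

# Summarize the per-feature peak counts into candidate values of k.
estimate_k <- function(data, nbins = 20, min_count = 1) {
  peak_counts <- vapply(as.data.frame(data), function(col) {
    count_peaks(freq_polygon(col, nbins)$counts, min_count)
  }, numeric(1))
  list(
    peak_counts = peak_counts,
    mean = ceiling(mean(peak_counts)),
    median = ceiling(median(peak_counts)),
    max = max(peak_counts)
  )
}

# Toy data (simulated here, not the dataset shipped with the package):
# feature f1 has three groups, feature f2 has two.
set.seed(1)
toy <- data.frame(
  f1 = c(rnorm(100, 0, 0.5), rnorm(100, 5, 0.5), rnorm(100, 10, 0.5)),
  f2 = c(rnorm(150, 0, 0.5), rnorm(150, 6, 0.5))
)
print(estimate_k(toy, nbins = 10, min_count = 5))
```

The result depends on the number of bins and on the noise filter, which is one reason a shoulder-removal step such as rmshoulders and a visual check with plotpolygon are useful before trusting a suggested k.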

Zeynel Cebeci, Cagatay Cebeci

Cebeci, Z. & Cebeci, C. (2018). "A novel technique for fast determination of K in partitioning cluster analysis", Journal of Agricultural Informatics, 9(2), 1-11. doi: 10.17700/jai.2018.9.2.442.

Cebeci, Z. & Cebeci, C. (2018). "kpeaks: An R Package for Quick Selection of K for Cluster Analysis", In 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), IEEE. doi: 10.1109/IDAP.2018.8620896.

findk, findpolypeaks, genpolygon, plotpolygon, rmshoulders