kluster: a function to scalably approximate the number of clusters in...
In hestiri/kluster: A package for scalable approximation of the number of clusters


View source: R/kluster.R

kluster is the main function of the kluster package. If no algorithm is specified, it uses the best-performing kluster implementation (the most frequent product on BIC), which is intended for production use. If no sample size is specified, it uses the recommended sample size: 500 if n > 3000, and 100 otherwise. If the number of iterations is not set, it iterates 100 times, as recommended by our simulation analyses.
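The recommended sample size rule described above can be written out as a short sketch. The helper name `recommended_smpl` is hypothetical and is not part of the kluster package; it only restates the rule from the description, taking n as the number of rows in the data.

```r
# Hypothetical helper (not part of kluster): restates the recommended
# sample size rule from the description, where n is the number of rows.
recommended_smpl <- function(n) {
  if (n > 3000) 500 else 100
}

recommended_smpl(1000)   # at or below the threshold: 100
recommended_smpl(10000)  # above the threshold: 500
```

A value computed this way could be passed explicitly as the `smpl` argument.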

kluster(data, iter_klust = 100, smpl = 100, algorithm = "BIC")

`data`
`iter_klust`	number of clustering iterations, each run on a sample of size `smpl` – preset to 100
`smpl`	size of the sample to be drawn with replacement from data – preset to 100
`algorithm`	select analysis algorithm from BIC, PAMK, CAL, and AP – preset to BIC
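To illustrate how `iter_klust` and `smpl` interact, the following is a minimal sketch of repeated sampling with replacement in base R. It is not the package's internal code: the data frame `dat_demo` is synthetic, and the sampling approach is an assumption based only on the argument descriptions above.

```r
# Synthetic two-column data, used only for illustration.
set.seed(1)
dat_demo <- data.frame(x = rnorm(1000), y = rnorm(1000))

iter_klust <- 100  # number of iterations
smpl <- 100        # rows drawn per iteration

# Draw `iter_klust` samples, each with `smpl` rows taken with replacement.
samples <- lapply(seq_len(iter_klust), function(i) {
  idx <- sample(nrow(dat_demo), size = smpl, replace = TRUE)
  dat_demo[idx, , drop = FALSE]
})

length(samples)       # one sample per iteration
nrow(samples[[1]])    # each sample has `smpl` rows
```

Under this reading, each sample would then be handed to the selected algorithm to estimate the number of clusters for that iteration.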

returns the following values:

`sim`	returns both the most frequent and the average approximated number of clusters for the selected algorithm
`m_bic_k,m_cal_k,m_ap_k,m_pam_k`	the average approximated number of clusters for each selected algorithm
`f_bic_k,f_cal_k,f_ap_k,f_pam_k`	the most frequent approximated number of clusters for each selected algorithm
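The difference between the "most frequent" (`f_*`) and "average" (`m_*`) values can be shown with a small base R sketch. The vector `k_per_iter` holds made-up per-iteration estimates, not real kluster output, and the tie-breaking and rounding behavior of the package itself is not documented here, so treat this as an assumption-laden illustration.

```r
# Hypothetical per-iteration estimates of the number of clusters
# (illustrative values only, not produced by kluster).
k_per_iter <- c(2, 3, 2, 2, 4, 2, 3, 2)

# Most frequent estimate: tabulate and take the value with the highest count.
# which.max() returns the first maximum, so ties resolve to the smallest k.
freq <- table(k_per_iter)
f_k <- as.numeric(names(freq)[which.max(freq)])

# Average estimate: arithmetic mean across iterations.
m_k <- mean(k_per_iter)

f_k  # 2
m_k  # 2.5
```

The most frequent value is always an integer that was actually estimated in some iteration, whereas the mean can be fractional and is pulled by occasional outlying estimates.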


dat = read.csv("data/Breast_Cancer_Wisconsin.csv")
##returning kluster's most frequent product using the BIC algorithm:
k = kluster(data = dat[,c("area_mean","texture_mean")],iter_klust = 100, smpl=100, algorithm = "BIC")$f_bic_k