kluster: a function to scalably approximate the number of clusters in...


View source: R/kluster.R

Description

kluster is the main function of the kluster package. If no algorithm is specified, it uses the best implementation of kluster (the most frequent product on BIC), which is intended for production use. If no sample size is specified, it uses the recommended sample size (500 if n > 3000, otherwise 100). If the number of iterations is not set, it iterates 100 times, as recommended by our simulation analyses.
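The sample-size rule in the description can be sketched in R as follows. This is only an illustration of the stated rule, not the package's source: the helper name `default_smpl` and the toy data frames are hypothetical.

```r
# Hypothetical helper (not part of kluster): returns the recommended
# sample size from the number of rows, following the rule stated in
# the description (500 if n > 3000, otherwise 100).
default_smpl <- function(data) {
  n <- nrow(data)
  if (n > 3000) 500 else 100
}

# Made-up data frames, used only to exercise the rule.
small <- data.frame(x = rnorm(1000), y = rnorm(1000))
large <- data.frame(x = rnorm(5000), y = rnorm(5000))

smpl_small <- default_smpl(small)
smpl_large <- default_smpl(large)
```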

Usage

kluster(data, iter_klust = 100, smpl = 100, algorithm = "BIC")

Arguments

data

iter_klust

number of iterations for clustering on samples of size smpl – preset to 100

smpl

size of the sample to be taken with replacement out of data – preset to 100

algorithm

analysis algorithm to use, one of "BIC", "PAMK", "CAL", or "AP" – preset to "BIC"
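The arguments above suggest a resampling loop: draw smpl rows with replacement, estimate the number of clusters on that sample, and repeat iter_klust times. The base R sketch below illustrates that idea under stated assumptions. The toy data and the `estimate_k` function are hypothetical; `estimate_k` is a stand-in that maximises the Calinski-Harabasz index over k-means fits and is not kluster's own implementation of any of its algorithms.

```r
set.seed(1)

# Toy data with two well-separated groups (made up for illustration).
toy <- data.frame(
  x = c(rnorm(150, mean = 0), rnorm(150, mean = 10)),
  y = c(rnorm(150, mean = 0), rnorm(150, mean = 10))
)

# Hypothetical stand-in estimator: choose the k in 2..k_max that
# maximises the Calinski-Harabasz index of a k-means fit.
estimate_k <- function(x, k_max = 6) {
  n <- nrow(x)
  ks <- 2:k_max
  ch <- sapply(ks, function(k) {
    km <- kmeans(x, centers = k, nstart = 5)
    (km$betweenss / (k - 1)) / (km$tot.withinss / (n - k))
  })
  ks[which.max(ch)]
}

iter_klust <- 20  # fewer than the preset 100, to keep the sketch fast
smpl <- 100

# One estimate per iteration, each on a sample drawn with replacement.
k_hat <- replicate(iter_klust, {
  idx <- sample(nrow(toy), size = smpl, replace = TRUE)
  estimate_k(toy[idx, ])
})
```

Each element of `k_hat` is the estimate from one resample; summarising that vector gives a single approximation for the full data.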

Value

returns the following values:

sim

returns both the most frequent and the average approximated number of clusters for the selected algorithm

m_bic_k,m_cal_k,m_ap_k,m_pam_k

the average approximated number of clusters for each selected algorithm

f_bic_k,f_cal_k,f_ap_k,f_pam_k

the most frequent approximated number of clusters for each selected algorithm
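As a rough illustration of the difference between the "average" and "most frequent" values, the sketch below summarises a vector of per-iteration estimates in base R. The vector `k_hat` is made up for the example, and whether kluster rounds or otherwise post-processes the average is not stated here, so treat this as an assumption-laden sketch.

```r
# Illustrative only: k_hat stands in for the per-iteration cluster
# estimates that the selected algorithm would produce.
k_hat <- c(2, 3, 2, 2, 4, 3, 2, 2, 3, 2)

# Average approximated number of clusters (analogous to m_bic_k).
m_k <- mean(k_hat)

# Most frequent approximated number of clusters (analogous to f_bic_k).
counts <- table(k_hat)
f_k <- as.numeric(names(counts)[which.max(counts)])
```

The two summaries can differ: the average is pulled towards occasional large estimates, while the most frequent value ignores them.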

Examples

library(kluster)
dat <- read.csv("data/Breast_Cancer_Wisconsin.csv")
## returning kluster's most frequent product using the BIC algorithm:
k <- kluster(data = dat[, c("area_mean", "texture_mean")], iter_klust = 100, smpl = 100, algorithm = "BIC")$f_bic_k

hestiri/kluster documentation built on May 28, 2019, 8:55 p.m.