fastclara: FastCLARA


View source: R/RcppExports.R

Description

Clustering Large Applications (CLARA) with the FastPAM improvements, which increase scalability in the number of clusters. To improve quality, this variant is also designed around twice the sample size of classic CLARA (Schubert and Rousseeuw, 2019).

Usage

fastclara(
  rdist,
  n,
  k,
  maxiter = 0L,
  initializer = "LAB",
  fasttol = 1,
  numsamples = 5L,
  sampling = 0.25,
  independent = FALSE,
  seed = 123456789L
)

Arguments

rdist

The distance matrix (lower triangular matrix, column wise storage)
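As a minimal sketch of the expected layout, the base R function `dist()` already stores the lower triangle column by column, so its values can be passed on as a plain numeric vector. The toy data and the helper `dist_index()` below are illustrative assumptions, not part of fastkmedoids.

```r
# Toy data: 6 observations in 2 dimensions (illustrative only).
set.seed(1)
x <- matrix(rnorm(12), nrow = 6, ncol = 2)
n <- nrow(x)

# stats::dist() stores the lower triangle of the distance matrix
# column by column: d(2,1), d(3,1), ..., d(n,1), d(3,2), ...
d <- dist(x)
rdist <- as.numeric(d)  # plain numeric vector of length n * (n - 1) / 2

# Hypothetical helper: 1-based position of d(i, j) in the vector, for i > j.
dist_index <- function(i, j, n) n * (j - 1) - j * (j - 1) / 2 + i - j

# Cross-check one entry against the full symmetric matrix.
full <- as.matrix(d)
all.equal(rdist[dist_index(4, 2, n)], full[4, 2])
```

The vector holds n * (n - 1) / 2 entries, so memory grows quadratically with the number of observations.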

n

The number of observations

k

The number of clusters to produce

maxiter

The maximum number of iterations (default: 0)

initializer

Initializer: either "BUILD" (used in classic PAM) or "LAB" (linear approximative BUILD)

fasttol

Tolerance for fast swapping behavior (may perform worse swaps). Default: 1.0, which means to perform any additional swap that gives an improvement. When set to 0, it will only execute an additional swap if it appears to be independent (i.e., the improvements resulting from the swap have not decreased when the first swap was executed).

numsamples

Number of samples to draw (i.e. iterations). Default: 5

sampling

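Sampling rate. The signature above sets the default to 0.25, whereas Schubert and Rousseeuw (2019) suggest a sample size of 80 + 4*k. A value below 1 is treated as a fraction of the data set (e.g. 0.10 selects 0.10 * n observations); a value of 1 or more is treated as an absolute sample size.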
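The arithmetic behind the relative and absolute interpretations can be sketched as follows. `resolve_sample_size()` is a hypothetical helper written for this illustration; the rounding and the cap at n are assumptions and may differ from what the package does internally.

```r
# Hypothetical helper, not part of fastkmedoids.
resolve_sample_size <- function(sampling, n) {
  if (sampling < 1) {
    size <- ceiling(sampling * n)  # relative: fraction of n (rounding is an assumption)
  } else {
    size <- sampling               # absolute number of observations
  }
  min(n, size)                     # assumed cap: a sample cannot exceed the data set
}

k <- 5
paper_default <- 80 + 4 * k  # sample size suggested by Schubert and Rousseeuw (2019)

resolve_sample_size(0.25, 1000)           # fraction of 1000 observations
resolve_sample_size(paper_default, 1000)  # absolute size 80 + 4*k
resolve_sample_size(paper_default, 50)    # capped at the data set size
```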

independent

If TRUE, the previous medoids are not kept in the next sample, so each sample is drawn independently. Default: FALSE

seed

Seed for random number generator. Default: 123456789

Value

KMedoids S4 class
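A minimal end-to-end sketch, assuming the fastkmedoids package is installed and that a `dist()` vector matches the layout described for `rdist`. The synthetic data and parameter values are illustrative; the call is guarded so the data preparation also runs when the package is absent.

```r
# Three well-separated synthetic groups of 50 points each (illustrative only).
set.seed(42)
x <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 5), ncol = 2),
  matrix(rnorm(100, mean = 10), ncol = 2)
)
n <- nrow(x)
k <- 3L

# Lower triangle of the distance matrix, stored column-wise.
rdist <- as.numeric(dist(x))

# Only call fastclara() if the package is available.
if (requireNamespace("fastkmedoids", quietly = TRUE)) {
  res <- fastkmedoids::fastclara(
    rdist, n, k,
    numsamples = 5L,   # draw five samples
    sampling = 0.25,   # each sample uses a quarter of the observations
    seed = 42L
  )
  print(class(res))    # the returned object is documented as a KMedoids S4 class
}
```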

References

Erich Schubert and Peter J. Rousseeuw (2019). "Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms." https://arxiv.org/abs/1810.05691


fastkmedoids documentation built on Jan. 22, 2021, 1:06 a.m.