bootstrapInstability: Fang and Wang's instability measure


View source: R/boot_instability.R

Description

This method is based on the algorithm developed by Fang and Wang, but offers more choice regarding the instability measure. Their measure is equivalent to 1 - rand.index; here one can choose any normalized similarity measure, and the instability is 1 - similarity. For each number of clusters, several pairs of bootstrap subsamples are drawn and the instability measure is computed from the clusterings of each pair. The optimal number of clusters is the value for which the mean instability is lowest.
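The procedure described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: it hard-codes stats::kmeans as the clustering algorithm and the plain Rand index as the similarity, and the names rand_index, nearest_center and instability_sketch are hypothetical. The orientation of the returned matrix (iterations in rows, numbers of clusters in columns) is also an assumption.

```r
# Hypothetical helper: plain Rand index between two label vectors,
# i.e. the share of point pairs on which both partitions agree.
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  upper <- upper.tri(same_a)
  mean(same_a[upper] == same_b[upper])
}

# Hypothetical helper: label each row of X by its nearest center.
nearest_center <- function(X, centers) {
  d <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(X, 2, centers[j, ])^2)
  })
  max.col(-d, ties.method = "first")
}

# Sketch of the bootstrap instability loop (assumes maxK >= 2).
instability_sketch <- function(X, maxK, B = 20) {
  X <- as.matrix(X)
  n <- nrow(X)
  inst <- matrix(NA_real_, nrow = B, ncol = maxK - 1,
                 dimnames = list(NULL, paste0("k", 2:maxK)))
  for (k in 2:maxK) {
    for (b in seq_len(B)) {
      # Draw a pair of bootstrap samples and cluster each one.
      s1 <- X[sample(n, replace = TRUE), , drop = FALSE]
      s2 <- X[sample(n, replace = TRUE), , drop = FALSE]
      fit1 <- kmeans(s1, centers = k, nstart = 5)
      fit2 <- kmeans(s2, centers = k, nstart = 5)
      # Extend both clusterings to the full data and compare them.
      lab1 <- nearest_center(X, fit1$centers)
      lab2 <- nearest_center(X, fit2$centers)
      inst[b, k - 1] <- 1 - rand_index(lab1, lab2)
    }
  }
  inst_mean <- colMeans(inst)
  list(inst_mean = inst_mean,
       kopt = unname(which.min(inst_mean)) + 1,  # column 1 is k = 2
       instability = inst)
}
```

Calling instability_sketch(X, maxK = 6) on a numeric matrix X returns a list shaped like the one described under Value, which makes it easy to see how kopt follows from the column means of the instability matrix.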

Usage

bootstrapInstability(X, maxK, B = 50, clusterAlg = myKmean,
  similarity = adj.rand.index, verbose = TRUE, ...)

Arguments

X

data matrix or data frame of size n x d, n observations and d features

maxK

maximum number of clusters to evaluate

B

number of resampling iterations

clusterAlg

clustering algorithm. Its output must be a list containing the attributes "cluster" and "predict". For more details, see the format of the function myKmean.

similarity

function measuring the similarity between two partitions.

verbose

logical. If TRUE, plots the evolution of the algorithm

...

additional parameters for the clustering algorithm
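As an illustration of the clusterAlg and similarity arguments, here is a minimal sketch of what custom functions might look like. Everything in it is an assumption rather than the package's own code: kmeans_wrapper and adjusted_rand are hypothetical names, the (X, k, ...) call signature is guessed, and "predict" is assumed to be a function that labels new observations. The authoritative format is the one used by myKmean.

```r
# Hypothetical clustering wrapper around stats::kmeans (not myKmean).
# Assumed signature: data, number of clusters, extra kmeans arguments.
kmeans_wrapper <- function(X, k, ...) {
  X <- as.matrix(X)
  fit <- kmeans(X, centers = k, ...)
  list(
    cluster = fit$cluster,
    # Assumed contract: predict() labels new rows by nearest center.
    predict = function(newdata) {
      newdata <- as.matrix(newdata)
      d <- sapply(seq_len(k), function(j) {
        rowSums(sweep(newdata, 2, fit$centers[j, ])^2)
      })
      # Force a matrix so a single new row is handled as well.
      max.col(-matrix(d, nrow = nrow(newdata)), ties.method = "first")
    }
  )
}

# Hypothetical similarity: adjusted Rand index from the contingency table.
# It equals 1 for identical partitions but can be negative, so
# 1 - similarity may exceed 1; it is NaN when both partitions consist
# of a single cluster.
adjusted_rand <- function(a, b) {
  tab <- table(a, b)
  sum_ij <- sum(choose(tab, 2))
  sum_a <- sum(choose(rowSums(tab), 2))
  sum_b <- sum(choose(colSums(tab), 2))
  expected <- sum_a * sum_b / choose(sum(tab), 2)
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}
```

If the assumed contracts match those of myKmean and adj.rand.index, functions like these could be passed as clusterAlg = kmeans_wrapper and similarity = adjusted_rand, with further kmeans options such as nstart forwarded through "...".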

Value

List with 3 components:

inst_mean

vector containing the mean instability measure for each number of clusters from 2 to maxK

kopt

the optimal number of clusters

instability

matrix containing the instability measures for every number of clusters and every resampling iteration.

References

Fang, Y. and Wang, J. (2012). Selection of the number of clusters via the bootstrap method. Computational Statistics & Data Analysis, 56:468-477.


mattmail/clusterAnalysis documentation built on Nov. 4, 2019, 6:18 p.m.