bootCVD: Cluster Solution Diagnostics Using Bootstrap Replicates
In BCA: Business and Customer Analytics


Plots both the Rand index and the Calinski-Harabasz index across different numbers of clusters for a common underlying dataset. The clusters are fit with the K-Means, K-Medians, or Neural Gas clustering algorithm on a set of bootstrap replicates of the data.

bootCVD(x, k, nboot=100, nrep=1, method = c("kmn", "kmd", "neuralgas"),
   col1, col2, dsname)
bootCH(xdat, k_vals, clstr1, clstr2, cntrs1, cntrs2,
   method = c("kmn", "kmd", "neuralgas"))
bootPlot(fc, ch, col1="blue", col2="green")

`x`	A numeric matrix of the data to be clustered.
`k`	An integer vector giving the set of clustering solutions to be examined.
`nboot`	The number of bootstrap replicates to use for the assessment.
`nrep`	The number of sets of initial cluster seeds on which to base each solution.
`method`	The clustering method, one of "kmn" (K-Means), "kmd" (K-Medians), or "neuralgas" (Neural Gas).
`col1`	The color to use for the plot of the Rand index values.
`col2`	The color to use for the plot of the Calinski-Harabasz index values.
`dsname`	The name of the dataset being used (used only for output purposes).
`xdat`	A numeric matrix of the data to be clustered.
`k_vals`	An integer vector giving the set of clustering solutions to be examined.
`clstr1`	The cluster assignments from a bootFlexclust object for one side of the Rand index paired comparisons.
`clstr2`	The cluster assignments from a bootFlexclust object for the other side of the Rand index paired comparisons.
`cntrs1`	The cluster centers from a bootFlexclust object for one side of the Rand index paired comparisons.
`cntrs2`	The cluster centers from a bootFlexclust object for the other side of the Rand index paired comparisons.
`fc`	A bootFlexclust object.
`ch`	A matrix of Calinski-Harabasz index values from `bootCH`.

The Rand index measures cluster stability, with higher values indicating more stable clusters. The Calinski-Harabasz index is a ratio of between-cluster separation to within-cluster dispersion, with higher values preferred. Using bootstrap replicates accounts for randomness in both the sample data and the clustering algorithms.
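To make the two indices concrete, the following base R sketch computes each one by hand. It is an illustration only, not the code used by bootCVD or bootCH: the helper names `rand_index` and `ch_index` are hypothetical, the `iris` data is an arbitrary example, and the Rand index shown is the plain (unadjusted) form, which may differ from the variant the package reports.

```r
# Plain (unadjusted) Rand index: the share of observation pairs on which
# two partitions agree (both place the pair together, or both apart).
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)          # each unordered pair once
  mean(same_a[pairs] == same_b[pairs])
}

# Calinski-Harabasz index: between-cluster sum of squares per (k - 1)
# divided by within-cluster sum of squares per (n - k).
ch_index <- function(x, cluster) {
  x <- as.matrix(x)
  n <- nrow(x)
  k <- length(unique(cluster))
  total_ss <- sum(scale(x, scale = FALSE)^2)
  within_ss <- sum(sapply(split(seq_len(n), cluster), function(idx) {
    sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)
  }))
  ((total_ss - within_ss) / (k - 1)) / (within_ss / (n - k))
}

set.seed(1)
x <- scale(as.matrix(iris[, 1:4]))    # example data (an assumption)
fit_a <- kmeans(x, centers = 3, nstart = 5)
fit_b <- kmeans(x, centers = 3, nstart = 5)

ri <- rand_index(fit_a$cluster, fit_b$cluster)  # agreement of two runs
ch <- ch_index(x, fit_a$cluster)                # separation vs. dispersion
```

The Rand index does not depend on how the clusters are labeled, so two runs that find the same groups under different label numbers still score 1.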

The functions bootCVD and bootPlot return invisibly; their value lies in the plot they produce as a side effect and in the printed summary of the index values. The function bootCH returns a matrix of Calinski-Harabasz index values in which each row is a bootstrap replicate and each column corresponds to a particular number of clusters.
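The layout of such a matrix can be sketched in base R. This is an assumption-laden illustration rather than a reproduction of bootCH: it uses `kmeans` from the stats package instead of the package's clustering back end, evaluates the index on each bootstrap resample itself, and uses the `iris` data and the variable names below purely as examples.

```r
set.seed(42)
x <- scale(as.matrix(iris[, 1:4]))  # example data (an assumption)
k_vals <- 2:4                       # numbers of clusters to examine
nboot <- 20                         # bootstrap replicates

# One row per replicate, one column per number of clusters.
ch <- matrix(NA_real_, nrow = nboot, ncol = length(k_vals),
             dimnames = list(NULL, paste0("k=", k_vals)))

for (b in seq_len(nboot)) {
  # Bootstrap replicate: resample rows with replacement.
  xb <- x[sample(nrow(x), replace = TRUE), , drop = FALSE]
  for (j in seq_along(k_vals)) {
    k <- k_vals[j]
    fit <- kmeans(xb, centers = k, nstart = 5)
    # Calinski-Harabasz index from the sums of squares kmeans reports.
    ch[b, j] <- (fit$betweenss / (k - 1)) /
      (fit$tot.withinss / (nrow(xb) - k))
  }
}

round(colMeans(ch), 1)  # average index for each number of clusters
```

Comparing the columns, for instance with `boxplot(ch)`, shows how the index and its spread across replicates change with the number of clusters.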

Dan Putler

S. Dolnicar, F. Leisch (2010), Evaluation of Structure and Reproducibility of Cluster Solutions Using the Bootstrap. Marketing Letters, 21:1.

F. Leisch (2006), A Toolbox for K-Centroids Cluster Analysis. Computational Statistics and Data Analysis, 51:2.

bootFlexclust

BCA documentation built on May 2, 2019, 1:26 p.m.