nselectboot: Selection of the number of clusters via bootstrap
In fpc: Flexible Procedures for Clustering

Description

Selection of the number of clusters via bootstrap as explained in Fang and Wang (2012). Pairs of bootstrap samples are repeatedly drawn from the data, and the number of clusters is chosen by optimising an instability estimate computed from these pairs.

In principle, any clustering method that has a CBI wrapper can be used; see clusterboot and kmeansCBI. However, the classification methods currently implemented are not necessarily suitable for every clustering method; see the argument classification.
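
The resampling idea described above can be sketched in base R. This is an illustrative sketch only, not the fpc implementation: it assumes k-means (`stats::kmeans`) as the clustering method, assigns every original point to its nearest centroid, and measures instability as the proportion of point pairs on which the two clusterings disagree about co-membership. The toy data and the helper names `nearest_centroid` and `pair_instability` are hypothetical.

```r
# Sketch only: base R with stats::kmeans, not the fpc implementation.
set.seed(1)

# Toy data (hypothetical): two well-separated Gaussian groups in 2 dimensions.
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))

# Assign every row of x to the nearest of the given centroids.
nearest_centroid <- function(x, centers) {
  # n x k matrix of squared Euclidean distances from points to centroids.
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ])^2)
  })
  max.col(-d2, ties.method = "first")
}

# Instability for one pair of bootstrap samples at a fixed k.
# Simplification (assumption): all points, including those in the bootstrap
# samples, are labelled by nearest centroid.
pair_instability <- function(x, k) {
  n <- nrow(x)
  b1 <- sample(n, n, replace = TRUE)
  b2 <- sample(n, n, replace = TRUE)
  c1 <- kmeans(x[b1, , drop = FALSE], centers = k, nstart = 5)$centers
  c2 <- kmeans(x[b2, , drop = FALSE], centers = k, nstart = 5)$centers
  lab1 <- nearest_centroid(x, c1)
  lab2 <- nearest_centroid(x, c2)
  # TRUE where points i and j share a cluster, for each clustering.
  same1 <- outer(lab1, lab1, "==")
  same2 <- outer(lab2, lab2, "==")
  # Proportion of the n^2 ordered pairs on which the clusterings disagree.
  mean(same1 != same2)
}

# Average over 10 bootstrap pairs for each candidate number of clusters.
krange <- 2:4
inst <- sapply(krange, function(k) mean(replicate(10, pair_instability(x, k))))
names(inst) <- krange
inst
```

Under these assumptions, the candidate with the smallest average instability would be the selected number of clusters.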

Usage

nselectboot(data,B=50,distances=inherits(data,"dist"),
                        clustermethod=NULL,
                        classification="averagedist",centroidname = NULL,
                        krange=2:10, count=FALSE,nnk=1,
                        largeisgood=FALSE,...)

Arguments

`data`	something that can be coerced into a matrix: either an `n*p` data matrix (or data frame) or an `n*n` dissimilarity matrix (or `dist`-object).
`B`	integer. Number of resampling runs.
`distances`	logical. If `TRUE`, the data is interpreted as a dissimilarity matrix. If `data` is a `dist`-object, `distances=TRUE` is set automatically; otherwise `distances=FALSE` by default. This means that you have to set it to `TRUE` manually if `data` is a dissimilarity matrix that is not a `dist`-object.
`clustermethod`	an interface function (the function name, not a string containing the name, has to be provided!). This defines the clustering method. See the "Details"-section of `clusterboot` and `kmeansCBI` for the format. Clustering methods for `nselectboot` must have a `k`-argument for the number of clusters and must otherwise follow the specifications in `clusterboot`. Note that `nselectboot` won't work with CBI-functions that implicitly already estimate the number of clusters such as `pamkCBI`; use `claraCBI` if you want to run it for pam/clara clustering.
`classification`	string. This determines how non-clustered points are classified to given clusters. Options are explained in `classifdist` (if `distances=TRUE`) and `classifnp` (otherwise). Certain classification methods are connected to certain clustering methods: `classification="averagedist"` is recommended for average linkage; `classification="centroid"` is recommended for k-means, clara and pam (with distances it will work with `claraCBI` only); `classification="knn"` with `nnk=1` is recommended for single linkage; and `classification="qda"` is recommended for Gaussian mixtures with flexible covariance matrices.
`centroidname`	string. Indicates the name of the component of `CBIoutput$result` that contains the cluster centroids in case of `classification="centroid"`, where `CBIoutput` is the output object of `clustermethod`. If `clustermethod` is `kmeansCBI` or `claraCBI`, centroids are recognised automatically if `centroidname=NULL`. If `centroidname=NULL` and `distances=FALSE`, cluster means are computed as the cluster centroids.
`krange`	integer vector; numbers of clusters to be tried.
`count`	logical. If `TRUE`, numbers of clusters and bootstrap runs are printed.
`nnk`	number of nearest neighbours if `classification="knn"`, see `classifdist` (if `distances=TRUE`) and `classifnp` (otherwise).
`largeisgood`	logical. If `TRUE`, output component `stabk` is taken as one minus the original instability value so that larger values of `stabk` are better.
`...`	arguments to be passed on to the clustering method.

Value

nselectboot returns a list with components kopt, stabk, stab.

`kopt`	optimal number of clusters.
`stabk`	mean instability value for each number of clusters (or one minus this if `largeisgood=TRUE`).
`stab`	matrix of instability values for all bootstrap runs and numbers of clusters.
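
The relationship between these components can be sketched as follows. This is a hedged illustration in base R, not output from `nselectboot`: the instability values are made up, and the layout (rows as bootstrap runs, one column per number of clusters tried) is an assumption that may differ from the matrix actually returned.

```r
# Made-up instability values for illustration only (not nselectboot output).
# Assumed layout: rows = bootstrap runs, columns = numbers of clusters tried.
krange <- 2:4
stab <- matrix(c(0.02, 0.10, 0.15,
                 0.04, 0.12, 0.09,
                 0.03, 0.08, 0.12),
               nrow = 3, byrow = TRUE,
               dimnames = list(NULL, paste0("k=", krange)))

# Mean instability for each number of clusters; smaller is better.
stabk <- colMeans(stab)
kopt <- krange[which.min(stabk)]

# With largeisgood=TRUE the values are reported as one minus instability,
# so the best number of clusters is the one with the largest value.
stabk_large <- 1 - stabk
kopt_large <- krange[which.max(stabk_large)]
```

Both orientations pick the same number of clusters; `largeisgood` only changes how the reported values are to be read.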

Author(s)

Christian Hennig <christian.hennig@unibo.it>, https://www.unibo.it/sitoweb/christian.hennig/en/

References

Fang, Y. and Wang, J. (2012) Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56, 468-477.

Examples

  
  set.seed(20000)
  face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
  nselectboot(dist(face),B=2,clustermethod=disthclustCBI,
   method="average",krange=5:7)
  nselectboot(dist(face),B=2,clustermethod=claraCBI,
   classification="centroid",krange=5:7)
  nselectboot(face,B=2,clustermethod=kmeansCBI,
   classification="centroid",krange=5:7)
# Of course use larger B in a real application.

fpc documentation built on Sept. 24, 2024, 9:07 a.m.