cl_boot: Bootstrap Resampling of Clustering Algorithms



Description

Generate bootstrap replicates of the results of applying a “base” clustering algorithm to a given data set.

Usage

cl_boot(x, B, k = NULL,
        algorithm = if (is.null(k)) "hclust" else "kmeans", 
        parameters = list(), resample = FALSE)

Arguments

x

the data set of objects to be clustered, as appropriate for the base clustering algorithm.

B

an integer giving the number of bootstrap replicates.

k

NULL (default), or an integer giving the number of classes to be used for a partitioning base algorithm.

algorithm

a character string or function specifying the base clustering algorithm.

parameters

a named list of additional arguments to be passed to the base algorithm.

resample

a logical indicating whether the data should be resampled in addition to “sampling from the algorithm”. If resampling is used, the class memberships of the objects given in x are predicted from the results of running the base algorithm on bootstrap samples of x.
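As a minimal sketch of how the arguments above fit together, the following calls use the Cassini data from the examples below. The seed and the value iter.max = 50 are arbitrary illustrative choices, not recommendations from the package authors.

```r
library("clue")
data("Cassini")
set.seed(123)  # arbitrary seed, only to make the sketch reproducible

## k is given, so the default base algorithm is "kmeans".
ens_default <- cl_boot(Cassini$x, B = 5, k = 3)

## Same base algorithm, with one extra argument forwarded to kmeans()
## through 'parameters' (iter.max = 50 is an illustrative value).
ens_tuned <- cl_boot(Cassini$x, B = 5, k = 3,
                     algorithm = "kmeans",
                     parameters = list(iter.max = 50))

length(ens_default)
length(ens_tuned)
```

Both calls should return ensembles with B = 5 members; only the arguments passed on to the base algorithm differ.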

Details

This is a rather simple-minded function with limited applicability. It is mostly useful for studying the effect of (uncontrolled) random initializations of fixed-point partitioning algorithms such as kmeans or cmeans; see the examples. To study the effect of varying control parameters or of explicitly provided random starting values, the respective cluster ensemble has to be generated explicitly. This is most conveniently done by using replicate to create a list lst of suitable instances of clusterings obtained by the base algorithm, and then calling cl_ensemble(list = lst) to create the ensemble.
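The explicit route via replicate and cl_ensemble(list = lst) described above might look roughly as follows. This is a sketch under assumptions: the choice of kmeans as base algorithm, the 10 replicates, the seed, and the way starting centers are drawn (three randomly chosen rows of the data) are all illustrative.

```r
library("clue")
data("Cassini")
set.seed(1)  # arbitrary seed, only to make the sketch reproducible

x <- Cassini$x

## Build the list of clusterings by hand, providing explicit random
## starting centers for each run (three randomly chosen rows of x).
lst <- replicate(10, {
    start <- x[sample(nrow(x), 3), , drop = FALSE]
    kmeans(x, centers = start)
}, simplify = FALSE)

## Turn the list of clusterings into a cluster ensemble.
ens <- cl_ensemble(list = lst)
length(ens)
```

Using simplify = FALSE keeps the result of replicate a plain list, which is the form cl_ensemble(list = lst) expects.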

Value

A cluster ensemble of length B. If resampling is not used (the default), it contains the results of running the base algorithm on the given data set. If resampling is used, it contains the memberships for the given data predicted from the results of running the base algorithm on bootstrap samples of the data.
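To illustrate the two kinds of return value, the sketch below builds one ensemble without and one with resampling. The seed and B = 5 are arbitrary; the dissimilarities depend on the random initializations and bootstrap samples, so no particular output is shown.

```r
library("clue")
data("Cassini")
set.seed(42)  # arbitrary seed, only to make the sketch reproducible

## Default: the base algorithm is run B times on the full data set.
ens_plain <- cl_boot(Cassini$x, B = 5, k = 3)

## With resampling: the base algorithm is run on bootstrap samples and
## memberships for the original objects are predicted from each run.
ens_resampled <- cl_boot(Cassini$x, B = 5, k = 3, resample = TRUE)

length(ens_plain)
length(ens_resampled)

## Pairwise dissimilarities between the resampled clusterings.
summary(c(cl_dissimilarity(ens_resampled)))
```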

Examples

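## Study e.g. the effect of random kmeans() initializations.
library("clue")
data("Cassini")
## 15 replicates of kmeans() with k = 3 classes on the Cassini data.
pens <- cl_boot(Cassini$x, 15, 3)
## Pairwise dissimilarities between the 15 resulting partitions.
diss <- cl_dissimilarity(pens)
summary(c(diss))
plot(hclust(diss))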

Example output

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   5.099  25.020  18.020  30.692  30.887 

clue documentation built on April 23, 2018, 5:04 p.m.