cluster_stability: Bootstrap Cluster Stability

View source: R/cluster_stability.R

Description

Computes the cluster stability for different values of k via the non-parametric bootstrap.
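
To make the idea concrete, here is a minimal base-R sketch of a bootstrap instability estimate for a single k. It is not the package's implementation: the helper name instability_sketch, its default for B, and the choice to compare the two clusterings only on objects that appear in both bootstrap samples are assumptions made for illustration.

```r
# Hypothetical helper (not part of the mta package): estimates the
# instability of a k-cluster hclust solution from B pairs of bootstrap samples.
instability_sketch <- function(dmat, k, B = 20, method = "complete") {
  p <- nrow(dmat)
  d <- numeric(B)
  for (b in seq_len(B)) {
    # draw two bootstrap samples of object indices; duplicates are dropped
    i1 <- sort(unique(sample.int(p, p, replace = TRUE)))
    i2 <- sort(unique(sample.int(p, p, replace = TRUE)))
    # cluster each sample hierarchically and cut the tree into k clusters
    c1 <- cutree(hclust(as.dist(dmat[i1, i1]), method = method), k = k)
    c2 <- cutree(hclust(as.dist(dmat[i2, i2]), method = method), k = k)
    # compare the clusterings on objects present in both samples (assumption)
    both <- intersect(i1, i2)
    l1 <- c1[match(both, i1)]
    l2 <- c2[match(both, i2)]
    # co-membership indicators: is a pair of objects in the same cluster?
    same1 <- outer(l1, l1, "==")
    same2 <- outer(l2, l2, "==")
    # share of object pairs on which the two clusterings disagree
    d[b] <- mean(same1[upper.tri(same1)] != same2[upper.tri(same2)])
  }
  mean(d) # average disagreement over the B bootstrap pairs
}

# usage on simulated data with two well-separated groups
set.seed(1)
x <- c(rnorm(50, 0, 1), rnorm(50, 10, 1))
dmat <- as.matrix(dist(x))
inst <- instability_sketch(dmat, k = 2, B = 10)
```

The result lies between 0 and 1; smaller values mean that clusterings from different bootstrap samples agree more often.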

Usage


Arguments

dist

p x p distance matrix, where p is the number of objects.

kseq

A sequence of candidate numbers of clusters k for which cluster stability should be computed.

B

Number of bootstrap samples.

norm

Logical. The default, FALSE, corresponds to the method of Fang & Wang (2012).

...

Additional arguments passed to hclust.

Value

The function returns a vector of cluster instability indices, one for each k in kseq.
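
Because the returned values are instability indices, a natural way to use them is to pick the k with the smallest value. The short sketch below shows this with which.min; the numbers in inst are made up for illustration and are not output of cluster_stability.

```r
kseq <- 2:6
# made-up instability values, one per k in kseq (illustration only)
inst <- c(0.08, 0.01, 0.05, 0.07, 0.09)
names(inst) <- kseq

# k with the lowest instability
best_k <- kseq[which.min(inst)]
```

Note that which.min returns only the first minimum, so ties are resolved in favor of the smaller k.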

Author(s)

Jonas Haslbeck <jonashaslbeck@gmail.com>

References

Fang, Y., & Wang, J. (2012). Selection of the number of clusters via the bootstrap method. Computational Statistics & Data Analysis, 56(3), 468-477.

Examples

## Not run: 

set.seed(1) # set the seed first so the simulated data are reproducible

# simple Gaussian mixture with three components
data <- c(rnorm(100, 0, 1),
          rnorm(100, 5, 1),
          rnorm(100, 10, 1))
hist(data, breaks = 40) # look at mixture

# compute distance matrix (named dmat to avoid masking stats::dist)
dmat <- as.matrix(dist(data))

kseq <- 2:10 # define k sequence of interest
instobj <- cluster_stability(dmat, kseq, B = 25)

# visualize instability as a function of k:
plot(kseq, instobj, ylim = c(0, .15), type = 'l',
     xlab = 'k', ylab = 'Cluster Instability')

# instability should be lowest at k = 3, the number of mixture components


## End(Not run)

jmbh/mta documentation built on May 19, 2019, 1:51 p.m.