Bagging for Clustering

Description

Construct partitions of objects by running a base clustering algorithm on bootstrap samples from a given data set, and “suitably” aggregating these primary partitions.

Usage

cl_bag(x, B, k = NULL, algorithm = "kmeans", parameters = NULL, 
       method = "DFBC1", control = NULL)

Arguments

x

the data set of objects to be clustered, as appropriate for the base clustering algorithm.

B

an integer giving the number of bootstrap replicates.

k

NULL (default), or an integer giving the number of classes to be used for a partitioning base algorithm.

algorithm

a character string or function specifying the base clustering algorithm.

parameters

a named list of additional arguments to be passed to the base algorithm.

method

a character string indicating the bagging method to use. Currently, only method "DFBC1" is available, which implements algorithm BagClust1 in Dudoit & Fridlyand (2003).

control

a list of control parameters for the aggregation. Currently, not used.
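As a hedged illustration of how algorithm and parameters fit together, the sketch below passes an extra argument to the base algorithm through a named list. It assumes that the clue package is installed and that the base algorithm is kmeans from package stats; the nstart argument and the values B = 10 and nstart = 5 are arbitrary choices made for the example.

```r
library("clue")
set.seed(1234)
data("Cassini")
## Assumption: each entry of 'parameters' is forwarded to the base
## algorithm, so kmeans() is called with nstart = 5 on every sample.
party <- cl_bag(Cassini$x, B = 10, k = 3, algorithm = "kmeans",
                parameters = list(nstart = 5))
## Plain integer class ids of the bagged partition.
ids <- as.integer(cl_class_ids(party))
## Class sizes of the bagged partition.
table(ids)
```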

Details

Bagging for clustering is a general conceptual framework rather than a specific algorithm. If the primary partitions generated in the bootstrap stage form a cluster ensemble (so that class memberships of the objects in x can be obtained), consensus methods for cluster ensembles (as implemented, e.g., in cl_consensus and cl_medoid) can be employed for the aggregation stage. In particular, (possibly new) bagging algorithms can easily be realized by directly running cl_consensus on the results of cl_boot.
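The two-stage view described above can be sketched as follows. This is a minimal illustration rather than what cl_bag itself does: it assumes that cl_boot is called with resample = TRUE so that each base partition is fitted on a bootstrap sample and class memberships are predicted for all objects in x, and it uses the default consensus method of cl_consensus. B = 20 and k = 3 are arbitrary.

```r
library("clue")
set.seed(1234)
data("Cassini")
## Bootstrap stage: k-means partitions fitted on bootstrap samples,
## with memberships predicted for all objects in x.
ens <- cl_boot(Cassini$x, B = 20, k = 3, algorithm = "kmeans",
               resample = TRUE)
## Aggregation stage: consensus partition of the ensemble
## (default consensus method, an assumption of this sketch).
cons <- cl_consensus(ens)
## Plain integer class ids of the consensus partition.
ids <- as.integer(cl_class_ids(cons))
table(ids)
```

Swapping in a different method argument to cl_consensus, or using cl_medoid instead, would give a different bagging variant on the same ensemble.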

In BagClust1, aggregation proceeds by generating a reference partition by running the base clustering algorithm on the whole given data set, and averaging the ensemble memberships after optimally matching them to the reference partition (in fact, by minimizing Euclidean dissimilarity, see cl_dissimilarity).
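The aggregation step described above can be written out by hand roughly as follows. This is a sketch under stated assumptions, not the package's implementation: it uses hard k-means memberships only, the helper to_membership is hypothetical, and the matching relies on solve_LSAP from clue. For hard memberships, maximizing the overlap between reference and bootstrap classes over column permutations is equivalent to minimizing the Euclidean distance between the membership matrices.

```r
library("clue")
set.seed(1234)
data("Cassini")
x <- Cassini$x
k <- 3
B <- 10

## Hypothetical helper: hard class ids -> n x k indicator matrix.
to_membership <- function(ids, k) {
    M <- matrix(0, length(ids), k)
    M[cbind(seq_along(ids), ids)] <- 1
    M
}

## Reference partition from the whole data set.
M_ref <- to_membership(kmeans(x, k)$cluster, k)

## Bootstrap ensemble, memberships predicted for all objects in x.
ens <- cl_boot(x, B, k, algorithm = "kmeans", resample = TRUE)

M_sum <- matrix(0, nrow(x), k)
for (b in seq_len(B)) {
    M_b <- to_membership(as.integer(cl_class_ids(ens[[b]])), k)
    ## Optimal matching of bootstrap classes to reference classes:
    ## perm[i] is the bootstrap class assigned to reference class i.
    perm <- as.integer(solve_LSAP(crossprod(M_ref, M_b), maximum = TRUE))
    M_sum <- M_sum + M_b[, perm]
}

## Averaged memberships and the resulting hard class ids.
M_avg <- M_sum / B
bagged_ids <- max.col(M_avg, ties.method = "first")
table(bagged_ids)
```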

If the base clustering algorithm yields prototypes, aggregation can be based on clustering these. This is the idea underlying the “Bagged Clustering” algorithm introduced in Leisch (1999) and implemented by function bclust in package e1071.
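For comparison, a minimal call to bclust might look as follows. This sketch assumes that packages e1071 and clue are installed; the settings iter.base = 10 and base.centers = 5 are arbitrary example values, and the result is read from the cluster component of the returned object.

```r
data("Cassini", package = "clue")
set.seed(1234)
if (requireNamespace("e1071", quietly = TRUE)) {
    ## Cluster the prototypes of 10 bootstrap k-means runs
    ## (5 centers each) and cut the result into 3 clusters.
    bc <- e1071::bclust(Cassini$x, centers = 3, iter.base = 10,
                        base.centers = 5, verbose = FALSE)
    ## Cluster sizes of the bagged clustering.
    print(table(bc$cluster))
}
```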

Value

An R object representing a partition of the objects given in x.

References

S. Dudoit and J. Fridlyand (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19/9, 1090–1099. doi:10.1093/bioinformatics/btg038.

F. Leisch (1999). Bagged Clustering. Working Paper 51, SFB “Adaptive Information Systems and Modeling in Economics and Management Science”. epub.wu.ac.at/1272/.

Examples

library("clue")
set.seed(1234)
## Run BagClust1 on the Cassini data.
data("Cassini")
party <- cl_bag(Cassini$x, 50, 3)
plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
## Actually, using fuzzy c-means as a base learner works much better:
if(require("e1071", quietly = TRUE)) {
    party <- cl_bag(Cassini$x, 20, 3, algorithm = "cmeans")
    plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
}
