Bagging for Clustering
Description
Construct partitions of objects by running a base clustering algorithm on bootstrap samples from a given data set, and “suitably” aggregating these primary partitions.
Usage
cl_bag(x, B, k = NULL, algorithm = "kmeans", parameters = NULL,
       method = "DFBC1", control = NULL)
Arguments
x 
the data set of objects to be clustered, as appropriate for the base clustering algorithm. 
B 
an integer giving the number of bootstrap replicates. 
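k 
NULL (default), or an integer giving the number of classes to be used for a partitioning base algorithm. 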
algorithm 
a character string or function specifying the base clustering algorithm. 
parameters 
a named list of additional arguments to be passed to the base algorithm. 
method 
a character string indicating the bagging method to use. Currently, only method "DFBC1" is available, which implements algorithm BagClust1 in Dudoit and Fridlyand (2003). 
control 
a list of control parameters for the aggregation. Currently, not used. 
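The algorithm and parameters arguments follow a common R pattern: a function (or the name of one) plus a named list of extra arguments for it. The sketch below illustrates that pattern in base R using a hypothetical helper, run_base(), which is not part of clue and is not a description of how cl_bag works internally. It assumes the base algorithm takes the data as its first argument and the number of classes as its second, as stats::kmeans does.

```r
## Hypothetical helper (not part of clue): resolve the base algorithm from
## a function or a function name, then call it with the data, the number
## of classes, and any extra named arguments.
run_base <- function(x, k, algorithm = "kmeans", parameters = list()) {
    base_fun <- match.fun(algorithm)
    do.call(base_fun, c(list(x, k), parameters))
}

set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2))
## Extra arguments such as nstart are forwarded to stats::kmeans().
fit <- run_base(x, k = 2, algorithm = "kmeans",
                parameters = list(nstart = 5))
table(fit$cluster)
```

Passing the function object itself (run_base(x, 2, kmeans)) works the same way, because match.fun() accepts both forms.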
Details
Bagging for clustering is really a general conceptual framework rather
than a specific algorithm. If the primary partitions generated in the
bootstrap stage form a cluster ensemble (so that class memberships of
the objects in x can be obtained), consensus methods for cluster
ensembles (as implemented, e.g., in cl_consensus and cl_medoid) can be
employed for the aggregation stage. In particular, (possibly new)
bagging algorithms can easily be realized by directly running
cl_consensus on the results of cl_boot.
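A minimal sketch of that two-stage recipe, assuming the clue package is installed: cl_boot with resample = TRUE fits the base algorithm to bootstrap samples and predicts memberships for all objects in x, and cl_consensus then aggregates the resulting ensemble with its default method for partitions. The values B = 20 and k = 3 are illustrative choices, and the result need not coincide with what cl_bag's BagClust1 method returns.

```r
library(clue)
set.seed(1234)
data("Cassini")
## Bootstrap stage: k-means with 3 classes fitted to 20 bootstrap samples;
## resample = TRUE predicts class memberships for every object in Cassini$x.
ens <- cl_boot(Cassini$x, B = 20, k = 3, algorithm = "kmeans",
               resample = TRUE)
## Aggregation stage: a consensus partition of the ensemble, using the
## default consensus method.
party <- cl_consensus(ens)
table(cl_class_ids(party))
```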
In BagClust1, aggregation first generates a reference partition by
running the base clustering algorithm on the whole given data set, and
then averages the ensemble memberships after optimally matching each
of them to the reference partition (the matching minimizes Euclidean
dissimilarity; see cl_dissimilarity).
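The BagClust1 aggregation can be illustrated with a short base-R sketch. It is not clue's implementation; it assumes stats::kmeans as the base algorithm, labels all objects of each bootstrap fit by their nearest bootstrap centre, and finds the optimal relabeling by brute force over all k! permutations (only practical for small k). The function name bagclust1_sketch() is hypothetical.

```r
## Minimal base-R sketch of BagClust1-style aggregation (not clue's code).
## Assumptions: stats::kmeans is the base algorithm, and each bootstrap fit
## labels all n objects by their nearest bootstrap centre.
bagclust1_sketch <- function(x, B, k) {
    n <- nrow(x)
    ## All k! label permutations (brute force, fine for small k).
    perms <- function(v) {
        if (length(v) <= 1) return(list(v))
        do.call(c, lapply(seq_along(v), function(i)
            lapply(perms(v[-i]), function(p) c(v[i], p))))
    }
    P <- perms(seq_len(k))
    ## Reference partition from the whole data set, as an n x k 0/1 matrix.
    ref <- kmeans(x, centers = k)
    M_ref <- diag(k)[ref$cluster, , drop = FALSE]
    acc <- matrix(0, n, k)
    for (b in seq_len(B)) {
        idx <- sample.int(n, n, replace = TRUE)
        fit <- kmeans(x[idx, , drop = FALSE], centers = k)
        ## Squared distances from every object to each bootstrap centre.
        d <- sapply(seq_len(k), function(j)
            colSums((t(x) - fit$centers[j, ])^2))
        M_b <- diag(k)[max.col(-d), , drop = FALSE]
        ## Relabel to minimize Euclidean distance to the reference memberships.
        cost <- sapply(P, function(p)
            sum((M_b[, p, drop = FALSE] - M_ref)^2))
        acc <- acc + M_b[, P[[which.min(cost)]], drop = FALSE]
    }
    acc / B  # averaged (soft) memberships, one row per object
}

set.seed(1234)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
           cbind(rnorm(50, mean = 0, sd = 0.3), rnorm(50, mean = 3, sd = 0.3)))
memb <- bagclust1_sketch(x, B = 25, k = 3)
table(max.col(memb))
```

Each row of the returned matrix sums to one, and max.col() turns the soft memberships into hard class ids.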
If the base clustering algorithm yields prototypes, aggregation can be
based on clustering these. This is the idea underlying the
“Bagged Clustering” algorithm introduced in Leisch (1999) and
implemented by function bclust in package e1071.
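To make the prototype-based idea concrete, here is a simplified base-R sketch in the spirit of Leisch (1999). It is not e1071's bclust implementation: the choices of k-means for the bootstrap stage, average-linkage hclust for clustering the prototypes, and nearest-prototype assignment of the objects are assumptions made for illustration, and bagged_prototypes_sketch() is a hypothetical name.

```r
## Simplified sketch of prototype-based bagged clustering (not e1071::bclust).
bagged_prototypes_sketch <- function(x, B, base_k, k) {
    n <- nrow(x)
    ## Stage 1: collect the k-means centres (prototypes) from B bootstrap samples.
    protos <- do.call(rbind, lapply(seq_len(B), function(b) {
        idx <- sample.int(n, n, replace = TRUE)
        kmeans(x[idx, , drop = FALSE], centers = base_k)$centers
    }))
    ## Stage 2: hierarchically cluster the B * base_k prototypes into k groups.
    proto_cl <- cutree(hclust(dist(protos), method = "average"), k = k)
    ## Each object joins the group of its nearest prototype.
    nearest <- apply(x, 1, function(p) which.min(colSums((t(protos) - p)^2)))
    unname(proto_cl[nearest])
}

set.seed(1234)
x <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
           cbind(rnorm(50, mean = 0, sd = 0.3), rnorm(50, mean = 3, sd = 0.3)))
cl <- bagged_prototypes_sketch(x, B = 10, base_k = 10, k = 3)
table(cl)
```

Using many more base centres than final classes lets the second stage merge prototypes into groups of arbitrary shape, which is the main attraction of this style of aggregation.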
Value
An R object representing a partition of the objects given in x.
References
S. Dudoit and J. Fridlyand (2003). Bagging to improve the accuracy of a clustering procedure. Bioinformatics, 19/9, 1090–1099. doi:10.1093/bioinformatics/btg038.
F. Leisch (1999). Bagged Clustering. Working Paper 51, SFB “Adaptive Information Systems and Modeling in Economics and Management Science”. epub.wu.ac.at/1272/.
Examples
set.seed(1234)
## Run BagClust1 on the Cassini data.
data("Cassini")
party <- cl_bag(Cassini$x, 50, 3)
plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
## Actually, using fuzzy c-means (cmeans) as a base learner works much better:
if(require("e1071", quietly = TRUE)) {
    party <- cl_bag(Cassini$x, 20, 3, algorithm = "cmeans")
    plot(Cassini$x, col = cl_class_ids(party), xlab = "", ylab = "")
}
