consClust: Crisp Consensus Clustering

consClust    R Documentation

Crisp Consensus Clustering

Description

Computes consensus clusterings for different numbers of clusters (Monti et al., 2003). The function also computes cluster quality and consensus agreement measures for each solution.

Usage

consClust(diss,
          base.clust = "pam", 
          R = 100, 
          kvals = 2:15,
          cons.method = "SE", 
          membership = "crisp",
          k.fixed = TRUE, 
          agg.method = "cRand",
          keep.ensemble = TRUE,
          parallel = FALSE,
          progressbar = TRUE)

Arguments

diss

A dissimilarity matrix or a dist object.

base.clust

Character. Clustering algorithms used to compute the ensemble of partitions and hierarchies. May be a combination of "pam", "single", "complete", "average", "mcquitty", "ward.D", "ward.D2", "centroid", "median".

R

Numeric. The number of partitions or hierarchies to compute for consensus clustering.

kvals

Numeric vector. The numbers of clusters to compute. Defaults to 2:15.

cons.method

Character. The consensus clustering method to use. One of "SE" (default), "HE", "SM", "HM", "GV1", "DWH", "GV3", "soft/symdiff", "hard/symdiff". See cl_consensus for details on the methods.

membership

Character. If "crisp", the consensus clustering is returned as vectors of crisp cluster labels. If "fuzzy", the function returns fuzzy membership matrices.

k.fixed

Logical. If TRUE (default), the number of clusters obtained from the consensus cannot exceed the number of clusters in the partition ensemble.

agg.method

Character. The consensus agreement measures to compute. May be a combination of "cRand" (default), "Rand", "euclidean", "manhattan", "NMI", "KP", "angle", "diag", "FM", "Jaccard", "purity", "PS". See cl_agreement for details on the methods.

keep.ensemble

Logical. If TRUE (default), the partitions and/or hierarchies of the ensemble are returned by the function. Setting keep.ensemble = FALSE saves memory.

parallel

Logical. Whether to initialize parallel processing through the future package. If FALSE (default), the current plan is used. If TRUE, a multisession plan is initialized using default values.

progressbar

Logical. Whether to initialize a progress bar using the future package. If TRUE (the default shown in Usage), new global progress bar handlers are initialized. If FALSE, the current progress bar handlers are used.
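
Because parallel = FALSE uses the current plan, parallel execution can also be configured by the caller before consClust is called. The lines below are a minimal sketch of that pattern, assuming the future package is installed; the toy data, the object names toy.diss and res, and the choice of two workers are illustrative assumptions and not package defaults.

```r
library(WeightedCluster)

# Toy dissimilarities (hypothetical data, for illustration only)
set.seed(1)
toy.diss <- dist(matrix(rnorm(120), ncol = 2))

# Set the plan explicitly; two workers is an arbitrary choice
future::plan(future::multisession, workers = 2)

# parallel = FALSE: the plan set above is used as is
res <- consClust(toy.diss, kvals = 2:3, R = 10,
                 parallel = FALSE, progressbar = FALSE)

# Return to sequential processing afterwards
future::plan(future::sequential)
```

Setting the plan by hand makes it possible to choose the number of workers, which parallel = TRUE leaves at the default values.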

Details

consClust relies on cl_consensus to compute a consensus clustering among several internally computed hierarchies and partitions. The algorithm works as follows:

  1. An ensemble of R clusterings in a fixed number of groups k (taken from kvals) is computed on subsamples of the data. To reflect potential peculiarities of the data, the clusterings are obtained on weighted subsamples using Bayesian resampling.

  2. A consensus among the clusterings obtained in step 1 is computed. The number of clusters in the consensus may exceed the number in the ensemble computed in step 1. Setting k.fixed to TRUE caps the number of clusters in the consensus at the number of clusters k in the ensemble.

  3. Cluster quality indices are computed for the obtained consensus.

  4. Steps 1 to 3 are repeated for each number of groups specified in kvals.
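
The four steps can be illustrated with a short sketch built directly on clue and on existing WeightedCluster functions. It is not the code used by consClust. It assumes that Bayesian resampling means drawing one random weight per observation (exponential draws rescaled to sum to n, i.e. a Bayesian bootstrap), uses weighted PAM via wcKMedoids as the only base algorithm, and reports the ASW and the mean corrected Rand agreement as example measures. The toy data and the helper name consensus_for_k are hypothetical.

```r
library(WeightedCluster)  # wcKMedoids(), wcClusterQuality()
library(clue)             # cl_ensemble(), cl_consensus(), cl_agreement()

set.seed(1)
# Toy data with two well-separated groups (illustration only)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 4), ncol = 2))
diss <- dist(x)
n <- attr(diss, "Size")
R <- 10
kvals <- 2:3

consensus_for_k <- function(k) {
  # Step 1: R weighted PAM clusterings with random observation weights
  # (assumed form of the Bayesian resampling)
  partitions <- lapply(seq_len(R), function(r) {
    w <- rexp(n)
    w <- w / sum(w) * n
    cl <- wcKMedoids(diss, k = k, weights = w, cluster.only = TRUE)
    as.cl_partition(as.integer(factor(cl)))
  })
  ens <- cl_ensemble(list = partitions)

  # Step 2: consensus partition, with at most k classes
  cons <- cl_consensus(ens, method = "SE", control = list(k = k))
  labels <- as.integer(cl_class_ids(cons))

  # Step 3: one quality index and one agreement measure
  asw <- wcClusterQuality(diss, labels)$stats["ASW"]
  crand <- mean(as.numeric(cl_agreement(ens, cons, method = "cRand")))

  list(labels = labels, ASW = asw, cRand = crand)
}

# Step 4: repeat for each number of groups
results <- lapply(kvals, consensus_for_k)
names(results) <- paste0("k", kvals)
```

Passing k through the control list of cl_consensus plays the role that k.fixed = TRUE has in the description above: the consensus cannot use more classes than requested.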

Value

A consClust object with the following components:

clustering

The retained clustering for each number of groups.

stats

A matrix containing the cluster quality and agreement statistics of each cluster solution.

kvals

The numbers of clusters for which a consensus was computed.

call

The function call used.

ensemblePartitions

A list containing the partitions or hierarchies used to obtain the consensus.
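
The components listed above can be inspected with ordinary list access. The following is a minimal sketch on toy data. The element name cluster3 is an assumption that follows the cluster6 naming used in the Examples section, and the objects toy.diss and res are hypothetical.

```r
library(WeightedCluster)

# Toy dissimilarities (hypothetical data, for illustration only)
set.seed(1)
toy.diss <- dist(matrix(rnorm(120), ncol = 2))

res <- consClust(toy.diss, kvals = 2:4, base.clust = "pam",
                 R = 10, progressbar = FALSE)

# Numbers of groups for which a consensus was computed
res$kvals

# Quality and agreement statistics, one row per solution
res$stats

# Size of each group in the three-cluster consensus
# (assumes the element is named cluster3)
table(res$clustering$cluster3)

# Ensemble used for the consensus; present when keep.ensemble = TRUE
str(res$ensemblePartitions, max.level = 1)
```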

References

Monti, S., Tamayo, P., Mesirov, J., Golub, T. (2003). Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52, 1. doi:10.1023/A:1023949509487

Unterlerchner, L., Studer, M. (2026). What are We Looking For? A Comparative Review of Clustering Algorithms and Cluster Quality Indices For Sequence Analysis. LIVES Working Papers 108. doi:10.12682/lives.2296-1658.2026.108

Examples

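# Loading the package (this also attaches TraMineR, which provides
# mvad, seqdef, seqdist and seqdplot)
library(WeightedCluster)

# Loading illustrative data
data(mvad)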

# Creating state sequence object
mvad.seq <- seqdef(mvad[1:200, 17:86])

# Computing dissimilarities using the LCS measure
diss <- seqdist(mvad.seq, method = "LCS")

## Computing consensus clustering using PAM and Ward (D)

pamWardConsClust <- consClust(diss,
                              kvals = 2:6, 
                              base.clust = c("pam", "ward.D"),
                              R = 20,
                              k.fixed = TRUE,
                              agg.method = "cRand")

## Showing the cluster quality measures
pamWardConsClust

## Plotting values normalized to their range for easier identification
## of minimum and maximum values
plot(pamWardConsClust, norm="range")


# Plotting sequences in 6 groups
par(mar = c(2.5,2,1.8,1.2))

seqdplot(mvad.seq, 
         group = pamWardConsClust$clustering$cluster6, 
         border = NA)

WeightedCluster documentation built on April 27, 2026, 3:04 a.m.