conclus: Perform consensus clustering
In harrysouthworth/conclus: Consensus Clustering


Perform consensus clustering

conclus(diss, cluster = pamCons, subsample = 0.5, K = NULL, R = 100, verbose = FALSE, ncores = 1)

`diss`	A dissimilarity matrix as returned by, for example, `dist` or `daisy`.
`cluster`	A clustering function that takes two arguments, `x` and `k`, and returns only the cluster memberships. Functions `pamCons` and `hclustCons` are two simple examples. Because the dissimilarity matrix `diss` is passed to `cluster` as `x`, `cluster` should NOT coerce `x` into a dissimilarity matrix.
`subsample`	The subsampling proportion. Defaults to `subsample = 0.5`, so each subsample contains 50% of the items. If `subsample == 1`, bootstrap sampling (sampling with replacement) is performed instead.
`K`	The maximum number of clusters to identify. Every k from 2 to `K` is used, and a consensus matrix is returned for each.
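A minimal sketch of a user-supplied `cluster` function that follows this interface. `myPamCons` is a hypothetical name, and this page does not say whether it matches the shipped `pamCons`; it relies only on `cluster::pam`, whose `cluster.only = TRUE` option returns just the membership vector.

```r
library(cluster)

# Hypothetical clustering function with the (x, k) -> memberships interface.
# x arrives as a dissimilarity object, so diss = TRUE tells pam() not to
# compute distances again; cluster.only = TRUE returns only the memberships.
myPamCons <- function(x, k) {
  cluster::pam(x, k, diss = TRUE, cluster.only = TRUE)
}
```

It would then be passed as `conclus(diss, cluster = myPamCons)`.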
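The two resampling schemes can be illustrated with base R's `sample.int`. `draw_sample` is a hypothetical helper, not conclus internals, and rounding the subsample size down with `floor()` is an assumption.

```r
# Hypothetical helper: draw item indices for one resampling run.
draw_sample <- function(n, subsample = 0.5) {
  if (subsample == 1) {
    # Bootstrap: n draws with replacement, so some items repeat
    sample.int(n, n, replace = TRUE)
  } else {
    # Subsample without replacement; size rounded down (an assumption)
    sample.int(n, floor(subsample * n))
  }
}
```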
`R`	The number of random subsamples to run. Defaults to `R=100`.
`verbose`	Whether to report progress. Defaults to `verbose=FALSE`.
`ncores`	The number of cores to use. Defaults to `ncores = 1`; `conclus` often runs faster on a single core than in parallel, because the overhead of parallelizing can outweigh the gain. To have the function guess the number of cores, specify `ncores = NULL`.
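To see where the parallel overhead comes from, here is a sketch of distributing the `R` resampling runs with `parallel::mclapply`. `run_subsamples` and `guess_cores` are hypothetical; in particular, how `conclus` itself guesses the core count when `ncores = NULL` is not stated here, and `detectCores() - 1` is only one common heuristic.

```r
# Hypothetical: one way to guess a core count (conclus's own rule may differ).
guess_cores <- function() max(1L, parallel::detectCores() - 1L, na.rm = TRUE)

# Hypothetical: cluster R subsamples, spreading the runs over ncores workers.
run_subsamples <- function(d, k, R = 100, subsample = 0.5, ncores = 1) {
  d <- as.matrix(d)
  n <- nrow(d)
  one_run <- function(r) {
    idx <- sample.int(n, floor(subsample * n))
    cl <- stats::cutree(stats::hclust(stats::as.dist(d[idx, idx]),
                                      method = "average"), k)
    list(idx = idx, cl = cl)
  }
  # mclapply forks worker processes (mc.cores must be 1 on Windows). When each
  # clustering step is cheap, forking and copying results back can cost more
  # than the time saved, which is why a single core is often faster.
  parallel::mclapply(seq_len(R), one_run, mc.cores = ncores)
}
```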

`R` random subsamples (or bootstrap samples if `subsample = 1`) are drawn from the dissimilarity matrix, and each is clustered for every value of k = 2, ..., K. For each k, a consensus matrix M is computed: entry (i, j) is the proportion of times items i and j were assigned to the same cluster. Each element of M therefore lies in [0, 1], with 0 or 1 representing perfect consensus. If the rows and columns of the consensus matrix are ordered by cluster membership, perfect consensus appears as a block-diagonal matrix: blocks of 1s surrounded by 0s.
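The computation described above can be sketched in base R for a single k. `consensus_matrix()` is a hypothetical helper, not part of conclus. It assumes subsampling without replacement (the bootstrap case is omitted), average-linkage clustering as in the examples below, and Monti et al.'s normalization, in which each pair's count of co-clusterings is divided by the number of subsamples that contained both items.

```r
# Hypothetical sketch of a consensus matrix for one value of k.
consensus_matrix <- function(d, k, R = 100, subsample = 0.5,
                             cluster = function(x, k)
                               stats::cutree(stats::hclust(x, method = "average"), k)) {
  d <- as.matrix(d)
  n <- nrow(d)
  together <- matrix(0, n, n)  # runs in which i and j shared a cluster
  sampled  <- matrix(0, n, n)  # runs in which i and j were both drawn
  for (r in seq_len(R)) {
    idx <- sample.int(n, floor(subsample * n))
    cl <- cluster(stats::as.dist(d[idx, idx]), k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  M <- together / sampled
  M[sampled == 0] <- NA  # pair never drawn together: consensus undefined
  M
}
```

With two well-separated groups, within-group entries should be close to 1 and between-group entries close to 0, which is the block-diagonal pattern described above.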

An object of class ‘conclus’. It contains:

`call`	the function call;
`M`	a list, with one element for each k in 2:K, representing the consensus matrices;
`membership`	a matrix with one column for each k in 2:K, giving the cluster memberships;
`K`	the values of k in 2:K;
`cluster`	the function used to perform the clustering on the subsamples.
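A sketch of how one consensus matrix and its memberships might be checked against the ideal block-diagonal form. `block_consensus` and `order_by_membership` are hypothetical helpers, not part of conclus; they take a plain matrix and a membership vector.

```r
# Hypothetical: mean consensus within clusters (off-diagonal) and between them.
block_consensus <- function(M, membership) {
  same <- outer(membership, membership, "==")
  off_diag <- row(M) != col(M)
  c(within  = mean(M[same & off_diag], na.rm = TRUE),
    between = mean(M[!same], na.rm = TRUE))
}

# Hypothetical: reorder rows and columns so each cluster forms a diagonal block.
order_by_membership <- function(M, membership) {
  o <- order(membership)
  M[o, o]
}
```

For a fitted object `cc`, one might call `block_consensus(cc$M[[1]], cc$membership[, 1])`, assuming that the first list element and the first column both correspond to k = 2. Values near 1 and 0 indicate strong consensus.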

Harry Southworth

pamCons, summary.conclus, representatives

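library(conclus)
library(cluster)  # provides the pluton data and daisy()
library(ggplot2)  # provides the ggplot() generic used below
# The pluton data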
cc <- conclus(dist(pluton), K=7) # default PAM clustering
ggplot(cc)
ggplot(summary(cc))
# Reproduce the Gaussian3 and Uniform1 examples from Monti et al.
# Monti et al. used average linkage, so define a matching clustering function
aveHclustCons <- function(x, k){
  stats::cutree(stats::hclust(x, method="average"), k)
}
# Now pass it into conclus with the Gaussian3 data
ccg <- conclus(daisy(Gaussian3), K=6, cluster=aveHclustCons, subsample=.8, R=500, ncores=7)
ggplot(ccg, low="white", high="red")
s <- summary(ccg)
s
ggplot(s)
# These are similar to Figures 2 and 3 of Monti et al. Also draw the
# histogram of consensus values, which the plots above do not show
hist(ccg$M[[2]], col="red")

# Now the Uniform1 data
ccu <- conclus(daisy(Uniform1), K=6, cluster=aveHclustCons, subsample=.8, R=500, ncores=7)
ggplot(ccu, low="white", high="red")
su <- summary(ccu)
su
ggplot(su)
hist(c(ccu$M[[2]]), col="green")