conclus: Perform consensus clustering


Description

Perform consensus clustering

Usage

conclus(diss, cluster = pamCons, subsample = 0.5, K = NULL, R = 100,
  verbose = FALSE, ncores = 1)

Arguments

diss

A dissimilarity matrix as returned by, for example, dist or daisy.

cluster

A clustering function that takes two arguments, x and k, and returns only the class memberships. The functions pamCons and hclustCons are two simple examples. The dissimilarity matrix diss will be passed into cluster, so cluster should NOT coerce x to be a dissimilarity matrix.

subsample

The subsampling proportion. Defaults to subsample=0.5, so that 50% subsampling is performed. If subsample == 1, bootstrap sampling (sampling with replacement) is performed instead.

K

The maximum number of clusters to identify. All values of k from 2 to K are used, and a consensus matrix is returned for each.

R

The number of random subsamples to run. Defaults to R=100.

verbose

Whether to report progress. Defaults to verbose=FALSE.

ncores

The number of cores to use. Defaults to ncores=1; conclus will often run faster on a single core than when it makes the effort to parallelize. To have the function guess the number of cores, specify ncores=NULL.
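
As an illustration of the contract described for the cluster argument, here is a minimal sketch of a user-supplied clustering function. The name completeHclustCons and the toy data are made up for this example and are not part of conclus; the sketch uses only the stats package. It accepts a dissimilarity object x and a number of clusters k, uses x as it is without coercing it, and returns nothing but the vector of class memberships.

```r
# Hypothetical example function (not part of conclus): complete-linkage
# hierarchical clustering, cut into k groups.
completeHclustCons <- function(x, k) {
  # x is already a dissimilarity object, so it is passed straight to hclust
  stats::cutree(stats::hclust(x, method = "complete"), k)
}

# Toy data: six points on a line forming two well-separated groups
toy <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1)
memb <- completeHclustCons(dist(toy), k = 2)
memb  # one integer label per item
```

A function of this shape could then be supplied as cluster = completeHclustCons in a call to conclus.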

Details

R random subsamples (or bootstrap samples if subsample = 1) are taken from the dissimilarity matrix, and clustering is performed on each for every value of k = 2, ..., K. For each value of k, the consensus matrix M is computed; entry (i, j) is the proportion of times items i and j were placed in the same cluster. As such, each element of M lies in [0, 1], with values of exactly 0 or 1 representing perfect consensus. If the items of the consensus matrix are arranged according to cluster membership, perfect consensus appears as a block diagonal form, with blocks full of 1s surrounded by 0s.
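
The procedure described above can be sketched in a few lines of base R for a single value of k. This is an illustration only and is not the code used inside conclus: the function name consensusSketch is hypothetical, only subsampling without replacement (subsample < 1) is covered, and dividing by the number of times a pair was drawn together follows Monti et al.; whether conclus normalizes in the same way is an assumption.

```r
# Illustrative sketch only; the internals of conclus may differ.
consensusSketch <- function(diss, cluster, k, R = 100, subsample = 0.5) {
  d <- as.matrix(diss)
  n <- nrow(d)
  together <- matrix(0, n, n)  # times items i and j shared a cluster
  sampled  <- matrix(0, n, n)  # times items i and j were drawn together
  for (r in seq_len(R)) {
    # draw a subsample of the items, without replacement
    idx  <- sort(sample(n, size = max(2, floor(subsample * n))))
    memb <- cluster(stats::as.dist(d[idx, idx]), k)
    same <- outer(memb, memb, "==")
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # proportion of co-clustering; NaN where a pair was never drawn together
  together / sampled
}

# Two well-separated groups, clustered with average linkage
set.seed(1)
x <- c(rnorm(10, mean = 0, sd = 0.1), rnorm(10, mean = 10, sd = 0.1))
aveLink <- function(x, k) stats::cutree(stats::hclust(x, method = "average"), k)
M <- consensusSketch(dist(x), aveLink, k = 2, R = 50, subsample = 0.8)
range(M, na.rm = TRUE)
```

Because the two groups are far apart, every subsample is split the same way, so the entries of M should all be 0 or 1, giving the block diagonal pattern that indicates perfect consensus.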

Value

An object of class ‘conclus’. It contains:

call

the function call;

M

a list, with one element for each k in 2:K, representing the consensus matrices;

membership

a matrix with one column for each k in 2:K, giving the cluster memberships;

K

the values of k in 2:K;

cluster

the function used to perform the clustering on the subsamples.
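
Since M and membership each hold one entry per value of k, it can be convenient to look results up by k instead of by position. The helper below is a hypothetical sketch and is not part of conclus. It assumes that the K component is the vector of k values and that the elements of M and the columns of membership follow the same order. It is demonstrated on a hand-built list that only mimics the documented components.

```r
# Hypothetical helper (not part of conclus): extract the results for one k.
# Assumes obj$K lists the k values in the same order as obj$M and the
# columns of obj$membership.
resultsForK <- function(obj, k) {
  idx <- match(k, obj$K)
  if (is.na(idx)) stop("k = ", k, " is not among the fitted values")
  list(M = obj$M[[idx]], membership = obj$membership[, idx])
}

# Hand-built stand-in with the documented components, for illustration only
toyFit <- list(
  K = 2:3,
  M = list(diag(4), diag(4)),
  membership = cbind(c(1, 1, 2, 2), c(1, 2, 3, 3))
)
res <- resultsForK(toyFit, 3)
res$membership
```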

Author(s)

Harry Southworth

See Also

pamCons, summary.conclus, representatives

Examples

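library(conclus)
library(cluster)  # provides the pluton data and daisy()
library(ggplot2)  # provides the ggplot() generic

# The pluton data
cc <- conclus(dist(pluton), K=7) # default PAM clustering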
ggplot(cc)
ggplot(summary(cc))
# Do the Gaussian3 and Uniform1 examples from Monti et al.
# First, they used average linkage, so define a new function
aveHclustCons <- function(x, k){
  stats::cutree(hclust(x, method="average"), k)
}
# Now pass it into conclus with the Gaussian3 data
ccg <- conclus(daisy(Gaussian3), K=6, cluster=aveHclustCons, subsample=.8, R=500, ncores=7)
ggplot(ccg, low="white", high="red")
s <- summary(ccg)
s
ggplot(s)
# Those are similar to Figures 2 and 3. Do the missing histogram.
# M holds one element per k in 2:K, so M[[2]] is the consensus matrix for k = 3
hist(ccg$M[[2]], col="red")

# Now Uniform 1
ccu <- conclus(daisy(Uniform1), K=6, cluster=aveHclustCons, subsample=.8, R=500, ncores=7)
ggplot(ccu, low="white", high="red")
su <- summary(ccu)
su
ggplot(su)
hist(c(ccu$M[[2]]), col="green")

harrysouthworth/conclus documentation built on May 24, 2019, 4:05 a.m.