Automatic comparison of clustering methods.

Description

Automatically compute different clustering solutions and the associated quality measures to help identify the best one.

Usage

wcCmpCluster(diss, weights = NULL, maxcluster, method = "all", pam.combine = TRUE)
## S3 method for class 'clustrangefamily'
print(x, max.rank=1, ...)
## S3 method for class 'clustrangefamily'
summary(object, max.rank=1, ...)
## S3 method for class 'clustrangefamily'
plot(x, group="stat", method="all", pam.combine=FALSE,
    stat="noCH", norm="none", withlegend=TRUE, lwd=1, col=NULL, legend.prop=NA,
    rows=NA, cols=NA, main=NULL, xlab="", ylab="", ...)

Arguments

diss

A dissimilarity matrix or a dist object (see dist).

weights

Optional numerical vector containing weights.

maxcluster

Integer. Maximum number of clusters. The range includes all clustering solutions from two to maxcluster clusters.

method

A vector of clustering methods to compute, or "all" for all methods. Possible values include "ward", "single", "complete", "average", "mcquitty", "median", "centroid" (using hclust), "pam" (using wcKMedRange), "diana" (only for unweighted datasets, using diana), and "beta.flexible" (only for unweighted datasets, using agnes).

pam.combine

Logical. Should we try all combinations of hierarchical and PAM clustering?

x

A clustrangefamily object to plot or print.

object

A clustrangefamily object to summarize.

max.rank

Integer. The number of best-ranked solutions to print or summarize.

group

One of "stat" or "method". If "stat", plots are grouped by statistics, otherwise by clustering methods.

stat

Character. The list of statistics to plot, "noCH" to plot all statistics except "CH" and "CHsq", or "all" for all statistics. See wcClusterQuality for a list of possible values. It is also possible to use "RHC" to plot the quality measure 1-HC. Unlike HC, RHC should be maximized, like all other quality measures.

norm

Character. Normalization method for the statistics. One of "none" (no normalization), "range" (given as (value-min)/(max-min)), "zscore" (adjusted by mean and standard deviation) or "zscoremed" (adjusted by median and median of the difference to the median).

withlegend

Logical. If FALSE, the legend is not plotted.

lwd

Numeric. Line width, see par.

col

A vector of line colors, see par. If NULL, a default set of colors is used.

legend.prop

When withlegend=TRUE, sets the proportion of the graphic area used for plotting the legend. Default value is set according to the place (bottom or right of the graphic area) where the legend is plotted. Values from 0 to 1.

rows,cols

Optional arguments to arrange plots.

xlab

x axis label.

ylab

y axis label.

main

main title of the plot.

...

Additional parameters passed to lines.
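The normalizations named under the norm argument can be sketched in base R as follows. This is only an illustration of the formulas as described above, not the package's own code: normalize_stat is a hypothetical helper, the input values are made up, and "zscoremed" is assumed to divide by the median of the absolute differences to the median.

```r
# Hypothetical helper illustrating the "norm" options; not part of WeightedCluster.
normalize_stat <- function(x, norm = c("none", "range", "zscore", "zscoremed")) {
  norm <- match.arg(norm)
  switch(norm,
    none = x,
    # (value - min) / (max - min): rescales to the interval [0, 1]
    range = (x - min(x)) / (max(x) - min(x)),
    # centered on the mean, scaled by the standard deviation
    zscore = (x - mean(x)) / sd(x),
    # centered on the median, scaled by the median absolute difference
    # to the median (assumption: absolute differences are used)
    zscoremed = {
      med <- median(x)
      (x - med) / median(abs(x - med))
    }
  )
}

# Made-up quality values for four cluster solutions
asw <- c(0.20, 0.35, 0.50, 0.30)
normalize_stat(asw, "range")
normalize_stat(asw, "zscore")
normalize_stat(asw, "zscoremed")
```

Normalizing makes statistics measured on different scales easier to compare on a single plot, since each series is brought to a common scale before it is drawn.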

Value

An object of class clustrangefamily with the following elements:

Method name:

For each clustering method, the result of as.clustrange, stored under the method name (see the method argument for a list of possible values).

allstats:

A matrix containing the clustering statistics for each cluster solution and method.

param:

The parameters set when the function was called.
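As a minimal sketch of how the returned object might be inspected, the code below runs wcCmpCluster on a small synthetic dataset with case weights and looks at the elements listed above. The data, the weights and the choice of methods are made up for illustration, and "diana" and "beta.flexible" are left out because they are only available for unweighted datasets.

```r
library(WeightedCluster)

set.seed(1)
# Made-up data: 40 points in two dimensions with arbitrary case weights
xy <- matrix(rnorm(80), ncol = 2)
diss <- as.matrix(dist(xy))
w <- sample(1:3, 40, replace = TRUE)

# Compare average linkage and PAM for two to four clusters
res <- wcCmpCluster(diss, weights = w, maxcluster = 4,
                    method = c("average", "pam"), pam.combine = FALSE)

# Element names: one entry per method, plus allstats and param
names(res)

# Quality statistics for each cluster solution and method
head(res$allstats)

# Parameters recorded at call time
res$param
```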

See Also

as.clustrange

Examples

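library(TraMineR)         # provides the mvad data, seqdef() and seqdist()
library(WeightedCluster)
data(mvad)

# Create the state sequence object
mvad.seq <- seqdef(mvad[, 17:86])

# Compute dissimilarities using the Hamming distance
diss <- seqdist(mvad.seq, method="HAM")

# Compare average-linkage, PAM and beta-flexible clusterings
allClust <- wcCmpCluster(diss, maxcluster=15, method=c("average", "pam", "beta.flexible"),
                         pam.combine=FALSE)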

summary(allClust, max.rank=3)

##Plot PBC, RHC and ASW
plot(allClust, stat=c("PBC", "RHC", "ASW"), norm="zscore", lwd=2)


##Plot PBC, RHC and ASW grouped by cluster method
plot(allClust, group="method", stat=c("PBC", "RHC", "ASW"), norm="zscore", lwd=2)