Cluster quality statistics

Description

Compute several quality statistics of a given clustering solution.

Usage

wcClusterQuality(diss, clustering, weights = NULL)

Arguments

diss

A dissimilarity matrix or a dist object (see dist).

clustering

Factor. A vector of cluster memberships.

weights

Optional numeric vector of case weights.

Details

Computes several quality statistics of a given clustering solution. See Value for details.

Value

A list with two elements, stats and ASW:

stats

Contains the following statistics:

PBC

Point Biserial Correlation. Correlation between the given distance matrix and a distance that equals zero for individuals in the same cluster and one otherwise.

HG

Hubert's Gamma. Same as the previous statistic, but using Goodman and Kruskal's Gamma coefficient.

HGSD

Hubert's Gamma (Somers' D). Same as the previous statistic, but using Somers' D coefficient.

ASW

Average Silhouette Width (observation).

ASWw

Average Silhouette Width (weighted).

CH

Calinski-Harabasz index (pseudo-F statistic computed from distances).

R2

Share of the discrepancy explained by the clustering solution.

CHsq

Calinski-Harabasz index (pseudo-F statistic computed from squared distances).

R2sq

Share of the discrepancy explained by the clustering solution (computed using squared distances).

HC

Hubert's C coefficient.

ASW

The Average Silhouette Width of each cluster, one column for each ASW measure.
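To make the pair-based and discrepancy-based statistics more concrete, the following base-R sketch recomputes PBC, HG, HC, R2, CH, R2sq and CHsq for an unweighted clustering. It is a minimal illustration, not the package's implementation: the helper name clusterQualitySketch is hypothetical, weights are ignored, and the discrepancy is assumed to be SS = (1/n) * sum over pairs i < j of d_ij (with d_ij replaced by its square for the "sq" variants). The package's own computations may differ in such details.

```r
## Hypothetical helper (not part of WeightedCluster): an unweighted sketch
## of PBC, HG, HC and the discrepancy-based R2/CH statistics.
clusterQualitySketch <- function(diss, clustering) {
  D <- as.matrix(diss)
  cl <- droplevels(as.factor(clustering))
  n <- nrow(D)
  k <- nlevels(cl)
  g <- as.integer(cl)
  low <- lower.tri(D)
  d <- D[low]                           # one distance per unordered pair
  between <- outer(g, g, "!=")[low]     # TRUE when the pair spans two clusters

  # PBC: Pearson correlation between distances and the 0/1 "different cluster" indicator
  pbc <- cor(d, as.numeric(between))

  # HG: Goodman-Kruskal gamma between distance and indicator. Only pairs of pairs
  # that differ on the indicator count, so each between-cluster distance is
  # compared with each within-cluster distance; ties are dropped.
  dB <- d[between]
  dW <- d[!between]
  conc <- sum(outer(dB, dW, ">"))
  disc <- sum(outer(dB, dW, "<"))
  hg <- (conc - disc) / (conc + disc)

  # HC: where the sum of within-cluster distances falls between the sum of the
  # nW smallest and the nW largest distances (0 is the best possible value)
  nW <- length(dW)
  sMin <- sum(sort(d)[seq_len(nW)])
  sMax <- sum(sort(d, decreasing = TRUE)[seq_len(nW)])
  hc <- (sum(dW) - sMin) / (sMax - sMin)

  # Assumed discrepancy: SS = (1/n) * sum over pairs i < j of d_ij
  discrepancy <- function(M) sum(M[lower.tri(M)]) / nrow(M)
  groups <- split(seq_len(n), cl)
  r2ch <- function(M) {
    sst <- discrepancy(M)
    ssw <- sum(vapply(groups, function(idx) discrepancy(M[idx, idx, drop = FALSE]),
                      numeric(1)))
    r2 <- (sst - ssw) / sst
    c(R2 = r2, CH = (r2 / (k - 1)) / ((1 - r2) / (n - k)))
  }
  plain <- r2ch(D)       # distances as given
  squared <- r2ch(D^2)   # squared distances ("sq" variants)

  c(PBC = pbc, HG = hg, R2 = plain[["R2"]], CH = plain[["CH"]],
    R2sq = squared[["R2"]], CHsq = squared[["CH"]], HC = hc)
}
```

With Euclidean distances, R2sq computed this way equals the classical between-group share of the total sum of squares (and CHsq the classical Calinski-Harabasz index), which is a convenient way to check the discrepancy assumption on numeric data.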

Examples

library(WeightedCluster)
data(mvad)
## Aggregating identical sequences
aggMvad <- wcAggregateCases(mvad[, 17:86], weights=mvad$weight)

## Creating the state sequence object (unique sequences with aggregated weights)
mvad.seq <- seqdef(mvad[aggMvad$aggIndex, 17:86], weights=aggMvad$aggWeights)
## Computing Hamming distances between sequences
diss <- seqdist(mvad.seq, method="HAM")

## K-medoids clustering with the PAMonce method (returns the clustering only)
clust5 <- wcKMedoids(diss, k=5, weights=aggMvad$aggWeights, cluster.only=TRUE)

## Compute the quality statistics of the 5-cluster solution
qual <- wcClusterQuality(diss, clust5, weights=aggMvad$aggWeights)

print(qual)
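The ASW element of the result reports average silhouette widths by cluster. As a rough illustration of what a silhouette width is, the sketch below computes Rousseeuw's unweighted silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) for each observation, where a(i) is the mean distance to the other members of its own cluster and b(i) the smallest mean distance to another cluster, and then averages it within clusters. The helper silhouetteSketch is hypothetical; it ignores weights, so it is not expected to reproduce the package's ASWw values, and singleton clusters get a width of 0 by convention. A full unweighted implementation is available as silhouette() in the cluster package.

```r
## Hypothetical helper (not part of WeightedCluster): unweighted silhouette widths
silhouetteSketch <- function(diss, clustering) {
  D <- as.matrix(diss)
  cl <- droplevels(as.factor(clustering))
  if (nlevels(cl) < 2) stop("at least two clusters are needed")
  n <- nrow(D)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- cl == cl[i]
    if (sum(own) == 1) {        # singleton cluster: width set to 0 by convention
      s[i] <- 0
      next
    }
    # a(i): mean distance to the other members of i's cluster (D[i, i] is 0)
    a <- sum(D[i, own]) / (sum(own) - 1)
    # b(i): smallest mean distance to the members of any other cluster
    others <- setdiff(levels(cl), as.character(cl[i]))
    b <- min(sapply(others, function(lev) mean(D[i, cl == lev])))
    s[i] <- (b - a) / max(a, b)
  }
  # per-observation widths and their average within each cluster
  list(width = s, ASW = tapply(s, cl, mean))
}
```

For example, silhouetteSketch(dist(c(0, 1, 10, 11)), c(1, 1, 2, 2)) returns widths close to 1 for all four points, since each pair of points forms a tight, well-separated cluster.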