Cluster.validity: Validity indices computation
In clusterv: Assessment of Cluster Stability by Randomized Maps

Description

It computes the stability index of each individual cluster, the overall validity index of the clustering and, optionally, the Assignment Confidence (AC) index of each example. The indices are computed from a set of clusterings. The labels of the examples are assumed to be integers.

Usage

Cluster.validity(cluster, M.clusters, AC = FALSE)

Cluster.validity.from.similarity(cluster, Sim.M, AC = TRUE)

Arguments

`cluster`	list of the clustering whose validity indices will be computed
`M.clusters`	list of the n clusterings (a list of lists) used for validity index computation
`Sim.M`	similarity matrix
`AC`	if it is TRUE the Assignment Confidence index for each example is computed
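
As a rough illustration of these input structures, the sketch below represents a clustering as a list of integer vectors (one vector of example labels per cluster), which is the form returned by rect.hclust in the Examples. It then builds a pairwise similarity matrix as the fraction of clusterings in which two examples share a cluster. The helper name co_occurrence_similarity and the toy data are hypothetical, and the co-occurrence definition is an assumption based on the Details section, not the package's actual implementation.

```r
# Hypothetical helper (not part of clusterv): fraction of clusterings in
# which examples i and j fall in the same cluster.
co_occurrence_similarity <- function(clusterings, n.examples) {
  S <- matrix(0, nrow = n.examples, ncol = n.examples)
  for (cl in clusterings) {
    for (members in cl) {
      # every pair inside one cluster co-occurs once in this clustering
      S[members, members] <- S[members, members] + 1
    }
  }
  S / length(clusterings)
}

# Toy data: 3 clusterings of 5 examples labelled 1..5 (a list of lists)
toy.clusterings <- list(
  list(c(1, 2, 3), c(4, 5)),
  list(c(1, 2), c(3, 4, 5)),
  list(c(1, 2, 3), c(4, 5))
)
Sim <- co_occurrence_similarity(toy.clusterings, n.examples = 5)
```

In this toy case examples 1 and 2 always share a cluster, so their similarity is 1, while examples 1 and 4 never do, so their similarity is 0.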

Details

Using the similarity matrix M, the stability index s for a cluster A is:

s(A) = \frac{1}{|A|(|A|-1)} \sum_{(i,j) \in A \times A, i\neq j} M_{ij}

The index s(A) estimates the stability of a cluster A by measuring how often the projections of the pairs (i,j) \in A \times A occur together in the same cluster in the projected subspaces. The stability index takes values between 0 and 1: low values indicate unreliable clusters, high values indicate stable clusters.
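
The formula for s(A) can be sketched in base R as follows. The helper name stability_index and the similarity values are made up for illustration. Returning NA for a singleton cluster is a choice of this sketch, since the formula divides by zero when |A| = 1; how the package handles that case is not stated here.

```r
# Hypothetical helper implementing s(A) for a single cluster.
# M: symmetric similarity matrix; A: integer labels of the cluster members.
stability_index <- function(M, A) {
  k <- length(A)
  if (k < 2) return(NA_real_)  # undefined for singletons in this sketch
  sub <- M[A, A, drop = FALSE]
  # sum over ordered pairs (i, j) with i != j: total minus the diagonal
  (sum(sub) - sum(diag(sub))) / (k * (k - 1))
}

# Toy similarity matrix for 4 examples (made-up values)
M <- matrix(c(1.0, 0.9, 0.8, 0.1,
              0.9, 1.0, 0.7, 0.2,
              0.8, 0.7, 1.0, 0.3,
              0.1, 0.2, 0.3, 1.0), nrow = 4, byrow = TRUE)

# Cluster A = {1, 2, 3}: the mean of the off-diagonal entries 0.9, 0.8, 0.7
s.A <- stability_index(M, c(1, 2, 3))
```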

The overall validity of the clustering is the average of the stability indices of the individual clusters.
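
As a small worked example, assuming the average is an unweighted mean over clusters (cluster sizes are not taken into account in this sketch) and using made-up stability values:

```r
# Made-up stability indices for a clustering with 3 clusters
validity <- c(0.92, 0.75, 0.40)

# Overall validity as the unweighted mean of the per-cluster indices
overall.validity <- mean(validity)
```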

The Assignment-Confidence (AC) index estimates the confidence of the assignment of an example i to a cluster A using a similarity matrix M:

AC(i,A) = \frac{1}{|A|-1} \sum_{j \in A, j\neq i} M_{ij}

Using a set of realizations of a given randomized projection, the AC-index represents the frequency with which i appears together with the other elements of the cluster A.
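
The AC formula can be sketched in base R in the same way. The helper name assignment_confidence and the similarity values are hypothetical. The sketch applies the formula literally and is only exercised here for an example i that belongs to A.

```r
# Hypothetical helper implementing AC(i, A) literally from the formula.
# M: similarity matrix; i: label of the example; A: labels of the cluster members.
assignment_confidence <- function(M, i, A) {
  others <- setdiff(A, i)                 # j in A with j != i
  sum(M[i, others]) / (length(A) - 1)     # NaN when A contains only i
}

# Toy similarity matrix for 4 examples (made-up values)
M <- matrix(c(1.0, 0.9, 0.8, 0.1,
              0.9, 1.0, 0.7, 0.2,
              0.8, 0.7, 1.0, 0.3,
              0.1, 0.2, 0.3, 1.0), nrow = 4, byrow = TRUE)

A <- c(1, 2, 3)
ac.1 <- assignment_confidence(M, 1, A)  # (0.9 + 0.8) / 2
ac.3 <- assignment_confidence(M, 3, A)  # (0.8 + 0.7) / 2
```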

Value

a list with four components: "validity", "overall.validity", "similarity.matrix", "AC" (optional):

`validity`	vector with the validity of each of the clusters
`overall.validity`	overall validity index of the clustering
`similarity.matrix`	pairwise similarity matrix between examples
`AC`	matrix with the Assignment Confidence index for each example. Each row corresponds to an example, each column to a cluster
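
To illustrate the layout of the AC component, the sketch below uses a made-up matrix with examples in rows and clusters in columns and looks up, for each example, the cluster with the largest AC value. The values and the dimnames are hypothetical and are not output of the package.

```r
# Made-up AC matrix: 4 examples (rows) x 2 clusters (columns)
AC <- matrix(c(0.90, 0.10,
               0.85, 0.20,
               0.15, 0.95,
               0.55, 0.50), nrow = 4, byrow = TRUE,
             dimnames = list(paste0("ex", 1:4), paste0("cl", 1:2)))

# Column index of the largest AC value in each row
best.cluster <- apply(AC, 1, which.max)

# AC values of a single example across all clusters
ac.ex4 <- AC["ex4", ]
```

A row with similar values across columns, such as ex4 here, would point to an example whose assignment is less certain.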

Author(s)

Giorgio Valentini valentini@di.unimi.it

Examples

# Load the package that provides the functions used below
library(clusterv)

# Computation of the validity indices for a hierarchical clustering
M <- generate.sample0(n = 10, m = 1, sigma = 1, dim = 1000)
# dist() works on rows, so the data matrix is transposed first
d <- dist(t(M))
tree <- hclust(d, method = "average")
plot(tree, main = "")
# rect.hclust draws the 3 clusters on the plot and returns them as a list
cl.orig <- rect.hclust(tree, k = 3)
# 20 clusterings obtained from random projections to 100 dimensions
l.PMO <- Multiple.Random.hclustering(M, dim = 100, pmethod = "PMO",
                                     c = 3, hmethod = "average", n = 20)
list.indices <- Cluster.validity(cl.orig, l.PMO, AC = TRUE)

# Computation of the validity indices for a hierarchical clustering
# with less defined clusters (larger sigma)
M.less <- generate.sample0(n = 10, m = 1, sigma = 2, dim = 1000)
d <- dist(t(M.less))
tree.less <- hclust(d, method = "average")
plot(tree.less, main = "")
cl.orig.less <- rect.hclust(tree.less, k = 3)
l.PMO.less <- Multiple.Random.hclustering(M.less, dim = 100, pmethod = "PMO",
                                          c = 3, hmethod = "average", n = 20)
list.indices.less <- Cluster.validity(cl.orig.less, l.PMO.less, AC = TRUE)

clusterv documentation built on June 8, 2025, 10:21 a.m.