Do.similarity.matrix: Functions to compute a pairwise similarity matrix.
In clusterv: Assessment of Cluster Stability by Randomized Maps

Do.similarity.matrix

R Documentation

Description

The elements of a similarity matrix represent the frequency with which each pair of examples belongs to the same cluster across multiple clusterings. These functions can also be used when the number of clusters varies from one clustering to another.

Usage

Do.similarity.matrix(l, dim.Sim.M)

Do.similarity.matrix.partition(l)

Arguments

`l`	list of clusterings. Each element is a list of clusters. Each cluster is a vector whose elements (integers) represent the examples
`dim.Sim.M`	dimension of the similarity matrix (number of examples)
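
As an illustration of the nested structure expected for `l`, the following sketch builds two small clusterings by hand. The names `clustering1` and `clustering2` and the toy data are hypothetical and serve only to show the shape of the argument; in practice `l` would typically come from a function such as Multiple.Random.hclustering.

```r
# Two hypothetical clusterings of 5 examples, labelled 1..5.
# Each clustering is a list of clusters; each cluster is a vector of
# integer example indices.
clustering1 <- list(c(1, 2, 3), c(4, 5))     # 2 clusters
clustering2 <- list(c(1, 2), c(3, 4), c(5))  # 3 clusters

# The argument l is a list of such clusterings; the number of clusters
# may differ from one clustering to the next.
l <- list(clustering1, clustering2)

# dim.Sim.M is the number of examples.
dim.Sim.M <- 5
```

With the package loaded, a list of this shape would presumably be passed as `Do.similarity.matrix(l, dim.Sim.M)`.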

Details

An n \times n similarity matrix M can be associated with a k-clustering with clusters A_1, \ldots, A_k; the elements M_{ij} of M are defined as:

M_{ij} = \sum_{s=1}^k \chi_{A_s}[i] \cdot \chi_{A_s}[j]

where i,j \in \{1,2,\ldots,n\}, and \chi_{A_s} \in \{0,1\}^n is the characteristic vector of the cluster A_s \subseteq \{1,2,\ldots,n\}, i.e. \chi_{A_s}[i] = 1 if i \in A_s, otherwise \chi_{A_s}[i] = 0. If the k-clustering defines a partition, M_{ij} \in \{0,1\}: in other words, M_{ij} indicates whether examples i and j belong to the same cluster.

Consider also a random projection \mu : \mathcal{R}^d \rightarrow \mathcal{R}^{d'}. Then a similarity matrix M can be computed by averaging over multiple clusterings obtained from multiple random projections. This similarity matrix represents how often pairs of projected examples belong to the same cluster, averaged across the repeated random projections.

Do.similarity.matrix can be used with clusterings that do not strictly define a partition (that is, a specific example may belong to more than one cluster). Do.similarity.matrix.partition may be used only with clusterings that strictly define a partition.
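
The definition above can be sketched in plain R as follows. This is a minimal illustration and not the clusterv implementation: the helper name `sketch.similarity.matrix` is hypothetical, and a simple average over the clusterings is assumed as the normalization, which keeps the values in [0, 1] when every clustering is a partition.

```r
# Hypothetical helper, for illustration only (not part of clusterv).
# l: list of clusterings, each a list of integer vectors (clusters).
# n: number of examples.
sketch.similarity.matrix <- function(l, n) {
  Sim <- matrix(0, nrow = n, ncol = n)
  for (clustering in l) {
    # M_ij = sum over clusters A_s of chi_{A_s}[i] * chi_{A_s}[j]
    M <- matrix(0, nrow = n, ncol = n)
    for (cluster in clustering) {
      chi <- numeric(n)     # characteristic vector of the cluster
      chi[cluster] <- 1
      M <- M + outer(chi, chi)
    }
    Sim <- Sim + M
  }
  # Assumed normalization: average across the clusterings.
  Sim / length(l)
}

# Two toy partitions of 4 examples.
l.toy <- list(
  list(c(1, 2), c(3, 4)),
  list(c(1, 2, 3), c(4))
)
Sim.toy <- sketch.similarity.matrix(l.toy, 4)
```

Here examples 1 and 2 share a cluster in both clusterings, so `Sim.toy[1, 2]` is 1; examples 2 and 3 share a cluster in only one of the two, so `Sim.toy[2, 3]` is 0.5; examples 1 and 4 never do, so `Sim.toy[1, 4]` is 0.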

Value

A pairwise similarity matrix whose elements represent how often two examples fall in the same cluster across multiple clusterings. Each element of the matrix is normalized so that its value is between 0 and 1.

Author(s)

Giorgio Valentini valentini@di.unimi.it

Examples

library(clusterv)
# Compute the similarity matrix associated with 20 hierarchical clusterings
# using Normal projections.
M <- generate.sample0(n=10, m=2, sigma=2, dim=800)
l.norm <- Multiple.Random.hclustering(M, dim=100, pmethod="Norm", c=3,
                                      hmethod="average", n=20)
Sim <- Do.similarity.matrix.partition(l.norm)
# The same as above, but with 30 hierarchical clusterings using PMO projections.
l.PMO <- Multiple.Random.hclustering(M, dim=100, pmethod="PMO", c=3,
                                     hmethod="average", n=30)
Sim.PMO <- Do.similarity.matrix.partition(l.PMO)

clusterv documentation built on June 8, 2025, 10:21 a.m.