Do.similarity.matrix | R Documentation |
The elements of a similarity matrix represent the frequency with which each pair of examples belongs to the same cluster across multiple clusterings. These functions may also be used when the number of clusters varies from one clustering to another.
Do.similarity.matrix(l, dim.Sim.M)
Do.similarity.matrix.partition(l)
l | list of clusterings. Each element is a list of clusters. Each cluster is a vector whose elements (integers) represent the examples |
dim.Sim.M | dimension of the similarity matrix (number of examples) |
An $n \times n$ similarity matrix $M$ can be associated with a k-clustering $\{A_1, \ldots, A_k\}$; the elements $M_{ij}$ of $M$ are defined as:
$$M_{ij} = \sum_{s=1}^k \chi_{A_s}[i] \cdot \chi_{A_s}[j]$$
where $i,j \in \{1,2,\ldots,n\}$, and $\chi_{A_s} \in \{0,1\}^n$ is the characteristic vector of the cluster $A_s \subseteq \{1,2,\ldots,n\}$, i.e. $\chi_{A_s}[i] = 1$ if $i \in A_s$, otherwise $\chi_{A_s}[i] = 0$.
If the k-clustering identifies a partition, $M_{ij} \in \{0,1\}$: in other words, $M_{ij}$ denotes whether elements $i$ and $j$ belong to the same cluster.
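The definition of M for a single clustering can be illustrated with a minimal base R sketch. The helper name similarity_single is hypothetical and is not part of the package; it only mirrors the formula by building the characteristic vectors as columns of a 0/1 matrix and multiplying that matrix by its transpose.

```r
# Hypothetical helper (not a package function): similarity matrix of ONE clustering.
# 'clusters' is a list of integer vectors, 'n' is the number of examples.
similarity_single <- function(clusters, n) {
  # chi[i, s] is 1 if example i belongs to cluster s, 0 otherwise
  chi <- matrix(
    vapply(clusters, function(A) as.integer(seq_len(n) %in% A), integer(n)),
    nrow = n
  )
  # M[i, j] = sum over s of chi[i, s] * chi[j, s]
  tcrossprod(chi)
}

clusters <- list(c(1, 2), c(3, 4, 5))  # a partition of 5 examples
M <- similarity_single(clusters, n = 5)
print(M)
```

Because the example clustering is a partition, every entry of M is 0 or 1, and M[i, j] is 1 exactly when i and j share a cluster.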
Consider also a random projection $\mu : \mathcal{R}^d \rightarrow \mathcal{R}^{d'}$.
Then a similarity matrix $M$ can be computed by averaging over multiple clusterings obtained from multiple random projections. This similarity matrix represents how often pairs of projected examples belong to the same cluster, averaged across the repeated random projections.
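The averaging step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the helper average_similarity is hypothetical, the clusterings are taken as given (the random projection and clustering steps are omitted), each clustering is assumed to be a partition, and normalization is assumed to be division by the number of clusterings.

```r
# Hypothetical helper (not a package function): average co-membership over
# a list of clusterings, each given as a list of integer vectors (clusters).
average_similarity <- function(l, n) {
  S <- matrix(0, nrow = n, ncol = n)
  for (clusters in l) {
    # 0/1 membership matrix: one column per cluster
    chi <- matrix(
      vapply(clusters, function(A) as.integer(seq_len(n) %in% A), integer(n)),
      nrow = n
    )
    S <- S + tcrossprod(chi)  # add this clustering's similarity matrix
  }
  S / length(l)  # assumed normalization: divide by the number of clusterings
}

# Three clusterings of 4 examples, with a varying number of clusters
l <- list(
  list(c(1, 2), c(3, 4)),
  list(c(1, 2, 3), 4),
  list(1, c(2, 3, 4))
)
S <- average_similarity(l, n = 4)
print(round(S, 3))
```

In this toy input, examples 1 and 2 share a cluster in two of the three clusterings, so S[1, 2] is 2/3, while examples 1 and 4 are never grouped together, so S[1, 4] is 0.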
Do.similarity.matrix can be used with clusterings that do not strictly define a partition (that is, a specific example may belong to more than one cluster). Do.similarity.matrix.partition may be used only with clusterings that strictly define a partition.
A pairwise similarity matrix whose elements represent how often two examples fall in the same cluster across multiple clusterings. Each element of the matrix is normalized so that its value is between 0 and 1.
Giorgio Valentini valentini@di.unimi.it
# Computing the similarity matrix associated to 20 hierarchical clusterings
# using Normal projections.
M <- generate.sample0(n=10, m=2, sigma=2, dim=800)
l.norm <- Multiple.Random.hclustering(M, dim=100, pmethod="Norm", c=3,
                                      hmethod="average", n=20)
Sim <- Do.similarity.matrix.partition(l.norm)
# The same as above, but with 30 hierarchical clusterings using PMO projections.
l.PMO <- Multiple.Random.hclustering(M, dim=100, pmethod="PMO", c=3,
                                     hmethod="average", n=30)
# Use the PMO clusterings here (not l.norm)
Sim.PMO <- Do.similarity.matrix.partition(l.PMO)