Similarity.measures | R Documentation |
Classical similarity measures between pairs of clusterings are implemented. These measures use the pairwise boolean membership matrix (Do.boolean.membership.matrix) to compute the similarity between two clusterings: each matrix is treated as a vector and the result is computed as an inner product. It can be shown that the same results are obtained from contingency matrices and the classical definitions of the Fowlkes and Mallows (implemented with the function sFM), Jaccard (implemented with the function sJaccard) and Matching (Rand index, implemented with the function sM) coefficients.
Their values range from 0 to 1 (0: no similarity, 1: identical clusterings).
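The inner-product formulation can be illustrated with a minimal base R sketch. It follows the definitions in Ben-Hur et al. (2002) and is not the package's own code: the helper names (comembership, dot, fm_sketch, jaccard_sketch, matching_sketch) are hypothetical, and the zero diagonal and the n^2 normalization of the matching coefficient are assumptions that may differ from what Do.boolean.membership.matrix and sM actually do.

```r
# Hypothetical helper: boolean co-membership matrix from a vector of cluster labels.
# Entry (i, j) is 1 when items i and j share a cluster; the diagonal is set to 0
# (an assumption taken from Ben-Hur et al., not from the package source).
comembership <- function(labels) {
  B <- outer(labels, labels, "==") * 1
  diag(B) <- 0
  B
}

# Inner product of two matrices viewed as vectors.
dot <- function(A, B) sum(A * B)

# The three coefficients written as functions of inner products.
fm_sketch       <- function(M1, M2) dot(M1, M2) / sqrt(dot(M1, M1) * dot(M2, M2))
jaccard_sketch  <- function(M1, M2) dot(M1, M2) / (dot(M1, M1) + dot(M2, M2) - dot(M1, M2))
matching_sketch <- function(M1, M2) 1 - sum((M1 - M2)^2) / length(M1)

# Two toy clusterings of six items.
a <- c(1, 1, 2, 2, 3, 3)
b <- c(1, 1, 1, 2, 2, 2)
A <- comembership(a)
B <- comembership(b)

fm_sketch(A, B)
jaccard_sketch(A, B)
matching_sketch(A, B)

# Cross-check through the contingency table: count the pairs of items placed
# together in both clusterings, in the first only, and in the second only.
N    <- table(a, b)
n11  <- sum(choose(N, 2))
pa   <- sum(choose(rowSums(N), 2))
pb   <- sum(choose(colSums(N), 2))
fm_ct      <- n11 / sqrt(pa * pb)
jaccard_ct <- n11 / (pa + pb - n11)
```

Under these assumptions fm_ct and jaccard_ct agree with fm_sketch(A, B) and jaccard_sketch(A, B), because every unordered pair counted in the contingency table appears twice in the symmetric co-membership matrix and the factor of two cancels.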
sFM(M1, M2)
sJaccard(M1, M2)
sM(M1, M2)
M1 | boolean membership matrix representing the first clustering |
M2 | boolean membership matrix representing the second clustering |
similarity measure between the two clusterings according to Fowlkes and Mallows (sFM), Jaccard (sJaccard) and Matching (sM) coefficients.
Giorgio Valentini valentini@di.unimi.it
Ben-Hur, A., Elisseeff, A. and Guyon, I., A stability based method for discovering structure in clustered data, In: "Pacific Symposium on Biocomputing", Altman, R.B. et al. (eds.), pp. 6-17, 2002.
Do.boolean.membership.matrix
library("clusterv")
library("stats")
library("cluster")
# Synthetic data set generation (3 clusters with 20 examples for each cluster)
M <- generate.sample3(n=20, m=2)
# k-means clustering with 3 clusters
r1 <- kmeans(t(M), centers = 3, iter.max = 1000)
# this function is implemented in the clusterv package:
cl1 <- Transform.vector.to.list(r1$cluster)
# generation of a boolean membership square matrix:
Bkmeans <- Do.boolean.membership.matrix(cl1, 60, 1:60)
# the same as above, using PAM clustering with 3 clusters
d <- dist(t(M))
r2 <- pam(d, 3, cluster.only = TRUE)
cl2 <- Transform.vector.to.list(r2)
BPAM <- Do.boolean.membership.matrix(cl2, 60, 1:60)
# computation of the Fowlkes and Mallows index between the k-means and the PAM clustering:
sFM(Bkmeans, BPAM)
# computation of the Jaccard index between the k-means and the PAM clustering:
sJaccard(Bkmeans, BPAM)
# computation of the Matching coefficient between the k-means and the PAM clustering:
sM(Bkmeans, BPAM)