Compute the (adjusted) Rand, Jaccard and Fowlkes-Mallows index for agreement of two partitions.
comPart(x, y, type=c("ARI","RI","J","FM"))
## S4 method for signature 'flexclust,flexclust'
comPart(x, y, type)
## S4 method for signature 'numeric,numeric'
comPart(x, y, type)
## S4 method for signature 'flexclust,numeric'
comPart(x, y, type)
## S4 method for signature 'numeric,flexclust'
comPart(x, y, type)
randIndex(x, y, correct=TRUE, original=!correct)
## S4 method for signature 'table,missing'
randIndex(x, y, correct=TRUE, original=!correct)
## S4 method for signature 'ANY,ANY'
randIndex(x, y, correct=TRUE, original=!correct)

x 
Either a 2-dimensional cross-tabulation of cluster
assignments (for 
y 
An object inheriting from class

type 
character vector of abbreviations of indices to compute. 
correct, original 
Logical, correct the Rand index for agreement by chance? 
A vector of indices.
Let A denote the number of all pairs of data points which are either put into the same cluster by both partitions or put into different clusters by both partitions. Conversely, let D denote the number of all pairs of data points that are put into one cluster in one partition, but into different clusters by the other partition. The partitions disagree for all pairs D and agree for all pairs A. We can measure the agreement by the Rand index A/(A+D) which is invariant with respect to permutations of cluster labels.
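The definition above can be checked by brute force on a small example. The following is a minimal base R sketch, not the package's implementation; the helper name `rand_index_pairs` and the label vectors are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: brute-force Rand index for two label vectors.
rand_index_pairs <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  pairs <- combn(length(a), 2)            # all index pairs (i, j) with i < j
  same_a <- a[pairs[1, ]] == a[pairs[2, ]]  # pair shares a cluster in a?
  same_b <- b[pairs[1, ]] == b[pairs[2, ]]  # pair shares a cluster in b?
  A <- sum(same_a == same_b)              # pairs on which the partitions agree
  D <- sum(same_a != same_b)              # pairs on which they disagree
  A / (A + D)
}

p1 <- c(1, 1, 2, 2, 3)
p2 <- c(1, 1, 2, 3, 3)
ri <- rand_index_pairs(p1, p2)

# Relabelling the clusters of the second partition leaves the index unchanged.
ri_relabelled <- rand_index_pairs(p1, c(3, 3, 1, 2, 2))
```

With five points there are ten pairs; the two partitions agree on eight of them, so both calls give 0.8.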
The index has to be corrected for agreement by chance if the sizes of the clusters are not uniform (which is usually the case), or if there are many clusters, see Hubert & Arabie (1985) for details.
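For reference, the chance-corrected index of Hubert & Arabie (1985) is commonly written in terms of the cross-tabulation of the two partitions. The notation here is an assumption introduced for this note and does not appear in the text above: n_ij are the cell counts, a_i the row sums, b_j the column sums and n the total number of points.

```latex
\[
\mathrm{ARI} =
\frac{\displaystyle \sum_{i,j}\binom{n_{ij}}{2}
      - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
     {\displaystyle \frac{1}{2}\left[\sum_i\binom{a_i}{2}+\sum_j\binom{b_j}{2}\right]
      - \left[\sum_i\binom{a_i}{2}\sum_j\binom{b_j}{2}\right]\Big/\binom{n}{2}}
\]
```

The subtracted term is the expected number of pairs placed together by both partitions under random assignment with fixed cluster sizes, so the index is close to zero for unrelated partitions and equals one for identical ones.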
If the number of clusters is very large, then usually the vast majority of pairs of points will not be in the same cluster. The Jaccard index tries to account for this by using only pairs of points that are in the same cluster in the definition of A.
Let A again be the pairs of points that are in the same cluster in both partitions. The Fowlkes-Mallows index divides this number by the geometric mean of the sums of the number of pairs in each cluster of the two partitions. This gives the probability that a pair of points which are in the same cluster in one partition are also in the same cluster in the other partition.
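Both the Jaccard and the Fowlkes-Mallows index can be read off a cross-tabulation by counting pairs with `choose()`. This is a minimal base R sketch under the definitions given above, not the package's code; the helper name `pair_indices` and the small table are hypothetical.

```r
# Hypothetical helper: Jaccard and Fowlkes-Mallows from a cross-tabulation.
pair_indices <- function(tab) {
  n_both <- sum(choose(tab, 2))          # pairs together in both partitions
  n_x <- sum(choose(rowSums(tab), 2))    # pairs together in the first partition
  n_y <- sum(choose(colSums(tab), 2))    # pairs together in the second partition
  c(J  = n_both / (n_x + n_y - n_both),  # together in both / together in at least one
    FM = n_both / sqrt(n_x * n_y))       # geometric mean in the denominator
}

tab_small <- table(c(1, 1, 2, 2, 3), c(1, 1, 2, 3, 3))
idx <- pair_indices(tab_small)
```

For this table one pair is together in both partitions and each partition places two pairs together, so the Jaccard index is 1/3 and the Fowlkes-Mallows index is 0.5.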
Friedrich Leisch
Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2, 193–218, 1985.
Marina Meila. Comparing clusterings: an axiomatic view. In Stefan Wrobel and Luc De Raedt, editors, Proceedings of the International Machine Learning Conference (ICML). ACM Press, 2005.
## randIndex() and comPart() are provided by the flexclust package
library(flexclust)

## no class correlations: corrected Rand almost zero
g1 <- sample(1:5, size=1000, replace=TRUE)
g2 <- sample(1:5, size=1000, replace=TRUE)
tab <- table(g1, g2)
randIndex(tab)
## uncorrected version will be large, because there are many points
## which are assigned to different clusters in both cases
randIndex(tab, correct=FALSE)
comPart(g1, g2)
## let pairs (g1=1,g2=1) and (g1=3,g2=3) agree better
k <- sample(1:1000, size=200)
g1[k] <- 1
g2[k] <- 1
k <- sample(1:1000, size=200)
g1[k] <- 3
g2[k] <- 3
tab <- table(g1, g2)
## the index should be larger than before
randIndex(tab, correct=TRUE, original=TRUE)
comPart(g1, g2)
