## Description

Let `x` and `y` represent two partitions of a set of `n` elements into *K* and *L* nonempty, pairwise disjoint subsets, respectively, e.g., two clusterings of a dataset with `n` observations represented as label vectors. These functions quantify the similarity between `x` and `y`. They can be used as external cluster validity measures, i.e., when reference (ground-truth) partitions are available.

## Usage

```
adjusted_rand_score(x, y = NULL)
rand_score(x, y = NULL)
adjusted_fm_score(x, y = NULL)
fm_score(x, y = NULL)
mi_score(x, y = NULL)
normalized_mi_score(x, y = NULL)
adjusted_mi_score(x, y = NULL)
normalized_accuracy(x, y = NULL)
pair_sets_index(x, y = NULL)
```

## Arguments

`x`: an integer vector of length n (or an object coercible to one) representing a K-partition of an n-set, or a confusion matrix with K rows and L columns

`y`: an integer vector of length n (or an object coercible to one) representing an L-partition of the same set, or NULL if `x` is a K×L confusion matrix

## Details

Every index except `mi_score()` (which computes the mutual information score) outputs 1 given two identical partitions. Note that partitions are always defined up to a bijection of the set of possible labels, e.g., (1, 1, 2, 1) and (4, 4, 2, 4) represent the same 2-partition.
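Since these indices depend on `x` and `y` only through their confusion matrix, relabelling a partition leaves them unchanged. A minimal base-R sketch of this invariance for the Rand score (`rand_from_table()` is an illustrative helper of ours, not part of the package API):

```r
# Illustrative base-R sketch (not the package implementation): the Rand
# score depends only on the confusion matrix, so relabelling a partition
# does not change it.
rand_from_table <- function(C) {
  n    <- sum(C)
  s_ij <- sum(choose(C, 2))             # pairs together in both partitions
  s_a  <- sum(choose(rowSums(C), 2))    # pairs together in the first
  s_b  <- sum(choose(colSums(C), 2))    # pairs together in the second
  (choose(n, 2) + 2 * s_ij - s_a - s_b) / choose(n, 2)
}
x <- c(1, 1, 2, 1)
rand_from_table(table(x, c(1, 1, 2, 1)))  # 1
rand_from_table(table(x, c(4, 4, 2, 4)))  # 1: same 2-partition, relabelled
```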

`rand_score()` gives the Rand score (the "probability" of agreement between the two partitions) and `adjusted_rand_score()` is its version corrected for chance, see (Hubert, Arabie, 1985): its expected value is 0.0 given two independent partitions. Due to the adjustment, the resulting index can be negative for some inputs.
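The adjusted Rand index can be sketched in a few lines of base R directly from Hubert and Arabie's Eq. (4); the helper below is illustrative only and is not the package's implementation:

```r
# A base-R sketch of the adjusted Rand index following Hubert & Arabie
# (1985), Eq. (4); the helper name is illustrative, not the package API.
ari_from_labels <- function(x, y) {
  C    <- table(x, y)
  s_ij <- sum(choose(C, 2))
  s_a  <- sum(choose(rowSums(C), 2))
  s_b  <- sum(choose(colSums(C), 2))
  e    <- s_a * s_b / choose(sum(C), 2)   # expected index under independence
  (s_ij - e) / ((s_a + s_b) / 2 - e)
}
ari_from_labels(c(1, 1, 2, 2), c(1, 1, 2, 2))  # 1 for identical partitions
ari_from_labels(c(1, 2, 1, 2), c(1, 1, 2, 2))  # -0.5: the index can be negative
```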

Similarly, `fm_score()` gives the Fowlkes-Mallows (FM) score and `adjusted_fm_score()` is its adjusted-for-chance version; see (Hubert, Arabie, 1985).
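For reference, the unadjusted FM score normalises the same pair counts by their geometric mean; a base-R sketch (the helper name is ours, not the package API):

```r
# Base-R sketch of the (unadjusted) Fowlkes-Mallows score from pair
# counts; illustrative only, not the package's implementation.
fm_from_labels <- function(x, y) {
  C    <- table(x, y)
  s_ij <- sum(choose(C, 2))
  s_a  <- sum(choose(rowSums(C), 2))
  s_b  <- sum(choose(colSums(C), 2))
  s_ij / sqrt(s_a * s_b)                 # geometric-mean normalisation
}
fm_from_labels(c(1, 1, 2, 2), c(1, 1, 2, 2))  # 1
```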

Note that both the (unadjusted) Rand and FM scores are bounded from below
by *1/(K+1)* if *K=L*, hence their adjusted versions are preferred.

`mi_score()`, `adjusted_mi_score()`, and `normalized_mi_score()` are information-theoretic scores based on mutual information; see the definitions of *AMI_{sum}* and *NMI_{sum}* in (Vinh et al., 2010).
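The mutual information between two partitions follows directly from their joint contingency table; the base-R sketch below (in nats; the helper name is illustrative, not the package API) implements the standard definition:

```r
# Base-R sketch of the mutual information (in nats) between two
# partitions, computed from the joint contingency table; the helper name
# is illustrative, not the package API.
mi_from_labels <- function(x, y) {
  P  <- table(x, y) / length(x)     # joint distribution p_ij
  a  <- rowSums(P)                  # marginal of x
  b  <- colSums(P)                  # marginal of y
  nz <- P > 0                       # 0 * log(0) terms are skipped
  sum(P[nz] * log(P[nz] / outer(a, b)[nz]))
}
mi_from_labels(c(1, 1, 2, 2), c(1, 1, 2, 2))  # log(2): MI equals the entropy here
mi_from_labels(c(1, 2, 1, 2), c(1, 1, 2, 2))  # 0 for independent partitions
```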

`normalized_accuracy()` is defined as *(Accuracy(C_σ)-1/L)/(1-1/L)*, where *C_σ* is a version of the confusion matrix for the given `x` and `y`, *K ≤ L*, with columns permuted based on the solution to the Maximal Linear Sum Assignment Problem. *Accuracy(C_σ)* is sometimes referred to as Purity, e.g., in (Rendon et al., 2011).
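For small *L* the optimal column permutation can be found by brute force; the base-R sketch below (helper names are ours, and the exhaustive search is exponential in *L*, unlike a proper assignment solver) illustrates the definition:

```r
# Illustrative base-R sketch of normalized accuracy (helper names are
# ours, not the package API).  The best column permutation is found by
# brute force, which is exponential in L; a real implementation would
# use a linear sum assignment solver instead.
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v))
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  out
}
normalized_accuracy_sketch <- function(x, y) {
  C <- table(x, y)                      # K x L confusion matrix, K <= L
  n <- sum(C); L <- ncol(C)
  # Accuracy(C_sigma): best sum of matched entries over column permutations
  best <- max(sapply(perms(seq_len(L)),
                     function(p) sum(diag(C[, p, drop = FALSE]))))
  acc <- best / n
  (acc - 1 / L) / (1 - 1 / L)
}
normalized_accuracy_sketch(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1: labels differ only by a swap
```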

`pair_sets_index()` gives the Pair Sets Index (PSI) adjusted for chance (Rezaei, Fränti, 2016), *K ≤ L*. Pairing is based on the solution to the Linear Sum Assignment Problem for a transformed version of the confusion matrix.

## Value

A single real value giving the similarity score.

## References

Hubert L., Arabie P., Comparing Partitions, Journal of Classification 2(1), 1985, 193-218, esp. Eqs. (2) and (4).

Rendon E., Abundez I., Arizmendi A., Quiroz E.M., Internal versus external cluster validation indexes, International Journal of Computers and Communications 5(1), 2011, 27-34.

Rezaei M., Fränti P., Set matching measures for external cluster validity, IEEE Transactions on Knowledge and Data Engineering 28(8), 2016, 2173-2186.

Vinh N.X., Epps J., Bailey J., Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance, Journal of Machine Learning Research 11, 2010, 2837-2854.

## Examples

```
y_true <- iris[[5]]
y_pred <- kmeans(as.matrix(iris[1:4]), 3)$cluster
adjusted_rand_score(y_true, y_pred)
rand_score(table(y_true, y_pred)) # the same
adjusted_fm_score(y_true, y_pred)
fm_score(y_true, y_pred)
mi_score(y_true, y_pred)
normalized_mi_score(y_true, y_pred)
adjusted_mi_score(y_true, y_pred)
normalized_accuracy(y_true, y_pred)
pair_sets_index(y_true, y_pred)
```
