variation_info: Variation of Information Between Clusterings

View source: R/measures_clusterings.R

variation_info {clevr}    R Documentation

Variation of Information Between Clusterings

Description

Computes the variation of information between two clusterings, such as a predicted clustering and a ground-truth clustering.

Usage

variation_info(true, pred, base = exp(1))

Arguments

true

ground truth clustering represented as a membership vector. Each entry corresponds to an element and the value identifies the assigned cluster. The specific values of the cluster identifiers are arbitrary.

pred

predicted clustering represented as a membership vector, with one entry for each element of true, in the same order.

base

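base of the logarithm used to compute the entropies. Defaults to exp(1), i.e. the natural logarithm, which gives the result in nats; base = 2 gives the result in bits.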
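Two claims in the argument descriptions, that cluster identifiers are arbitrary and that base only changes the unit, can be checked with a small base-R sketch. The function vi_sketch() below is a hypothetical helper written for illustration; it is not clevr's implementation and assumes the standard definition VI = 2 H(true, pred) - H(true) - H(pred).

```r
# Hypothetical helper for illustration only; not part of clevr.
# Assumes the standard definition VI = 2 H(true, pred) - H(true) - H(pred).
vi_sketch <- function(true, pred, base = exp(1)) {
  stopifnot(length(true) == length(pred))
  p_joint <- table(true, pred) / length(true)  # joint cluster proportions
  entropy <- function(p) {
    p <- p[p > 0]                              # treat 0 * log(0) as 0
    -sum(p * log(p, base = base))
  }
  2 * entropy(p_joint) - entropy(rowSums(p_joint)) - entropy(colSums(p_joint))
}

true <- c(1, 1, 1, 2, 2)
pred <- c(1, 1, 2, 2, 2)
relabelled <- c("b", "b", "b", "a", "a")  # same partition as `true`, other identifiers

vi_sketch(true, pred)            # natural log, result in nats
vi_sketch(relabelled, pred)      # same value: only the partition matters
vi_sketch(true, relabelled)      # 0 up to rounding: identical partitions
vi_sketch(true, pred, base = 2)  # bits; equals vi_sketch(true, pred) / log(2)
```

Because log_b(x) = log(x) / log(b), switching base rescales every entropy, and hence the distance, by the same constant factor.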

Details

Variation of information is an entropy-based distance metric on the space of clusterings. It is not normalized: it ranges from 0, attained only when the two clusterings define the same partition, to log(N), where N is the number of clustered elements and the logarithm is taken in the chosen base. Larger values correspond to greater dissimilarity between the clusterings.
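Both bounds can be checked numerically. The sketch below uses a hypothetical base-R helper, vi_sketch(), written for illustration only (not clevr's code). It assumes the standard definition VI = H(true) + H(pred) - 2 I(true; pred), which equals 2 H(true, pred) - H(true) - H(pred). The upper bound log(N) is reached when one clustering puts all elements in a single cluster and the other puts every element in its own cluster.

```r
# Hypothetical helper for illustration only; not part of clevr.
# Computes VI = 2 H(true, pred) - H(true) - H(pred) from the contingency table.
vi_sketch <- function(true, pred, base = exp(1)) {
  stopifnot(length(true) == length(pred))
  p_joint <- table(true, pred) / length(true)  # joint cluster proportions
  entropy <- function(p) {
    p <- p[p > 0]                              # treat 0 * log(0) as 0
    -sum(p * log(p, base = base))
  }
  2 * entropy(p_joint) - entropy(rowSums(p_joint)) - entropy(colSums(p_joint))
}

N <- 6
x <- c(1, 1, 2, 2, 3, 3)
vi_sketch(x, x)                     # lower bound: 0 for identical clusterings

one_cluster <- rep(1, N)            # all elements in a single cluster
singletons <- seq_len(N)            # every element in its own cluster
vi_sketch(one_cluster, singletons)  # upper bound: matches log(N)
log(N)
```

Floating-point rounding may leave a value on the order of 1e-16 rather than an exact 0 for identical clusterings.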

References

Arabie, P. and Boorman, S. A. "Multidimensional scaling of measures of distance between partitions." Journal of Mathematical Psychology 10:2, 148-203, (1973). doi:10.1016/0022-2496(73)90012-6

Meilă, M. "Comparing Clusterings by the Variation of Information." In: Learning Theory and Kernel Machines, Lecture Notes in Computer Science 2777, Springer, Berlin, Heidelberg, (2003). doi:10.1007/978-3-540-45167-9_14

Examples

true <- c(1,1,1,2,2)  # ground truth clustering
pred <- c(1,1,2,2,2)  # predicted clustering
variation_info(true, pred)
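As a hand check of this example, assuming variation_info() follows the standard definition from Meilă (2003): of the N = 5 elements, 2 are in cluster 1 in both vectors, 1 is in cluster 1 of true but cluster 2 of pred, none is in cluster 2 of true but cluster 1 of pred, and 2 are in cluster 2 in both. With the default natural logarithm:

```latex
\begin{aligned}
H(\text{true}) = H(\text{pred}) &= -(0.6\ln 0.6 + 0.4\ln 0.4) \approx 0.6730\\
H(\text{true},\text{pred}) &= -(0.4\ln 0.4 + 0.2\ln 0.2 + 0.4\ln 0.4) \approx 1.0549\\
\mathrm{VI} &= 2H(\text{true},\text{pred}) - H(\text{true}) - H(\text{pred}) \approx 2(1.0549) - 2(0.6730) \approx 0.7638
\end{aligned}
```

Under that assumption the call should return roughly 0.764 nats, well below the upper bound log(5) ≈ 1.609.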


clevr documentation built on Sept. 16, 2023, 5:06 p.m.