View source: R/measures_clusterings.R
variation_info | R Documentation |
Computes the variation of information between two clusterings, such as a predicted and ground truth clustering.
variation_info(true, pred, base = exp(1))
true |
ground truth clustering represented as a membership vector. Each entry corresponds to an element and the value identifies the assigned cluster. The specific values of the cluster identifiers are arbitrary. |
pred |
predicted clustering represented as a membership vector. |
base |
base of the logarithm. Defaults to exp(1), i.e. the natural logarithm. |
Variation of information is an entropy-based distance metric on the space of clusterings. It is unnormalized and varies between 0 and log(N), where N is the number of clustered elements. Larger values of the distance metric correspond to greater dissimilarity between the clusterings.
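To make the entropy-based definition concrete, the following is a minimal sketch of how the quantity could be computed by hand, assuming the standard formulation VI = H(true | pred) + H(pred | true), which equals 2 H(true, pred) - H(true) - H(pred). The helper name vi_sketch is hypothetical and is not part of the package; it is not necessarily how variation_info is implemented internally.

```r
# Hypothetical helper for illustration only; not the package's implementation.
vi_sketch <- function(true, pred, base = exp(1)) {
  stopifnot(length(true) == length(pred), length(true) > 0)
  n <- length(true)

  # Joint distribution over (true cluster, predicted cluster) pairs
  joint <- table(true, pred) / n
  p_true <- rowSums(joint)  # marginal distribution of the true clustering
  p_pred <- colSums(joint)  # marginal distribution of the predicted clustering

  # Shannon entropy in the requested base, skipping zero-probability cells
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p, base = base))
  }

  h_true  <- entropy(p_true)
  h_pred  <- entropy(p_pred)
  h_joint <- entropy(as.vector(joint))

  # VI = H(true | pred) + H(pred | true) = 2 H(joint) - H(true) - H(pred)
  2 * h_joint - h_true - h_pred
}

true <- c(1, 1, 1, 2, 2)
pred <- c(1, 1, 2, 2, 2)
vi_sketch(true, pred)
```

Because only the contingency table of cluster co-memberships is used, relabeling the clusters in either vector leaves the result unchanged, which matches the statement that the cluster identifiers are arbitrary.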
Arabie, P. and Boorman, S. A. "Multidimensional scaling of measures of distance between partitions." Journal of Mathematical Psychology 10:2, 148-203, (1973). doi:10.1016/0022-2496(73)90012-6
Meilă, M. "Comparing Clusterings by the Variation of Information." In: Learning Theory and Kernel Machines, Lecture Notes in Computer Science 2777, Springer, Berlin, Heidelberg, (2003). doi:10.1007/978-3-540-45167-9_14
true <- c(1,1,1,2,2) # ground truth clustering
pred <- c(1,1,2,2,2) # predicted clustering
variation_info(true, pred)