This function calculates Meila's (2007) Variation of Information (VI) metric between two clusterings of the same data set. VI is an information-theoretic criterion that measures the amount of information lost and gained between two clusterings.
variation_information(labels1, labels2)
labels1 | a vector of
labels2 | a vector of
If n is the number of observations in the data set, VI is bounded between 0 and log(n). Furthermore, VI == 0 if and only if the two clusterings are the same.
The definition of VI, more properties, and connections to other criteria are given in the Meila (2007) paper, which has open access: http://www.sciencedirect.com/science/article/pii/S0047259X06002016
NOTE: We define 0 log 0 = 0.
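To make the definition and the properties above concrete, here is a minimal sketch of how VI could be computed from the joint label frequencies. It assumes the standard form from Meila (2007), VI = H(C) + H(C') - 2 I(C, C'), rewritten as 2 H(C, C') - H(C) - H(C'). The helper name vi_sketch is hypothetical and is not the package's implementation of variation_information.

```r
# Hypothetical helper for illustration only; not the package's own code.
vi_sketch <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))
  n <- length(labels1)

  # Joint distribution of the two label vectors and its marginals
  joint <- table(labels1, labels2) / n
  p1 <- rowSums(joint)
  p2 <- colSums(joint)

  # Shannon entropy (natural log), using the convention 0 log 0 = 0
  entropy <- function(p) {
    p <- as.numeric(p)
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # VI = H(C) + H(C') - 2 I(C, C') = 2 H(C, C') - H(C) - H(C')
  2 * entropy(joint) - entropy(p1) - entropy(p2)
}

a <- c(1, 1, 2, 2, 3, 3)

# Identical clusterings, even with permuted label names, give VI == 0
all.equal(vi_sketch(a, c(3, 3, 1, 1, 2, 2)), 0)

# n singleton clusters versus one big cluster attains the upper bound log(n)
all.equal(vi_sketch(seq_along(a), rep(1, length(a))), log(length(a)))
```

Both comparisons should return TRUE under these assumptions. Dropping zero cells before taking logarithms is one simple way to apply the 0 log 0 = 0 convention.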
the VI distance between labels1 and labels2
Meila, M. (2007). "Comparing clusterings - an information based distance," Journal of Multivariate Analysis, 98, 5, 873-895. http://www.sciencedirect.com/science/article/pii/S0047259X06002016
# We generate K = 3 labels for each of n = 30 observations and compute the
# Variation of Information (VI) between the two clusterings.
set.seed(42)
K <- 3
n <- 30
labels1 <- sample.int(K, n, replace=TRUE)
labels2 <- sample.int(K, n, replace=TRUE)
variation_information(labels1, labels2)
# Here, we cluster the iris data set with the K-means and
# hierarchical algorithms using the true number of clusters, K = 3.
# Then, we compute the VI between the two clusterings.
iris_kmeans <- kmeans(iris[, -5], centers = 3)$cluster
iris_hclust <- cutree(hclust(dist(iris[, -5])), k = 3)
variation_information(iris_kmeans, iris_hclust)