Given two clusterings, this function calculates the Variation of Information between them. Variation of Information can be thought of as the information content lost plus the information content gained when choosing one clustering over the other.
variationInformation(clustering1, clustering2)
clustering1: A clustering as defined by the cluster output functions of this package.
clustering2: Same as clustering1.
The Variation of Information (from Meila) is a quantity based on the entropy H(C) of a clustering C, the entropy H(C') of the second clustering C', and the mutual information I(C,C') between these two clusterings. See Meila for more information on how entropy and mutual information are calculated.
Unlike more conventional measures based on counting cluster membership (such as the Jaccard index), Variation of Information can still be interpreted when the two clusterings have different numbers of clusters.
The Variation of Information is calculated by the following equation: VI(C, C') = H(C) + H(C') - 2 I(C, C')
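The equation above can be illustrated with a minimal sketch in base R. It is not the package's implementation: `vi_sketch` is a hypothetical helper, it assumes each clustering is given as a plain vector of cluster labels (rather than the data frames this package's functions produce), and it uses the natural logarithm with no normalization, so its values need not be on the same scale as the output of `variationInformation`.

```r
# Hypothetical helper, not part of the package: computes
# H(C) + H(C') - 2 I(C, C') from two label vectors of equal length.
vi_sketch <- function(labels1, labels2) {
  stopifnot(length(labels1) == length(labels2))
  n <- length(labels1)

  # Joint distribution P(k, k') from the contingency table of the two labelings
  joint <- table(labels1, labels2) / n
  p1 <- rowSums(joint)  # marginal distribution of the first clustering
  p2 <- colSums(joint)  # marginal distribution of the second clustering

  # Shannon entropy (natural log); zero-probability entries contribute nothing
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }

  # Mutual information, summed over non-empty cells only
  expected <- outer(p1, p2)
  nz <- joint > 0
  mutual <- sum(joint[nz] * log(joint[nz] / expected[nz]))

  entropy(p1) + entropy(p2) - 2 * mutual
}

a <- c(1, 1, 2, 2, 3, 3)  # three clusters
b <- c(1, 1, 1, 2, 2, 2)  # two clusters
vi_sketch(a, a)  # identical clusterings: zero up to floating-point error
vi_sketch(a, b)  # still defined when the numbers of clusters differ
```

Because the entropies and the mutual information are all derived from the same contingency table, the sketch is symmetric in its two arguments and does not require the clusterings to have the same number of clusters.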
variation: Variation of Information. A value that ranges from 0 to 1. The more the clusterings are in agreement, the closer this value is to 0.
Ted Laderas (laderast@ohsu.edu)
Meila, M. Comparing Clusterings. Technical Report 418. 2002, University of Washington Statistics Department: Seattle.
data(choresults)
clusts <- choresults$clusters
# Calculate Variation of Information between two clusterings
variationInformation(as.data.frame(clusts[["UPGMACOR"]]),
                     as.data.frame(clusts[["UPGMAEUC"]]))