VI_fast: Variation of Information and Normalized Mutual Information
In Jakob-Bach/FastTSDistances: Fast dissimilarity computations for time series


Calculates the Variation of Information index introduced by Meila (2003) to compare two cluster assignment vectors (external cluster validation). The index is the sum of the entropies of the two assignments minus twice their mutual information, which is computed from the joint distribution of the assignments. It is greater than or equal to 0, with lower values indicating more similar clusterings. Optionally, the index can be normalized to [0,1] as proposed by Wu, Xiong and Chen (2009). After normalization, the function returns 1 - normalizedValue so that higher values indicate better clustering quality (as with indices such as Rand and Fowlkes-Mallows); the result equals the Normalized Mutual Information of Fred and Jain (2002).
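
The quantities named in the description can be written out as follows. This is a sketch of the standard definitions, not taken from the package source: the symbols are assumptions introduced here, with C and C' the two clusterings of n objects, n_i and n'_j the cluster sizes, n_ij the number of objects in cluster i of C and cluster j of C', and terms with zero probability contributing 0.

```latex
\begin{align*}
H(C) &= -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad p_i = \frac{n_i}{n}, \\
I(C, C') &= \sum_{i=1}^{k} \sum_{j=1}^{k'} p_{ij} \log_2 \frac{p_{ij}}{p_i \, p'_j},
  \qquad p_{ij} = \frac{n_{ij}}{n}, \quad p'_j = \frac{n'_j}{n}, \\
VI(C, C') &= H(C) + H(C') - 2\, I(C, C'), \\
VI_{\mathrm{norm}}(C, C') &= \frac{VI(C, C')}{H(C) + H(C')}, \\
1 - VI_{\mathrm{norm}}(C, C') &= \frac{2\, I(C, C')}{H(C) + H(C')} = NMI(C, C').
\end{align*}
```

The last line shows why normalizing and inverting yields the Normalized Mutual Information: dividing by H(C) + H(C') and subtracting from 1 leaves exactly twice the mutual information over the sum of the entropies.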

VI_fast(assignments1, assignments2, normalizeAndInvert = FALSE)

`assignments1`	Integer vector of cluster assignments; must contain only values from 1 to k, where k is the number of clusters (the code depends on this!).
`assignments2`	Integer vector of cluster assignments; must contain only values from 1 to k, where k is the number of clusters.
`normalizeAndInvert`	Should the Variation of Information be normalized to [0,1] and inverted such that high values indicate similar clusterings?
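
Because the assignment vectors must contain only the integers 1 to k, labels from other sources (character labels, zero-based ids, non-consecutive ids) need to be recoded first. A minimal sketch in base R follows; `to_consecutive_labels` is a hypothetical helper introduced here for illustration and is not part of FastTSDistances.

```r
# Hypothetical helper (not part of FastTSDistances): recode arbitrary
# cluster labels to the integers 1..k, where k is the number of
# distinct labels that actually occur in the vector.
to_consecutive_labels <- function(labels) {
  # factor() collects the sorted distinct labels as levels;
  # as.integer() then returns each element's level index.
  as.integer(factor(labels))
}

raw1 <- c("b", "a", "b", "c")   # character labels
raw2 <- c(10, 30, 30, 10)       # non-consecutive numeric ids

clean1 <- to_consecutive_labels(raw1)
clean2 <- to_consecutive_labels(raw2)

# With the package loaded, the recoded vectors could then be passed on:
# VI_fast(clean1, clean2)
```

Recoding changes the label names but not the partition itself, so the index is unaffected.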

Entropy and mutual information are calculated with the base-2 logarithm, so all values are measured in bits.

The Variation of Information as a double in [0, entropy1 + entropy2] if normalizeAndInvert = FALSE; otherwise the normalized and inverted index as a double in [0,1].
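
To make the computation and the value ranges concrete, here is a slow reference sketch in base R. It assumes the standard definitions (entropies and mutual information from the contingency table, base-2 logarithm) and is not the package's implementation; `vi_reference` is a hypothetical name, and returning 1 when both entropies are zero is a convention chosen for this sketch only.

```r
# Hypothetical reference sketch (not the FastTSDistances implementation).
vi_reference <- function(assignments1, assignments2, normalizeAndInvert = FALSE) {
  stopifnot(length(assignments1) == length(assignments2),
            length(assignments1) > 0)
  n <- length(assignments1)

  # Joint distribution of the two assignments and its marginals
  joint <- table(assignments1, assignments2) / n
  p1 <- rowSums(joint)
  p2 <- colSums(joint)

  # Entropy in bits; zero-probability terms contribute nothing
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  h1 <- entropy(p1)
  h2 <- entropy(p2)

  # Mutual information in bits, summed over non-empty cells only
  expected <- outer(p1, p2)
  nonEmpty <- joint > 0
  mi <- sum(joint[nonEmpty] * log2(joint[nonEmpty] / expected[nonEmpty]))

  vi <- h1 + h2 - 2 * mi
  if (!normalizeAndInvert) {
    return(vi)
  }
  if (h1 + h2 == 0) {
    # Both clusterings consist of a single cluster; treating them as
    # identical is an assumption of this sketch.
    return(1)
  }
  2 * mi / (h1 + h2)  # equals 1 - vi / (h1 + h2)
}

a <- c(1L, 1L, 2L, 2L)
b <- c(1L, 2L, 1L, 2L)   # independent of a

vi_same <- vi_reference(a, a)          # identical clusterings: 0 bits
vi_indep <- vi_reference(a, b)         # independent clusterings: 2 bits
nmi_same <- vi_reference(a, a, TRUE)   # 1
nmi_indep <- vi_reference(a, b, TRUE)  # 0
```

The two example pairs hit both ends of each range: identical clusterings give the lower bound 0 (upper bound 1 after normalization and inversion), while independent clusterings reach entropy1 + entropy2, here 1 + 1 = 2 bits.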

Fred, A. L. & Jain, A. K. (2002). Data clustering using evidence accumulation. In Proceedings of the 16th International Conference on Pattern Recognition (Vol. 4, pp. 276-280). IEEE.

Meila, M. (2003). Comparing clusterings by the variation of information. In B. Schölkopf & M. K. Warmuth (Eds.), Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003, Proceedings (pp. 173-187). Springer Berlin Heidelberg.

Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877-886). ACM.

Jakob-Bach/FastTSDistances documentation built on May 13, 2019, 1:15 p.m.