VI_fast: Variation of Information and Normalized Mutual Information


Description

Calculates the Variation of Information index introduced by Meila (2003) for comparing two cluster assignment vectors (external cluster validation). The index is greater than or equal to 0, with lower values indicating more similar clusterings; it is computed from the entropies of the two individual assignments and the mutual information of their joint distribution. Optionally, the index can be normalized to [0,1] as proposed by Wu, Xiong and Chen (2009). After normalization, the function returns 1 - normalizedValue so that higher values indicate better clustering quality (as with indices such as Rand and Fowlkes-Mallows); the result equals the Normalized Mutual Information of Fred and Jain (2002).
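
The quantities described above can be written out as follows. This is a sketch in notation introduced here, not taken from the package: p_i and p'_j denote the relative cluster sizes in the first and second assignment, and p_{ij} the fraction of objects in cluster i of the first and cluster j of the second. It assumes the normalization divides by the sum of the two entropies, which is consistent with the value range given under Value and with the stated equality to the Normalized Mutual Information.

```latex
\begin{align*}
H(C) &= -\sum_{i} p_i \log_2 p_i, \\
I(C, C') &= \sum_{i,j} p_{ij} \log_2 \frac{p_{ij}}{p_i \, p'_j}, \\
\mathrm{VI}(C, C') &= H(C) + H(C') - 2\, I(C, C'), \\
1 - \frac{\mathrm{VI}(C, C')}{H(C) + H(C')} &= \frac{2\, I(C, C')}{H(C) + H(C')} = \mathrm{NMI}(C, C').
\end{align*}
```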

Usage

VI_fast(assignments1, assignments2, normalizeAndInvert = FALSE)

Arguments

assignments1

Integer vector of cluster assignments containing only values from 1 to k, where k is the number of clusters. The code depends on this labeling, so other label values must be recoded first.

assignments2

Integer vector of cluster assignments containing only values from 1 to k, where k is the number of clusters.

normalizeAndInvert

Should the Variation of Information be normalized to [0,1] and inverted so that high values indicate similar clusterings? Defaults to FALSE.
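
Because both assignment vectors must contain only the labels 1 to k, arbitrary cluster labels need to be recoded before the call. A minimal sketch in base R, assuming the labels can be converted to a factor; `relabel` is a hypothetical helper and not part of the package:

```r
# Hypothetical helper (not part of FastTSDistances): map arbitrary
# cluster labels to consecutive integers 1..k via factor levels.
relabel <- function(x) {
  as.integer(factor(x))
}

raw <- c(10, 3, 10, 7)   # labels that are not in 1..k
relabel(raw)             # 3 1 3 2
```

The recoded labels follow the sorted order of the original values. That is harmless here, since the index compares partitions and does not depend on which integer names which cluster.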

Details

We use the base 2 logarithm for calculating entropy and mutual information.
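
For cross-checking results, the computation can be sketched in plain R with base 2 logarithms. This is an illustrative reference version under the definitions given in the Description, not the package's implementation; `vi_reference` is a hypothetical name, and its handling of two single-cluster assignments is this sketch's own choice.

```r
# Illustrative reference sketch (not the implementation behind VI_fast).
vi_reference <- function(assignments1, assignments2,
                         normalizeAndInvert = FALSE) {
  stopifnot(length(assignments1) == length(assignments2))

  # Joint distribution of the two labelings and its marginals
  joint <- table(assignments1, assignments2) / length(assignments1)
  p1 <- rowSums(joint)
  p2 <- colSums(joint)

  # Shannon entropy in bits; zero-probability cells contribute nothing
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log2(p))
  }

  h1 <- entropy(p1)
  h2 <- entropy(p2)
  mi <- h1 + h2 - entropy(joint)  # mutual information
  vi <- h1 + h2 - 2 * mi          # Variation of Information

  if (!normalizeAndInvert) {
    return(vi)
  }
  # Assumption of this sketch: two single-cluster assignments have zero
  # entropy and are treated as identical (value 1).
  if (h1 + h2 == 0) {
    return(1)
  }
  2 * mi / (h1 + h2)              # equals 1 - vi / (h1 + h2)
}

a <- c(1L, 1L, 2L, 2L)
b <- c(1L, 2L, 1L, 2L)
vi_reference(a, a)                             # 0: identical clusterings
vi_reference(a, b)                             # 2: independent clusterings
vi_reference(a, b, normalizeAndInvert = TRUE)  # 0
```

In the second call both entropies equal 1 bit and the mutual information is 0, so the unnormalized value reaches its upper bound entropy1 + entropy2 = 2.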

Value

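The Variation of Information as a double. Without normalization the value lies in [0, entropy1 + entropy2], where entropy1 and entropy2 are the entropies of the two assignments; with normalization and inversion it lies in [0,1].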

References

Fred, A. L. & Jain, A. K. (2002). Data clustering using evidence accumulation. In Proceedings of the 16th International Conference on Pattern Recognition (Vol. 4, pp. 276-280). IEEE.

Meila, M. (2003). Comparing clusterings by the variation of information. In B. Schölkopf & M. K. Warmuth (Eds.), Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003, Proceedings (pp. 173-187). Springer Berlin Heidelberg.

Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877-886). ACM.


Jakob-Bach/FastTSDistances documentation built on May 13, 2019, 1:15 p.m.