conditionalEntropy_fast: Conditional Entropy to Compare Clusterings

Description

Calculates the conditional Shannon entropy to compare two cluster assignment vectors (external cluster validation). The value is greater than or equal to 0, with lower values indicating more similarity (purer clusters). Optionally, the index can be normalized to [0,1] and inverted (1 - normalized entropy) to obtain a uniformity measure where high values are good.

Usage

conditionalEntropy_fast(assignments, groundTruth, normalizeAndInvert = FALSE)

Arguments

assignments

Integer vector of cluster assignments containing only values from 1 to k with k = number of clusters (code depends on this!).

groundTruth

Integer vector of true class labels containing only values from 1 to k' with k' = number of classes.

normalizeAndInvert

Should the entropy be normalized to [0,1] and inverted such that high values indicate similar clusterings?

Details

Be aware that this measure is asymmetric (the classes are conditioned on, i.e., analyzed within, the clusters). It can therefore still indicate high similarity if the classes of the ground truth are split up into multiple (but pure) clusters. Wu, Xiong and Chen (2009) propose using the symmetric variation of information (VI_fast) instead, which is also based on entropy.

We use the base 2 logarithm for calculating entropy.
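
The asymmetry can be illustrated with a minimal base-R sketch. It assumes the textbook definition of the conditional entropy of the classes given the clusters, in bits; `conditional_entropy_ref` is a hypothetical helper written for this illustration and is not part of the package. The normalization behind normalizeAndInvert is left out because its normalizing constant is not specified here.

```r
# Hypothetical reference helper (not the package implementation):
# conditional entropy H(groundTruth | assignments) in bits (base 2 logarithm).
conditional_entropy_ref <- function(assignments, groundTruth) {
  stopifnot(length(assignments) == length(groundTruth))
  # Joint relative frequencies: rows = clusters, columns = classes
  joint <- table(assignments, groundTruth) / length(assignments)
  # Marginal relative frequency of each cluster
  clusterProb <- rowSums(joint)
  # P(class | cluster): the vector is recycled down the columns,
  # so row i is divided by clusterProb[i]
  condProb <- joint / clusterProb
  # Sum over non-empty cells only (convention: 0 * log2(0) = 0)
  nonZero <- joint > 0
  -sum(joint[nonZero] * log2(condProb[nonZero]))
}

groundTruth <- c(1, 1, 1, 1, 2, 2, 2, 2)
splitPure   <- c(1, 1, 2, 2, 3, 3, 4, 4)  # each class split into two pure clusters

conditional_entropy_ref(splitPure, groundTruth)  # 0: every cluster is pure
conditional_entropy_ref(groundTruth, splitPure)  # 1: roles swapped, result differs
```

In this toy example the split-but-pure clustering gets the best possible score of 0, while swapping the two arguments yields 1 bit.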

Value

The conditional entropy as a double in [0, log2(k')] (without normalization) or a uniformity measure in [0,1] (with normalization).

References

Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877-886). ACM.

See Also

Other External Cluster Validity Indices: fowlkesMallows_fast, pairCVIParameters_fast, phi_fast, purity_fast, randIndex_fast, vanDongen_fast


Jakob-Bach/FastTSDistances documentation built on May 13, 2019, 1:15 p.m.