Calculates the conditional Shannon entropy to compare two cluster assignment vectors (external cluster validation). The value is greater than or equal to 0, with lower values indicating more similarity (purer clusters). Optionally, the index can be normalized to [0,1]; in that case 1 - normalized entropy is returned, giving a uniformity measure where high values are good.
conditionalEntropy_fast(assignments, groundTruth, normalizeAndInvert = FALSE)
assignments
Integer vector of cluster assignments containing only values from 1 to k with k = number of clusters (code depends on this!).
groundTruth
Integer vector of true class labels containing only values from 1 to k' with k' = number of classes.
normalizeAndInvert
Should the entropy be normalized to [0,1] and inverted such that high values indicate similar clusterings?
Be aware that this measure is asymmetric: the classes are conditioned on (analyzed within) the clusters. It can therefore still indicate a good match (low entropy, or high uniformity after normalization and inversion) if the classes of the ground truth are split up into multiple (but pure) clusters. Wu, Xiong and Chen (2009) propose using the symmetric variation of information (VI_fast) instead, which is also based on entropy.
We use the base 2 logarithm for calculating entropy.
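To make the asymmetry concrete, here is a minimal base R sketch of the conditional entropy of the classes given the clusters. The helper name conditional_entropy_sketch and the toy vectors are hypothetical and only illustrate the definition; this is not the package's implementation.

```r
# Hypothetical helper (not the package's code): H(groundTruth | assignments)
# in bits, computed from the contingency table of the two label vectors.
conditional_entropy_sketch <- function(assignments, groundTruth) {
  stopifnot(length(assignments) == length(groundTruth))
  # joint[i, j] = fraction of points in cluster i that belong to class j
  joint <- table(assignments, groundTruth) / length(assignments)
  # p(class | cluster): each row divided by its cluster probability
  cond <- joint / rowSums(joint)
  nonzero <- joint > 0  # skip empty cells, treating 0 * log2(0) as 0
  -sum(joint[nonzero] * log2(cond[nonzero]))
}

truth      <- c(1, 1, 1, 1, 2, 2, 2, 2)
split_pure <- c(1, 1, 2, 2, 3, 3, 4, 4)  # each class split into two pure clusters
merged     <- c(1, 1, 1, 1, 1, 1, 1, 1)  # everything in a single cluster

h_split    <- conditional_entropy_sketch(split_pure, truth)  # pure clusters
h_merged   <- conditional_entropy_sketch(merged, truth)      # one mixed cluster
h_reversed <- conditional_entropy_sketch(truth, split_pure)  # arguments swapped
```

In this toy example the split-but-pure clustering has entropy 0, while swapping the two arguments gives 1 bit, which is the asymmetry described above.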
The conditional entropy as a double in [0, k'] (without normalization) or a uniformity measure in [0,1] (with normalization).
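The normalized and inverted variant might be sketched as follows. This assumes the entropy is divided by log2(k'), the largest value the conditional entropy of k' classes can take; the page does not state which normalizer the package uses, so both that choice and the helper name uniformity_sketch are assumptions.

```r
# Hypothetical helper. Assumption: normalize by log2(k'), the maximum of
# H(class | cluster) for k' classes; the package may normalize differently.
uniformity_sketch <- function(assignments, groundTruth) {
  joint <- table(assignments, groundTruth) / length(assignments)
  cond <- joint / rowSums(joint)  # p(class | cluster)
  nonzero <- joint > 0
  entropy <- -sum(joint[nonzero] * log2(cond[nonzero]))
  n_classes <- length(unique(groundTruth))
  if (n_classes < 2) return(1)  # a single class is trivially pure
  1 - entropy / log2(n_classes)
}

truth   <- c(1, 1, 1, 1, 2, 2, 2, 2)
u_pure  <- uniformity_sketch(c(1, 1, 2, 2, 3, 3, 4, 4), truth)  # pure clusters
u_mixed <- uniformity_sketch(c(1, 1, 1, 1, 1, 1, 1, 1), truth)  # one mixed cluster
```

Under this assumption, perfectly pure clusters score 1 and a single cluster mixing two equally sized classes scores 0.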
Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877-886). ACM.
Other External Cluster Validity Indices: fowlkesMallows_fast, pairCVIParameters_fast, phi_fast, purity_fast, randIndex_fast, vanDongen_fast