purity_fast: Purity Measure


Description

Calculates the purity measure (described, e.g., by Wu, Xiong and Chen (2009)) to compare two cluster assignment vectors (external cluster validation). The result is a value in (0,1], with higher values indicating more similarity. For each cluster, it finds the most common ground truth class and sums the relative frequencies of these classes (their counts divided by the total number of elements).
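The computation described above can be sketched in base R. This is only an illustrative reference version, assuming the standard definition of purity; the helper name purity_reference is hypothetical and is not part of the package, whose actual implementation may differ.

```r
# Hypothetical reference implementation (not the package's own code).
purity_reference <- function(assignments, groundTruth) {
  stopifnot(length(assignments) == length(groundTruth),
            length(assignments) > 0)
  # Contingency table: rows = clusters, columns = ground truth classes
  counts <- table(assignments, groundTruth)
  # Count of the most common class in each cluster, summed and
  # divided by the total number of elements
  sum(apply(counts, 1, max)) / length(assignments)
}

assignments <- c(1, 1, 1, 2, 2, 2)
groundTruth <- c(1, 1, 2, 2, 2, 1)
# Each cluster holds 2 elements of its majority class: (2 + 2) / 6
purity_reference(assignments, groundTruth)
```

In this toy input, each of the two clusters contains two elements of its majority class and one of the other class, so the purity is 4/6.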

Usage

purity_fast(assignments, groundTruth)

Arguments

assignments

Integer vector of cluster assignments containing only values from 1 to k with k = number of clusters (code depends on this!).

groundTruth

Integer vector of class (ground truth) assignments containing only values from 1 to k with k = number of clusters.

Details

Be aware that this measure is asymmetric and can still be high if the classes of the ground truth are split up into multiple (but pure) clusters. Wu, Xiong and Chen (2009) propose to use the symmetric van Dongen measure (vanDongen_fast) instead.
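The asymmetry can be illustrated with a small base R sketch. The helper purity_reference is a hypothetical stand-in written for this example under the standard definition of purity; it is not the package's function.

```r
# Hypothetical reference implementation (not the package's own code).
purity_reference <- function(assignments, groundTruth) {
  counts <- table(assignments, groundTruth)
  sum(apply(counts, 1, max)) / length(assignments)
}

# Two ground truth classes, each split into two pure clusters
groundTruth <- c(1, 1, 1, 1, 2, 2, 2, 2)
assignments <- c(1, 1, 2, 2, 3, 3, 4, 4)

# Every cluster is pure, so the measure is maximal despite the split
split_purity <- purity_reference(assignments, groundTruth)

# Swapping the arguments gives a different value: asymmetric measure
swapped_purity <- purity_reference(groundTruth, assignments)

c(split = split_purity, swapped = swapped_purity)
```

Here the over-segmented clustering reaches a purity of 1, while the swapped call yields 0.5, which is why a symmetric measure can be preferable when the number of clusters matters.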

Value

The purity measure as double in (0,1].

References

Van Dongen, S. (2000). Performance criteria for graph clustering and Markov cluster experiments. National Research Institute for Mathematics and Computer Science. Amsterdam.

Wu, J., Xiong, H. & Chen, J. (2009). Adapting the right measures for k-means clustering. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 877-886). ACM.

See Also

Other External Cluster Validity Indices: conditionalEntropy_fast, fowlkesMallows_fast, pairCVIParameters_fast, phi_fast, randIndex_fast, vanDongen_fast


Jakob-Bach/FastTSDistances documentation built on May 13, 2019, 1:15 p.m.