Calculate the purity of clustering results. For an example, see Schaeffer et al. (2016).

```
purity(classes, clusters)
```

`classes`: a vector with labels of true classes.

`clusters`: a vector with labels of assigned clusters for which purity is to be tested. Should be of the same length as `classes`.

Following Manning et al. (2008), each cluster is assigned to the class that is most frequent in the cluster, and then

$$\mathrm{Purity}(\Omega, C) = \frac{1}{N}\sum_{k}\max_{j}|\omega_k \cap c_j|,$$

where $\Omega=\{\omega_1,\ldots,\omega_K\}$ is the set of identified clusters and $C=\{c_1,\ldots,c_J\}$ is the set of classes. In `purity()`, each class is matched to at most one cluster and each cluster to at most one class: for each class $j=1,\ldots,J$, find the cluster with the largest overlap $|\omega_k\cap c_j|$ among the $K-j+1$ clusters not yet assigned, and assign it to class $j$. Then sum the $\min(K,J)$ overlaps found and divide by $N$, where $N$ = `length(classes)` = `length(clusters)`.
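To see the formula itself at work, here is a short R sketch that evaluates it on the contingency table printed in Example 1, entered by hand as the matrix `tab` (an illustrative name, not part of the package). Because every cluster simply takes its majority class, several clusters may count toward the same class, so the result, 9/15 = 0.6, is higher than the one-to-one value 0.4666667 (7/15) shown in the sample output of `purity()`.

```r
# Contingency table from Example 1: rows are classes, columns are clusters.
# With raw label vectors, table(classes, clusters) gives the same layout.
tab <- matrix(c(0, 3, 1, 0, 1,
                1, 0, 0, 2, 2,
                1, 2, 0, 2, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(classes = c("A", "B", "C"),
                              clusters = c("a", "b", "c", "d", "e")))

N <- sum(tab)  # 15 observations in total

# Formula as written: sum over clusters k of max_j |omega_k intersect c_j|,
# divided by N. Each column (cluster) contributes its largest class count,
# so several clusters may be credited to the same class.
purity_formula <- sum(apply(tab, 2, max)) / N
purity_formula  # 9/15 = 0.6
```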

If $\max_{k}|\omega_k\cap c_j|$ is attained by more than one cluster for some class $j$, the class is matched to the tied cluster whose second-largest overlap is the smallest, in order to maximize Purity (see 'Examples').

The number of unique elements in `classes` and `clusters` may differ.
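The one-to-one matching and the tie-breaking rule described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the funtimes source: `purity_sketch` is a hypothetical name, it takes a contingency table with classes in rows and clusters in columns, processes classes in row order, and breaks ties by the second-largest overlap of each tied cluster. When there are more classes than clusters, it stops once every cluster is assigned, so at most $\min(K,J)$ overlaps are summed.

```r
# Contingency table from Example 1: rows are classes, columns are clusters.
tab <- matrix(c(0, 3, 1, 0, 1,
                1, 0, 0, 2, 2,
                1, 2, 0, 2, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(classes = c("A", "B", "C"),
                              clusters = c("a", "b", "c", "d", "e")))

# Hypothetical helper, not the funtimes implementation. Assumes classes are
# processed in row order and each cluster can be assigned to at most one class.
purity_sketch <- function(tab) {
  N <- sum(tab)
  available <- rep(TRUE, ncol(tab))  # clusters not yet assigned to a class
  total <- 0
  for (j in seq_len(nrow(tab))) {
    if (!any(available)) break  # more classes than clusters: nothing left
    # Overlaps of class j with the still-available clusters.
    overlap <- ifelse(available, tab[j, ], -Inf)
    best <- which(overlap == max(overlap))  # indices of tied candidate clusters
    if (length(best) > 1) {
      # Tie-break (assumed rule): prefer the cluster whose second-largest
      # overlap with any class is smallest, so it is least needed elsewhere.
      second <- sapply(best, function(k) {
        s <- sort(tab[, k], decreasing = TRUE)
        if (length(s) > 1) s[2] else 0
      })
      best <- best[which.min(second)]
    }
    total <- total + unname(tab[j, best])
    available[best] <- FALSE
  }
  total / N
}

purity_sketch(tab)  # 7/15, about 0.4666667
```

On the Example 1 table, class A takes cluster b (3), class B faces a tie between d and e (2 each) and takes e because the second-largest overlap of e (1) is smaller than that of d (2), and class C then takes d (2), giving 7/15, which agrees with the sample output `$pur`.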

A list with two elements:

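`pur`: purity value.

`out`: table with the class-to-cluster assignments, with columns `ClassLabels`, `ClusterLabels`, and `ClusterSize` (see the sample output in 'Examples').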

Vyacheslav Lyubchich

```
# Fix seed for reproducible simulations:
# RNGkind(sample.kind = "Rounding") #run this line in R >= 3.6.0 to reproduce sample() results of earlier R versions
set.seed(1)
set.seed(1)
##### Example 1
#Create some classes and cluster labels:
classes <- rep(LETTERS[1:3], each = 5)
clusters <- sample(letters[1:5], length(classes), replace = TRUE)
#From the table below:
# - cluster 'b' corresponds to class A;
# - either of the clusters 'd' and 'e' can correspond to class B,
# however, 'e' should be chosen because cluster 'd' also strongly
# overlaps with class C. Thus,
# - cluster 'd' corresponds to class C.
table(classes, clusters)
##       clusters
##classes a b c d e
##      A 0 3 1 0 1
##      B 1 0 0 2 2
##      C 1 2 0 2 0
#The function does this choice automatically:
purity(classes, clusters)
#Sample output:
##$pur
##[1] 0.4666667
##
##$out
##  ClassLabels ClusterLabels ClusterSize
##1           A             b           3
##2           B             e           2
##3           C             d           2
##### Example 2
#The labels can also be numeric:
classes <- rep(1:5, each = 3)
clusters <- sample(1:3, length(classes), replace = TRUE)
purity(classes, clusters)
```

