Standard External Measures: Rand index, Jaccard coefficient etc.

Description

A group of functions that compute standard external measures for comparing two partitionings of the same data set: the Rand statistic, the Jaccard coefficient, the Fowlkes and Mallows index, the Phi index, and the Russell and Rao index.

Usage

std.ext(clust1, clust2)
clv.Rand(external.ind)
clv.Jaccard(external.ind)
clv.Folkes.Mallows(external.ind)
clv.Phi(external.ind)
clv.Russel.Rao(external.ind)

Arguments

clust1

Integer vector giving the cluster id that each object is assigned to. If the vector is not of integer type, it is coerced with a warning.

clust2

Integer vector giving the cluster id that each object is assigned to. If the vector is not of integer type, it is coerced with a warning.

external.ind

Vector or list with the four values SS, SD, DS, DD, as returned by the function std.ext.

Details

The two input vectors describe two different partitionings (say P and P') of the same data set X. Every pair of points (xi, xj) with i != j falls into exactly one of the following four categories:

SS - number of pairs where both points belong to the same cluster in both partitionings,
SD - number of pairs where both points belong to the same cluster in partitioning P but to different clusters in P',
DS - number of pairs where the points belong to different clusters in partitioning P but to the same cluster in P',
DD - number of pairs where the points belong to different clusters in both partitionings.
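
The four counts can be illustrated with a brute-force sketch in base R. The helper name pair.counts is hypothetical and is not part of clv; it counts each unordered pair once, which is an assumption about the convention and not a description of how std.ext is implemented.

```r
# Hypothetical helper (not part of clv): classify every unordered pair (i < j).
pair.counts <- function(clust1, clust2) {
  stopifnot(length(clust1) == length(clust2), length(clust1) >= 2)
  idx <- combn(length(clust1), 2)                 # 2 x choose(n, 2) matrix of pairs
  same1 <- clust1[idx[1, ]] == clust1[idx[2, ]]   # pair shares a cluster in P
  same2 <- clust2[idx[1, ]] == clust2[idx[2, ]]   # pair shares a cluster in P'
  list(SS = sum(same1 & same2),
       SD = sum(same1 & !same2),
       DS = sum(!same1 & same2),
       DD = sum(!same1 & !same2))
}

# Toy partitionings of four objects
p1 <- c(1L, 1L, 2L, 2L)
p2 <- c(1L, 1L, 1L, 2L)
counts <- pair.counts(p1, p2)
# SS = 1, SD = 1, DS = 2, DD = 2 (six pairs in total)
```

This enumeration needs memory proportional to the number of pairs, so it is only meant for small illustrative inputs.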

These values are used to compute the following indices, where M = SS + SD + DS + DD:

Rand statistic R = (SS + DD)/M
Jaccard coefficient J = SS/(SS + SD + DS)
Fowlkes and Mallows index FM = sqrt(SS/(SS + SD))*sqrt(SS/(SS + DS))
Russell and Rao index RR = SS/M
Phi index Ph = (SS*DD - SD*DS)/((SS+SD)*(SS+DS)*(SD+DD)*(DS+DD))
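
As a worked example, the formulas above can be evaluated directly in base R without calling clv. The counts below are made-up illustration values (SS = 1, SD = 1, DS = 2, DD = 2), and the variable names are assumptions of this sketch. The Phi expression is transcribed exactly as written on this page.

```r
# Illustrative pair counts (not taken from any real data set)
ind <- list(SS = 1, SD = 1, DS = 2, DD = 2)
M <- with(ind, SS + SD + DS + DD)                # total number of pairs: 6

rand    <- with(ind, (SS + DD) / M)                                   # 0.5
jaccard <- with(ind, SS / (SS + SD + DS))                             # 0.25
fm      <- with(ind, sqrt(SS / (SS + SD)) * sqrt(SS / (SS + DS)))     # sqrt(1/6), about 0.408
rr      <- with(ind, SS / M)                                          # 1/6, about 0.167
phi     <- with(ind, (SS * DD - SD * DS) /
                     ((SS + SD) * (SS + DS) * (SD + DD) * (DS + DD))) # 0
```

Rand and Russell and Rao share the denominator M, so RR can never exceed R; the difference between them is exactly DD/M.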

Value

std.ext returns a list containing four values: SS, SD, DS, DD.
clv.Rand returns the R value.
clv.Jaccard returns the J value.
clv.Folkes.Mallows returns the FM value.
clv.Phi returns the Ph value.
clv.Russel.Rao returns the RR value.

Author(s)

Lukasz Nieweglowski

References

G. Saporta and G. Youness, Comparing two partitions: Some Proposals and Experiments. http://cedric.cnam.fr/PUBLIS/RC405.pdf

See Also

Other measures created to compare two partitionings: dot.product, similarity.index

Examples

# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,3) # create three clusters (pam comes from the cluster package)
v.pred <- as.integer(pam.mod$clustering) # cluster ids assigned to the data objects
v.real <- as.integer(iris$Species) # real cluster ids

# compare the true clustering with the one given by the algorithm
# 1. optimal solution:

# call the std.ext function only once
std <- std.ext(v.pred, v.real)
# and compute three indices from its result
rand1 <- clv.Rand(std)
jaccard1 <- clv.Jaccard(std)
folk.mal1 <- clv.Folkes.Mallows(std)

# 2. functional solution:

# prepare a set of functions, each of which compares two clusterings
Rand <- function(clust1,clust2) clv.Rand(std.ext(clust1,clust2))
Jaccard <- function(clust1,clust2) clv.Jaccard(std.ext(clust1,clust2))
Folkes.Mallows <- function(clust1,clust2) clv.Folkes.Mallows(std.ext(clust1,clust2))

# compute indices
rand2 <- Rand(v.pred,v.real)
jaccard2 <- Jaccard(v.pred,v.real)
folk.mal2 <- Folkes.Mallows(v.pred,v.real)