# standard_external_measures: Standard External Measures (Rand index, Jaccard coefficient, etc.) in clv: Cluster Validation Techniques

## Description

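A group of functions that compute standard external measures for comparing two partitionings: the Rand statistic, the Jaccard coefficient, the Folkes and Mallows index (usually spelled Fowlkes-Mallows in the literature), the Phi index, and the Russel and Rao index.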

## Usage

```r
std.ext(clust1, clust2)
clv.Rand(external.ind)
clv.Jaccard(external.ind)
clv.Folkes.Mallows(external.ind)
clv.Phi(external.ind)
clv.Russel.Rao(external.ind)
```

## Arguments

| Argument | Description |
| --- | --- |
| `clust1` | integer `vector` giving, for each object, the id of the cluster it is assigned to in the first partitioning. If the vector is not of integer type, it is coerced with a warning. |
| `clust2` | integer `vector` giving, for each object, the id of the cluster it is assigned to in the second partitioning. If the vector is not of integer type, it is coerced with a warning. |
| `external.ind` | `vector` or `list` with the four values SS, SD, DS, DD, as returned by `std.ext`. |
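Because non-integer input is coerced with a warning, it can help to convert labels to integer ids before calling `std.ext`. The following is a minimal base R sketch, assuming the labels are stored as text; the names `labels` and `ids` and the sample values are illustrative and not part of clv.

```r
# Illustrative labels; any character or factor labels would do.
labels <- c("setosa", "virginica", "setosa", "versicolor")

# Going through factor() maps each distinct label to an integer code
# (levels are sorted alphabetically by default).
# Calling as.integer() directly on the character vector would give NA values.
ids <- as.integer(factor(labels))

print(ids)
```

The resulting `ids` vector is of integer type, so it should pass through `std.ext` without triggering the coercion warning.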

## Details

The two input vectors describe two different partitionings (say P and P') of the same data set X. Every pair of points (xi, xj) from the data set, with i != j, falls into exactly one of the following four categories:

- `SS` - number of pairs where both points belong to the same cluster in both partitionings,
- `SD` - number of pairs where both points belong to the same cluster in partitioning P but to different clusters in P',
- `DS` - number of pairs where the points belong to different clusters in partitioning P but to the same cluster in P',
- `DD` - number of pairs where the points belong to different clusters in both partitionings.
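The four counts can be illustrated with a short base R sketch. The function `pair_counts` below is a hypothetical helper written for this illustration, not the clv implementation of `std.ext`; it assumes each unordered pair (i < j) is counted once.

```r
# Hypothetical helper (not part of clv): classify every unordered pair of
# objects by whether the two partitionings put them in the same cluster.
pair_counts <- function(clust1, clust2) {
  stopifnot(length(clust1) == length(clust2))
  # same1[i, j] is TRUE when objects i and j share a cluster in P
  same1 <- outer(clust1, clust1, "==")
  # same2[i, j] is TRUE when objects i and j share a cluster in P'
  same2 <- outer(clust2, clust2, "==")
  # keep each unordered pair once (i < j)
  pairs <- upper.tri(same1)
  list(
    SS = sum(same1[pairs] & same2[pairs]),
    SD = sum(same1[pairs] & !same2[pairs]),
    DS = sum(!same1[pairs] & same2[pairs]),
    DD = sum(!same1[pairs] & !same2[pairs])
  )
}

p1 <- c(1L, 1L, 2L, 2L, 3L)  # partitioning P of five objects
p2 <- c(1L, 1L, 1L, 2L, 2L)  # partitioning P' of the same objects
counts <- pair_counts(p1, p2)
str(counts)
```

The four counts always sum to the number of pairs, here `choose(5, 2)`. The sketch builds two n-by-n matrices, so it is only suitable for small inputs.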

These values are used to compute the following indices, where M = SS + SD + DS + DD:

| Measure | Formula |
| --- | --- |
| Rand statistic | `R = (SS + DD)/M` |
| Jaccard coefficient | `J = SS/(SS + SD + DS)` |
| Folkes and Mallows index | `FM = sqrt(SS/(SS + SD)) * sqrt(SS/(SS + DS))` |
| Russel and Rao index | `RR = SS/M` |
| Phi index | `Ph = (SS*DD - SD*DS)/((SS+SD)*(SS+DS)*(SD+DD)*(DS+DD))` |
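As a worked example, the formulas can be evaluated directly in base R. The counts below are assumed values for illustration (they correspond to the partitionings P = (1, 1, 2, 2, 3) and P' = (1, 1, 1, 2, 2) when each unordered pair is counted once); this is a sketch of the arithmetic as documented here, not the clv source.

```r
# Assumed pair counts for illustration (not produced by clv)
SS <- 1; SD <- 1; DS <- 3; DD <- 5
M <- SS + SD + DS + DD  # total number of pairs: 10

rand    <- (SS + DD) / M                                # 6 / 10
jaccard <- SS / (SS + SD + DS)                          # 1 / 5
fm      <- sqrt(SS / (SS + SD)) * sqrt(SS / (SS + DS))  # sqrt(1/2) * sqrt(1/4)
rr      <- SS / M                                       # 1 / 10
# Phi index exactly as written in the formula on this page
phi     <- (SS * DD - SD * DS) /
  ((SS + SD) * (SS + DS) * (SD + DD) * (DS + DD))       # 2 / 384

print(c(Rand = rand, Jaccard = jaccard, FM = fm, RR = rr, Phi = phi))
```

Rand, Jaccard, FM, and RR all lie between 0 and 1, with larger values indicating closer agreement between the two partitionings.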

## Value

- `std.ext` returns a `list` containing four values: SS, SD, DS, DD.
- `clv.Rand` returns the R value.
- `clv.Jaccard` returns the J value.
- `clv.Folkes.Mallows` returns the FM value.
- `clv.Phi` returns the Ph value.
- `clv.Russel.Rao` returns the RR value.

## Author(s)

Lukasz Nieweglowski

## References

G. Saporta and G. Youness, Comparing two partitions: Some Proposals and Experiments. http://cedric.cnam.fr/PUBLIS/RC405.pdf

## See Also

Other measures for comparing two partitionings: `dot.product`, `similarity.index`

## Examples

```r
# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data (pam() is provided by the cluster package)
pam.mod <- pam(iris.data,3) # create three clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids assigned to the data objects
v.real <- as.integer(iris$Species) # get the real cluster ids as well

# compare the true clustering with the one given by the algorithm
# 1. optimal solution:

# call the std.ext function only once
std <- std.ext(v.pred, v.real)
# then compute three indices based on the std.ext result
rand1 <- clv.Rand(std)
jaccard1 <- clv.Jaccard(std)
folk.mal1 <- clv.Folkes.Mallows(std)

# 2. functional solution:

# prepare a set of functions which compare two clusterings
Rand <- function(clust1,clust2) clv.Rand(std.ext(clust1,clust2))
Jaccard <- function(clust1,clust2) clv.Jaccard(std.ext(clust1,clust2))
Folkes.Mallows <- function(clust1,clust2) clv.Folkes.Mallows(std.ext(clust1,clust2))

# compute indices
rand2 <- Rand(v.pred,v.real)
jaccard2 <- Jaccard(v.pred,v.real)
folk.mal2 <- Folkes.Mallows(v.pred,v.real)
```