std.ext | R Documentation |
A group of functions that compute standard external measures for comparing two partitionings: the Rand statistic, Jaccard coefficient, Folkes and Mallows index, Phi index, and Russel and Rao index.
std.ext(clust1, clust2)
clv.Rand(external.ind)
clv.Jaccard(external.ind)
clv.Folkes.Mallows(external.ind)
clv.Phi(external.ind)
clv.Russel.Rao(external.ind)
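clust1 | integer vector of cluster ids, one per data object (first partitioning) |
clust2 | integer vector of cluster ids, one per data object (second partitioning) |
external.ind | result of std.ext: the four values SS, SD, DS, DD |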
The two input vectors describe two different partitionings (say P and P')
of the same data set X. Pairs of points (xi, xj), with i != j, are counted in four categories:
SS | - number of pairs where both points belong to the same cluster in both partitionings, |
SD | - number of pairs where both points belong to the same cluster in P but to different clusters in P', |
DS | - number of pairs where the points belong to different clusters in P but to the same cluster in P', |
DD | - number of pairs where the points belong to different clusters in both partitionings. |
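The four categories can be illustrated with a brute-force count in base R. This is only a sketch: `pair_counts` is a hypothetical helper, not part of clv, and it assumes each unordered pair (i < j) is counted once, which may or may not match the convention std.ext uses internally.

```r
# Hypothetical helper (not part of clv): classify every unordered pair of points.
pair_counts <- function(clust1, clust2) {
  stopifnot(length(clust1) == length(clust2))
  # same1[i, j] is TRUE when points i and j share a cluster in P
  same1 <- outer(clust1, clust1, "==")
  # same2[i, j] is TRUE when points i and j share a cluster in P'
  same2 <- outer(clust2, clust2, "==")
  # keep each unordered pair (i < j) exactly once
  pairs <- upper.tri(same1)
  list(
    SS = sum(same1[pairs] & same2[pairs]),
    SD = sum(same1[pairs] & !same2[pairs]),
    DS = sum(!same1[pairs] & same2[pairs]),
    DD = sum(!same1[pairs] & !same2[pairs])
  )
}

# Two made-up partitionings of five points
p1 <- c(1, 1, 2, 2, 3)
p2 <- c(1, 1, 1, 2, 2)
counts <- pair_counts(p1, p2)
str(counts)
```

The helper builds two n-by-n matrices, so it is meant for small illustrative inputs only.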
These counts are used to compute the following indices, where M = SS + SD + DS + DD:
Rand statistic | R = (SS + DD)/M |
Jaccard coefficient | J = SS/(SS + SD + DS) |
Folkes and Mallows index | FM = sqrt(SS/(SS + SD))*sqrt(SS/(SS + DS)) |
Russel and Rao index | RR = SS/M |
Phi index | Ph = (SS*DD - SD*DS)/((SS+SD)*(SS+DS)*(SD+DD)*(DS+DD)) |
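As a quick arithmetic check, the formulas above can be evaluated directly in base R. The counts below are made-up illustrative values, not output of std.ext, and the variable names are chosen for this sketch only.

```r
# Illustrative counts (assumed values for a five-point data set)
SS <- 1; SD <- 1; DS <- 3; DD <- 5
M <- SS + SD + DS + DD   # total number of pairs

rand    <- (SS + DD) / M
jaccard <- SS / (SS + SD + DS)
fm      <- sqrt(SS / (SS + SD)) * sqrt(SS / (SS + DS))
rr      <- SS / M
# Phi exactly as written in the table above
phi     <- (SS * DD - SD * DS) / ((SS + SD) * (SS + DS) * (SD + DD) * (DS + DD))

print(c(Rand = rand, Jaccard = jaccard, FM = fm, RR = rr, Phi = phi))
```

Note that Jaccard and FM divide by sums involving SS, so they are undefined (NaN in R) when SS, SD and DS are all zero.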
std.ext returns a list containing four values: SS, SD, DS, DD. |
clv.Rand returns R value. |
clv.Jaccard returns J value. |
clv.Folkes.Mallows returns FM value. |
clv.Phi returns Ph value. |
clv.Russel.Rao returns RR value.
Lukasz Nieweglowski
G. Saporta and G. Youness, Comparing two partitions: Some Proposals and Experiments. http://cedric.cnam.fr/PUBLIS/RC405.pdf
Other measures for comparing two partitionings: dot.product, similarity.index
# load and prepare data
library(clv)
library(cluster) # provides pam()
data(iris)
iris.data <- iris[,1:4]
# cluster data
pam.mod <- pam(iris.data,3) # create three clusters
v.pred <- as.integer(pam.mod$clustering) # cluster id assigned to each data object
v.real <- as.integer(iris$Species) # true class labels, used as reference cluster ids
# compare the true partitioning with the one found by the algorithm
# 1. optimal solution:
# call std.ext only once
std <- std.ext(v.pred, v.real)
# then compute three indices from the std.ext result
rand1 <- clv.Rand(std)
jaccard1 <- clv.Jaccard(std)
folk.mal1 <- clv.Folkes.Mallows(std)
# 2. functional solution:
# prepare a set of functions that each compare two clusterings
Rand <- function(clust1,clust2) clv.Rand(std.ext(clust1,clust2))
Jaccard <- function(clust1,clust2) clv.Jaccard(std.ext(clust1,clust2))
Folkes.Mallows <- function(clust1,clust2) clv.Folkes.Mallows(std.ext(clust1,clust2))
# compute indices
rand2 <- Rand(v.pred,v.real)
jaccard2 <- Jaccard(v.pred,v.real)
folk.mal2 <- Folkes.Mallows(v.pred,v.real)