cluster.stat: Calculate measure of quality of inferred clusters


cluster.stat    R Documentation

Calculate measure of quality of inferred clusters

Description

Calculate a score indicating how well two sets of clusters agree.

Usage

cluster.stat(fam1,fam2,method=c("all","rand","adj","fm","kb"))

Arguments

fam1

A list of clusters; each component in the list is one family, containing the indices of the individuals in that family.

fam2

A list of clusters in the same format as fam1. For Karl Broman's index, fam2 is assumed to be the true partition.

method

A character string indicating which index to calculate: the Rand index ("rand"), the adjusted Rand index ("adj"), the Fowlkes and Mallows B index ("fm"), or Karl Broman's index ("kb"). If method="all", a vector with all four indices is returned.

Details

In the Rand index (Rand 1971), one considers all pairs of individuals. A pair is assigned a 1 if the two individuals are in the same cluster in both fam1 and fam2, or in different clusters in both fam1 and fam2; otherwise the pair is assigned a 0. The index is the sum of these values divided by the number of pairs of individuals.
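The pairwise definition of the Rand index can be sketched directly in R. This is a minimal illustration and not the package's implementation: membership and rand_sketch are hypothetical helper names, and the code assumes that individuals are numbered 1 to n and that each individual appears in exactly one cluster of each list.

```r
# Hypothetical helper (not part of the package): convert a list of clusters
# into a vector giving the cluster label of each individual.
membership <- function(fam) {
  n <- max(unlist(fam))
  m <- integer(n)
  for (k in seq_along(fam)) m[fam[[k]]] <- k
  m
}

# Rand index computed from the pairwise definition.
rand_sketch <- function(fam1, fam2) {
  m1 <- membership(fam1)
  m2 <- membership(fam2)
  stopifnot(length(m1) == length(m2))
  same1 <- outer(m1, m1, "==")  # TRUE if the pair shares a cluster in fam1
  same2 <- outer(m2, m2, "==")  # TRUE if the pair shares a cluster in fam2
  pairs <- upper.tri(same1)     # count each unordered pair once
  mean(same1[pairs] == same2[pairs])
}

fam1 <- list(c(1, 2, 3), c(4, 5))
fam2 <- list(c(1, 2), c(3, 4, 5))
rand_sketch(fam1, fam2)  # 6 of the 10 pairs agree, giving 0.6
```

The outer() comparison uses memory quadratic in the number of individuals, which is fine for illustration but may be a poor choice for large data sets.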

Karl Broman's index (which we don't recommend, but implement here to allow comparisons) is just like the Rand index, but fam2 is assumed to be the true partition, and the set of all pairs in the same group (by fam2) and the set of all pairs in different groups (by fam2) are given equal weight.
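One possible reading of this description is the average of two agreement rates: the proportion of pairs grouped together by fam2 that are also together in fam1, and the proportion of pairs separated by fam2 that are also separated in fam1. The sketch below follows that reading only; it is an assumption, and the package's actual computation may differ. The names membership and kb_sketch are hypothetical.

```r
# Hypothetical helper (not part of the package): cluster label per individual,
# assuming individuals are numbered 1 to n with one cluster each.
membership <- function(fam) {
  n <- max(unlist(fam))
  m <- integer(n)
  for (k in seq_along(fam)) m[fam[[k]]] <- k
  m
}

# Assumed reading of the index: equal weight on the "same group" pairs
# and the "different groups" pairs, both defined by fam2.
kb_sketch <- function(fam1, fam2) {
  m1 <- membership(fam1)
  m2 <- membership(fam2)
  stopifnot(length(m1) == length(m2))
  pairs <- upper.tri(diag(length(m1)))     # each unordered pair once
  same1 <- outer(m1, m1, "==")[pairs]
  same2 <- outer(m2, m2, "==")[pairs]
  agree_same <- mean(same1[same2])         # together in fam2 and in fam1
  agree_diff <- mean(!same1[!same2])       # apart in fam2 and in fam1
  (agree_same + agree_diff) / 2
}

fam1 <- list(c(1, 2, 3), c(4, 5))
fam2 <- list(c(1, 2), c(3, 4, 5))
kb_sketch(fam1, fam2)  # (2/4 + 4/6) / 2
```

Under this reading the result is NaN when fam2 contains no pair in the same group, or no pair in different groups, since one of the two rates is then undefined.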

Let n_{ij} be the number of individuals in group i by partition 1 and group j by partition 2. Let n_{i.} = \sum_{j} n_{ij} and define n_{.j} similarly.
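As an illustration of this notation, the counts n_{ij} form a contingency table of the two partitions, with n_{i.} and n_{.j} as its row and column totals. The sketch below builds the table with base R; membership is a hypothetical helper, and the code assumes individuals are numbered 1 to n and belong to exactly one cluster in each list.

```r
# Hypothetical helper (not part of the package): cluster label per individual.
membership <- function(fam) {
  n <- max(unlist(fam))
  m <- integer(n)
  for (k in seq_along(fam)) m[fam[[k]]] <- k
  m
}

fam1 <- list(c(1, 2, 3), c(4, 5))
fam2 <- list(c(1, 2), c(3, 4, 5))

# Rows index the groups of partition 1, columns the groups of partition 2.
n_ij <- table(membership(fam1), membership(fam2))
row_tot <- rowSums(n_ij)  # n_{i.}
col_tot <- colSums(n_ij)  # n_{.j}

n_ij
row_tot
col_tot
```

Here individuals 1 and 2 fall in group 1 of both partitions, individual 3 is in group 1 of the first and group 2 of the second, and individuals 4 and 5 are in group 2 of both.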

In the adjusted-Rand index (Hubert and Arabie 1985), ...

In the Fowlkes and Mallows B index (Fowlkes and Mallows 1983), ...
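The formulas for these two indices are not spelled out above. For reference, the forms commonly given for them can be computed from the contingency table n_{ij} by counting pairs with choose(). The sketch below uses those standard forms; it is an assumption that they match what cluster.stat computes, and pair_indices_sketch is a hypothetical name.

```r
# Adjusted Rand index and Fowlkes-Mallows B index from a contingency table,
# using the standard pair-counting forms (assumed, not taken from the package).
pair_indices_sketch <- function(n_ij) {
  n <- sum(n_ij)
  s_ij <- sum(choose(n_ij, 2))           # pairs together in both partitions
  s_i <- sum(choose(rowSums(n_ij), 2))   # pairs together in partition 1
  s_j <- sum(choose(colSums(n_ij), 2))   # pairs together in partition 2
  expected <- s_i * s_j / choose(n, 2)   # expected value of s_ij under chance
  c(adj = (s_ij - expected) / ((s_i + s_j) / 2 - expected),
    fm = s_ij / sqrt(s_i * s_j))
}

# Table for partition 1 = {1,2,3},{4,5} and partition 2 = {1,2},{3,4,5}.
n_ij <- matrix(c(2, 0, 1, 2), nrow = 2)
pair_indices_sketch(n_ij)  # adj = 1/6, fm = 0.5
```

Both quantities are undefined in degenerate cases, for example when every individual is in its own cluster in one of the partitions, so a production version would need to handle zero denominators.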

Value

The value of the requested index for comparing two sets of clusters. If method="all", a vector with all four indices.

Author(s)

Karl W Broman <broman@wisc.edu>

References

WM Rand (1971) Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association 66:846-850.

L Hubert and P Arabie (1985) Comparing partitions. Journal of Classification 2:193-218.

EB Fowlkes and CL Mallows (1983) A method for comparing two hierarchical clusterings. Journal of the American Statistical Association 78:553-584.

BS Everitt, S Landau and M Leese (2001) Cluster analysis, 4th edition. Arnold, London, pp. 181-3.

See Also

fingers, true.fams

Examples

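library(fingers)
data(aedes)
f <- freq(aedes)
co <- cutoff(f)
d <- calc.dist(aedes)
# inferred families
fam <- fingers(d,co,make.plot=TRUE)
# true families, passed as fam2
tf <- true.fams(aedes)
# compare the inferred families with the true ones
cluster.stat(fam,tf)
# Fowlkes and Mallows B index only
cluster.stat(fam,tf,method="fm")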

kbroman/fingers documentation built on May 17, 2023, 11:50 p.m.