pairwiseRand: Compute pairwise Rand indices

View source: R/pairwiseRand.R

pairwiseRand R Documentation

Compute pairwise Rand indices

Description

Breaks down the Rand index calculation to report values for each cluster and pair of clusters in a reference clustering compared to an alternative clustering.

Usage

pairwiseRand(ref, alt, mode = c("ratio", "pairs", "index"), adjusted = TRUE)

Arguments

ref

A character vector or factor containing one set of groupings, considered to be the reference.

alt

A character vector or factor containing another set of groupings, to be compared to ref.

mode

String indicating whether to return the matrix of ratios ("ratio"), the numbers of pairs ("pairs") or the Rand index ("index").

adjusted

Logical scalar indicating whether the adjusted Rand index should be returned.

Details

Recall that the Rand index calculation consists of four numbers:

a

The number of pairs of cells in the same cluster in ref and the same cluster in alt.

b

The number of pairs of cells in different clusters in ref and different clusters in alt.

c

The number of pairs of cells in the same cluster in ref and different clusters in alt.

d

The number of pairs of cells in different clusters in ref but the same cluster in alt.

The Rand index is then computed as a + b divided by a + b + c + d, i.e., the total number of pairs.
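The four counts can be illustrated with a minimal base R sketch. The labels in ref and alt below are made up for illustration, and the names n.a to n.d are chosen here rather than taken from the package; this is not how pairwiseRand is implemented internally.

```r
# Toy clusterings for six cells (made-up labels, purely illustrative).
ref <- c("A", "A", "A", "B", "B", "B")
alt <- c("x", "x", "y", "y", "z", "z")

# Enumerate every unordered pair of cells (one pair per column).
pairs <- combn(length(ref), 2)
same.ref <- ref[pairs[1,]] == ref[pairs[2,]]
same.alt <- alt[pairs[1,]] == alt[pairs[2,]]

n.a <- sum(same.ref & same.alt)    # same cluster in ref, same in alt
n.b <- sum(!same.ref & !same.alt)  # different in ref, different in alt
n.c <- sum(same.ref & !same.alt)   # same in ref, different in alt
n.d <- sum(!same.ref & same.alt)   # different in ref, same in alt

# Unadjusted Rand index: agreeing pairs over all pairs.
rand <- (n.a + n.b) / (n.a + n.b + n.c + n.d)
rand
```

Enumerating pairs with combn is quadratic in the number of cells, so this is only practical for small inputs.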

We can break these numbers down into values for each cluster or pair of clusters in ref. For each cluster, we compute its value of a, i.e., the number of pairs of cells in that cluster that are also in the same cluster in alt. Similarly, for each pair of clusters in ref, we compute its value of b, i.e., the number of pairs of cells that have one cell in each of those clusters and also belong in different clusters in alt.
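As a rough sketch of this breakdown without any adjustment, the per-cluster a and per-cluster-pair b can be tallied in base R. The toy labels and the names correct and total are assumptions for illustration (the latter chosen to echo the output of mode="pairs"); the package's own implementation may differ.

```r
# Toy clusterings for six cells (made-up labels, purely illustrative).
ref <- c("A", "A", "A", "B", "B", "B")
alt <- c("x", "x", "y", "y", "z", "z")

pairs <- combn(length(ref), 2)
same.ref <- ref[pairs[1,]] == ref[pairs[2,]]
same.alt <- alt[pairs[1,]] == alt[pairs[2,]]

# A pair is "correct" if alt agrees with ref about it: together in both
# (contributing to a) or separated in both (contributing to b).
agree <- same.ref == same.alt

# Map each pair to the (row, column) of its ref clusters, ordered so that
# the pair lands on the diagonal or in the upper triangle.
lev <- sort(unique(ref))
idx <- match(ref, lev)
i <- pmin(idx[pairs[1,]], idx[pairs[2,]])
j <- pmax(idx[pairs[1,]], idx[pairs[2,]])

total <- correct <- matrix(0, length(lev), length(lev), dimnames=list(lev, lev))
for (k in seq_along(i)) {
    total[i[k], j[k]] <- total[i[k], j[k]] + 1
    if (agree[k]) correct[i[k], j[k]] <- correct[i[k], j[k]] + 1
}

# Lower-triangular entries carry no information.
correct[lower.tri(correct)] <- NA
total[lower.tri(total)] <- NA
correct
total
```

The diagonal of correct holds the per-cluster a and the off-diagonal holds the per-cluster-pair b; summing all entries recovers a + b from the overall Rand index.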

This process provides more information about the specific similarities or differences between ref and alt, rather than coalescing all the values into a single statistic. For example, it is now possible to see which specific clusters from ref are not reproducible in alt, or which specific partitions between pairs of clusters are not reproducible. Such events can be diagnosed by looking for small (i.e., near-zero or negative) entries in the ratio matrix; on the other hand, large values (i.e., close to 1) indicate that ref is almost perfectly recapitulated by alt.
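One possible way to scan a ratio matrix for poorly reproduced clusters or partitions is sketched below. The matrix values and the cutoff of 0.5 are invented for illustration; in practice the matrix would come from pairwiseRand and the threshold would be chosen by the analyst.

```r
# Made-up ratio matrix for three ref clusters, filled column by column so
# that the lower triangle is NA, as in the documented output format.
ratio <- matrix(c(0.95,   NA,   NA,
                  0.90, 0.10,   NA,
                  0.99, 0.85, 0.92), nrow=3,
                dimnames=list(c("1", "2", "3"), c("1", "2", "3")))

threshold <- 0.5  # arbitrary cutoff, an assumption for this sketch

# which() silently skips the NA entries in the lower triangle.
low <- which(ratio < threshold, arr.ind=TRUE)
flagged <- data.frame(
    first = rownames(ratio)[low[, "row"]],
    second = colnames(ratio)[low[, "col"]],
    ratio = ratio[low]
)
flagged
```

A flagged row with first equal to second points to a cluster that is split up in alt, while a row with two different clusters points to a partition that alt does not preserve.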

If adjusted=TRUE, we adjust all counts by subtracting their expected values under a model of random permutations. This accounts for differences in the number and sizes of clusters within and between ref and alt, in a manner that mimics the calculation of adjusted Rand index (ARI). We subtract expectations on a per-cluster or per-cluster-pair basis for a and b, respectively; we also redefine the “total” number of cell pairs for each cluster or cluster pair based on the denominator of the ARI.
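For reference, the standard (global) ARI that this adjustment mimics can be computed from a contingency table as in the sketch below. This uses the usual Hubert-Arabie formula on made-up labels; it does not reproduce the per-cluster or per-cluster-pair adjustments described above, whose exact form is not spelled out here.

```r
# Toy clusterings for six cells (made-up labels, purely illustrative).
ref <- c("A", "A", "A", "B", "B", "B")
alt <- c("x", "x", "y", "y", "z", "z")

tab <- table(ref, alt)
n <- length(ref)

sum.cells <- sum(choose(tab, 2))          # pairs together in both ref and alt
sum.rows <- sum(choose(rowSums(tab), 2))  # pairs together in ref
sum.cols <- sum(choose(colSums(tab), 2))  # pairs together in alt

# Expected number of co-clustered pairs under random permutation of labels,
# and the maximum used in the ARI denominator.
expected <- sum.rows * sum.cols / choose(n, 2)
maximum <- (sum.rows + sum.cols) / 2

ari <- (sum.cells - expected) / (maximum - expected)
ari
```

Subtracting the expectation from both numerator and denominator is what allows the ARI, and by analogy the adjusted ratios, to be near zero or negative when agreement is no better than chance.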

Value

If mode="ratio", a square numeric matrix is returned with number of rows equal to the number of unique levels in ref. Each diagonal entry is the ratio of the per-cluster a to the total number of pairs of cells in that cluster. Each off-diagonal entry is the ratio of the per-cluster-pair b to the total number of pairs of cells for that pair of clusters. Lower-triangular entries are set to NA. If adjusted=TRUE, counts and totals are both adjusted prior to computing the ratio.

If mode="pairs", a list is returned containing correct and total, both of which are square numeric matrices of the same arrangement as described above. However, correct contains the actual numbers a (diagonal) and b (off-diagonal) rather than the ratios, while total contains the total number of cell pairs in each cluster or pair of clusters. If adjusted=TRUE, both matrices are adjusted by subtracting the random expectations from the counts.

If mode="index", a numeric scalar is returned containing the Rand index (or ARI, if adjusted=TRUE).

Author(s)

Aaron Lun

See Also

pairwiseModularity, which applies the same breakdown to the cluster modularity.

compareClusterings, which does this for multiple clusterings.

Examples

library(bluster)

m <- matrix(runif(10000), ncol=10)

clust1 <- kmeans(m,3)$cluster
clust2 <- kmeans(m,5)$cluster

ratio <- pairwiseRand(clust1, clust2)
ratio

# Getting the raw counts:
pairwiseRand(clust1, clust2, mode="pairs")

# Computing the adjusted Rand index (adjusted=TRUE is the default);
# set adjusted=FALSE to get the original Rand index.
pairwiseRand(clust1, clust2, mode="index")


LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.