
Breaks down the Rand index calculation to report values for each cluster and pair of clusters in a reference clustering compared to an alternative clustering.

```
pairwiseRand(ref, alt, mode = c("ratio", "pairs", "index"), adjusted = TRUE)
```

`ref`: A character vector or factor containing one set of groupings, considered to be the reference.

`alt`: A character vector or factor containing another set of groupings, to be compared to `ref`.

`mode`: String indicating whether to return the ratio, the number of pairs or the Rand index.

`adjusted`: Logical scalar indicating whether the adjusted Rand index should be returned.

Recall that the Rand index calculation consists of four numbers:

- *a*: The number of pairs of cells in the same cluster in `ref` and the same cluster in `alt`.
- *b*: The number of pairs of cells in different clusters in `ref` and different clusters in `alt`.
- *c*: The number of pairs of cells in the same cluster in `ref` and different clusters in `alt`.
- *d*: The number of pairs of cells in different clusters in `ref` but the same cluster in `alt`.

The Rand index is then computed as *a + b* divided by *a + b + c + d*, i.e., the total number of pairs.
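
The four counts can be illustrated with a minimal base R sketch. The toy labels and the variable names (`n_a`, `n_b`, `n_c`, `n_d`) are assumptions made for illustration, and this brute-force comparison of all pairs is not necessarily how `pairwiseRand` computes them internally.

```r
# Toy labels (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C")
alt <- c("x", "x", "y", "y", "y", "z")

# Logical matrices: TRUE where two cells share a cluster.
same_ref <- outer(ref, ref, "==")
same_alt <- outer(alt, alt, "==")

# Count each unordered pair of cells once by using the upper triangle.
upper <- upper.tri(same_ref)
sr <- same_ref[upper]
sa <- same_alt[upper]

n_a <- sum(sr & sa)     # same cluster in ref, same cluster in alt
n_b <- sum(!sr & !sa)   # different in ref, different in alt
n_c <- sum(sr & !sa)    # same in ref, different in alt
n_d <- sum(!sr & sa)    # different in ref, same in alt

# Unadjusted Rand index: agreeing pairs over all pairs.
rand <- (n_a + n_b) / (n_a + n_b + n_c + n_d)
rand
```

The four counts always sum to `choose(length(ref), 2)`, which is a quick sanity check on any implementation.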

We can break these numbers down into values for each cluster or pair of clusters in `ref`.
For each cluster, we compute its value of *a*,
i.e., the number of pairs of cells in *that* cluster that are also in the same cluster in `alt`.
Similarly, for each pair of clusters in `ref`, we compute its value of *b*,
i.e., the number of pairs of cells that have one cell in each of those clusters
and also belong in different clusters in `alt`.
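
As a rough sketch of this breakdown, the unadjusted per-cluster *a* and per-cluster-pair *b* can be derived from the contingency table of `ref` against `alt`. The toy labels and the names `correct`, `total` and `ratio` are chosen here for illustration; this is one possible base R approach under those assumptions, not the package's own code.

```r
# Toy labels (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C")
alt <- c("x", "x", "y", "y", "y", "z")

# Contingency table as a plain matrix: rows are ref clusters, columns are alt clusters.
tab <- unclass(table(ref, alt))
sizes <- rowSums(tab)

# Off-diagonal: pairs with one cell in each of two ref clusters.
# The total is the product of the cluster sizes; pairs sharing an alt cluster
# (counted by tab %*% t(tab)) are subtracted to leave the per-pair b.
total <- outer(sizes, sizes)
correct <- total - tab %*% t(tab)

# Diagonal: pairs within one ref cluster that also share an alt cluster (per-cluster a).
diag(correct) <- rowSums(tab * (tab - 1) / 2)
diag(total) <- sizes * (sizes - 1) / 2

# Keep only the diagonal and upper triangle.
correct[lower.tri(correct)] <- NA
total[lower.tri(total)] <- NA

# In this sketch, a singleton cluster has no within-cluster pairs, so its ratio is NaN.
ratio <- correct / total
ratio
```

Summing the diagonal of `correct` recovers the overall *a*, and summing its upper triangle recovers the overall *b*.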

This process provides more information about the specific similarities or differences between `ref` and `alt`,
rather than coalescing all the values into a single statistic.
For example, it is now possible to see which specific clusters from `ref` are not reproducible in `alt`,
or which specific partitions between pairs of clusters are not reproducible.
In the default output, such events can be diagnosed by looking for low entries in the ratio matrix;
on the other hand, values close to 1 indicate that `ref` is almost perfectly recapitulated by `alt`.
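
One way to look for low entries is sketched below. The matrix values are invented for illustration, and the cutoff of 0.5 is an arbitrary assumption rather than a recommended threshold.

```r
# Hypothetical ratio matrix: diagonal and upper triangle filled, lower triangle NA.
ratio <- matrix(c(0.95, NA,   NA,
                  0.99, 0.30, NA,
                  0.97, 0.45, 0.90),
                nrow = 3, dimnames = list(c("1", "2", "3"), c("1", "2", "3")))

# Locate entries below the (arbitrary) cutoff; NA entries are ignored by which().
low <- which(ratio < 0.5, arr.ind = TRUE)

# Report the affected cluster (diagonal) or cluster pair (off-diagonal).
data.frame(first = rownames(ratio)[low[, "row"]],
           second = colnames(ratio)[low[, "col"]],
           ratio = ratio[low])
```

In this made-up example, cluster 2 is poorly reproduced, as is its separation from cluster 3.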

If `adjusted=TRUE`, we adjust all counts by subtracting their expected values under a model of random permutations.
This accounts for differences in the number and sizes of clusters within and between `ref` and `alt`,
in a manner that mimics the calculation of the adjusted Rand index (ARI).
We subtract expectations on a per-cluster or per-cluster-pair basis for *a* and *b*, respectively;
we also redefine the “total” number of cell pairs for each cluster or cluster pair based on the denominator of the ARI.
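
For reference, the standard overall ARI (the Hubert-Arabie form) can be sketched in base R as below. The helper name `adjusted_rand` is hypothetical, and this shows only the global statistic that the adjustment mimics, not the per-cluster adjustment performed by `pairwiseRand`.

```r
# Hypothetical helper: overall ARI from the contingency table of two clusterings.
adjusted_rand <- function(ref, alt) {
    tab <- unclass(table(ref, alt))
    pairs <- function(n) n * (n - 1) / 2   # number of pairs among n cells

    observed <- sum(pairs(tab))            # pairs together in both clusterings
    ref_pairs <- sum(pairs(rowSums(tab)))  # pairs together in ref
    alt_pairs <- sum(pairs(colSums(tab)))  # pairs together in alt

    # Expected value of 'observed' under random permutation of the labels.
    expected <- ref_pairs * alt_pairs / pairs(sum(tab))
    maximum <- (ref_pairs + alt_pairs) / 2

    (observed - expected) / (maximum - expected)
}

# Toy labels (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C")
alt <- c("x", "x", "y", "y", "y", "z")
adjusted_rand(ref, alt)
```

The expected value is subtracted from both the numerator and the denominator, so identical clusterings give exactly 1.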

If `mode="ratio"`, a square numeric matrix is returned with number of rows equal to the number of unique levels in `ref`.
Each diagonal entry is the ratio of the per-cluster *a* to the total number of pairs of cells in that cluster.
Each off-diagonal entry is the ratio of the per-cluster-pair *b* to the total number of pairs of cells for that pair of clusters.
Lower-triangular entries are set to `NA`.
If `adjusted=TRUE`, counts and totals are both adjusted prior to computing the ratio.

If `mode="pairs"`, a list is returned containing `correct` and `total`,
both of which are square numeric matrices of the same arrangement as described above.
However, `correct` contains the actual numbers *a* (diagonal) and *b* (off-diagonal) rather than the ratios,
while `total` contains the total number of cell pairs in each cluster or pair of clusters.
If `adjusted=TRUE`, both matrices are adjusted by subtracting the random expectations from the counts.

If `mode="index"`, a numeric scalar is returned containing the Rand index (or ARI, if `adjusted=TRUE`).

Aaron Lun

`pairwiseModularity`, which applies the same breakdown to the cluster modularity.

```
library(bluster) # assumed source package for pairwiseRand()

m <- matrix(runif(10000), ncol=10)
clust1 <- kmeans(m,3)$cluster
clust2 <- kmeans(m,5)$cluster

ratio <- pairwiseRand(clust1, clust2)
ratio

# Getting the counts (adjusted, since adjusted=TRUE by default):
pairwiseRand(clust1, clust2, mode="pairs")

# Computing the overall index; this is the ARI since adjusted=TRUE by default.
# Set adjusted=FALSE to obtain the original Rand index.
pairwiseRand(clust1, clust2, mode="index")
```

