# pairwiseRand: Compute pairwise Rand indices

In bluster: Clustering Algorithms for Bioconductor

## Description

Breaks down the Rand index calculation to report values for each cluster and pair of clusters in a reference clustering compared to an alternative clustering.

## Usage

```r
pairwiseRand(ref, alt, mode = c("ratio", "pairs", "index"), adjusted = TRUE)
```

## Arguments

`ref`

A character vector or factor containing one set of groupings, considered to be the reference.

`alt`

A character vector or factor containing another set of groupings, to be compared to `ref`.

`mode`

String indicating whether to return the ratio, the number of pairs or the Rand index.

`adjusted`

Logical scalar indicating whether the adjusted Rand index should be returned.

## Details

Recall that the Rand index calculation consists of four numbers:

a

The number of pairs of cells in the same cluster in `ref` and the same cluster in `alt`.

b

The number of pairs of cells in different clusters in `ref` and different clusters in `alt`.

c

The number of pairs of cells in the same cluster in `ref` and different clusters in `alt`.

d

The number of pairs of cells in different clusters in `ref` but the same cluster in `alt`.

The Rand index is then computed as a + b divided by a + b + c + d, i.e., the total number of pairs.
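The four counts can be obtained from a contingency table of the two clusterings. The following is a minimal base R sketch on toy data; the vectors `ref` and `alt`, the helper `n_pairs` and the name `c_` are illustrative assumptions, and this is not the code used inside `pairwiseRand`.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

# Number of unordered pairs among n cells.
n_pairs <- function(n) n * (n - 1) / 2

tab <- table(ref, alt)                 # contingency table of ref vs alt
total <- n_pairs(length(ref))          # a + b + c + d

a  <- sum(n_pairs(tab))                # same cluster in ref, same in alt
c_ <- sum(n_pairs(rowSums(tab))) - a   # same in ref, different in alt
d  <- sum(n_pairs(colSums(tab))) - a   # different in ref, same in alt
b  <- total - a - c_ - d               # different in ref, different in alt

# Unadjusted Rand index.
rand <- (a + b) / total
rand
```

Counting pairs through the contingency table avoids enumerating every pair of cells explicitly, which matters when the number of cells is large.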

We can break these numbers down into values for each cluster or pair of clusters in `ref`. For each cluster, we compute its value of a, i.e., the number of pairs of cells in that cluster that are also in the same cluster in `alt`. Similarly, for each pair of clusters in `ref`, we compute its value of b, i.e., the number of pairs of cells that have one cell in each of those clusters and also belong in different clusters in `alt`.
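To make the breakdown concrete, the sketch below computes the unadjusted per-cluster a and per-cluster-pair b by hand in base R. The toy vectors and all variable names (`sizes`, `same_alt`, `correct`, `total`, `ratio`) are assumptions chosen for illustration; this follows the description above rather than reproducing the internals of `pairwiseRand`.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

n_pairs <- function(n) n * (n - 1) / 2

tab <- unclass(table(ref, alt))   # plain matrix: rows = ref, cols = alt
sizes <- rowSums(tab)             # number of cells in each ref cluster

# Total number of cell pairs: within each cluster on the diagonal,
# between each pair of clusters off the diagonal.
total <- outer(sizes, sizes)
diag(total) <- n_pairs(sizes)

# For each pair of ref clusters, the number of cross-cluster pairs
# that end up in the same alt cluster.
same_alt <- tcrossprod(tab)

# Off-diagonal: per-cluster-pair b (cross-cluster pairs split in alt).
# Diagonal: per-cluster a (within-cluster pairs kept together in alt).
correct <- outer(sizes, sizes) - same_alt
diag(correct) <- rowSums(n_pairs(tab))

# Keep only the upper triangle, as each cluster pair is counted once.
correct[lower.tri(correct)] <- NA
total[lower.tri(total)] <- NA

ratio <- correct / total
ratio
```

In this toy example, cluster `A` has a low diagonal ratio because its three cells are split across two clusters in `alt`, while `B` and `C` are each kept intact.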

This process provides more information about the specific similarities or differences between `ref` and `alt`, rather than coalescing all the values into a single statistic. For example, it is now possible to see which specific clusters from `ref` are not reproducible in `alt`, or which specific partitions between pairs of clusters are not reproducible. In the default output, such events can be diagnosed by looking for low entries in the ratio matrix; on the other hand, values close to 1 indicate that `ref` is almost perfectly recapitulated by `alt`.
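As a rough illustration of this kind of diagnosis, the sketch below scans a ratio matrix for low entries. The matrix values are made up, and the cutoff of 0.5 is an arbitrary assumption rather than a recommended threshold.

```r
# Hypothetical ratio matrix in the layout described for mode="ratio":
# upper triangle filled, lower triangle NA. Values are invented.
ratio <- matrix(
    c(0.95,   NA,   NA,
      0.90, 0.30,   NA,
      0.99, 0.45, 0.85),
    nrow = 3,
    dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

cutoff <- 0.5   # arbitrary threshold, for illustration only

# which() skips NA entries, so only the upper triangle is considered.
low <- which(ratio < cutoff, arr.ind = TRUE)
rows <- unname(low[, "row"])
cols <- unname(low[, "col"])

flagged <- data.frame(
    first  = rownames(ratio)[rows],
    second = colnames(ratio)[cols],
    ratio  = ratio[cbind(rows, cols)]
)
flagged
```

A row where `first` equals `second` flags a cluster of `ref` that is not kept together in `alt`; a row with two different names flags a separation between two `ref` clusters that `alt` does not preserve.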

If `adjusted=TRUE`, we adjust all counts by subtracting their expected values under a model of random permutations. This accounts for differences in the number and sizes of clusters within and between `ref` and `alt`, in a manner that mimics the calculation of the adjusted Rand index (ARI). We subtract expectations on a per-cluster or per-cluster-pair basis for a and b, respectively; we also redefine the “total” number of cell pairs for each cluster or cluster pair based on the denominator of the ARI.
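For reference, the overall ARI that this adjustment mimics can be sketched in base R using the standard permutation-model formula. This shows only the global statistic on toy data; the per-cluster and per-cluster-pair adjustments applied by `pairwiseRand` are not reproduced here, and all variable names are illustrative assumptions.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

n_pairs <- function(n) n * (n - 1) / 2
tab <- table(ref, alt)

a <- sum(n_pairs(tab))                    # pairs together in both clusterings
same_ref <- sum(n_pairs(rowSums(tab)))    # pairs together in ref (a + c)
same_alt <- sum(n_pairs(colSums(tab)))    # pairs together in alt (a + d)
total <- n_pairs(length(ref))             # all pairs of cells

# Expected value of a when labels are randomly permuted with the
# cluster sizes held fixed.
expected_a <- same_ref * same_alt / total

# Denominator of the ARI: the value used as the maximum for a,
# minus its expectation.
max_a <- (same_ref + same_alt) / 2

ari <- (a - expected_a) / (max_a - expected_a)
ari
```

Subtracting the expectation from both the numerator and the denominator is what brings the statistic to approximately zero for unrelated clusterings, regardless of the number and sizes of clusters.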

## Value

If `mode="ratio"`, a square numeric matrix is returned with number of rows equal to the number of unique levels in `ref`. Each diagonal entry is the ratio of the per-cluster a to the total number of pairs of cells in that cluster. Each off-diagonal entry is the ratio of the per-cluster-pair b to the total number of pairs of cells for that pair of clusters. Lower-triangular entries are set to `NA`. If `adjusted=TRUE`, counts and totals are both adjusted prior to computing the ratio.

If `mode="pairs"`, a list is returned containing `correct` and `total`, both of which are square numeric matrices of the same arrangement as described above. However, `correct` contains the actual numbers a (diagonal) and b (off-diagonal) rather than the ratios, while `total` contains the total number of cell pairs in each cluster or pair of clusters. If `adjusted=TRUE`, both matrices are adjusted by subtracting the random expectations from the counts.

If `mode="index"`, a numeric scalar is returned containing the Rand index (or ARI, if `adjusted=TRUE`).

## Author(s)

Aaron Lun

## See Also

`pairwiseModularity`, which applies the same breakdown to the cluster modularity.

## Examples

```r
library(bluster)

# Two k-means clusterings of the same random data.
m <- matrix(runif(10000), ncol=10)
clust1 <- kmeans(m,3)$cluster
clust2 <- kmeans(m,5)$cluster

ratio <- pairwiseRand(clust1, clust2)
ratio

# Getting the raw counts:
pairwiseRand(clust1, clust2, mode="pairs")

# Computing the overall index (the ARI, as adjusted=TRUE by default):
pairwiseRand(clust1, clust2, mode="index")
```