pairwiseRand | R Documentation |
Breaks down the Rand index calculation to report values for each cluster and pair of clusters in a reference clustering compared to an alternative clustering.
pairwiseRand(ref, alt, mode = c("ratio", "pairs", "index"), adjusted = TRUE)
ref | A character vector or factor containing one set of groupings, considered to be the reference. |
alt | A character vector or factor containing another set of groupings, to be compared to ref. |
mode | String indicating whether to return the ratio, the number of pairs or the Rand index. |
adjusted | Logical scalar indicating whether the adjusted Rand index should be returned. |
Recall that the Rand index calculation consists of four numbers:
a
The number of pairs of cells in the same cluster in ref and the same cluster in alt.
b
The number of pairs of cells in different clusters in ref and different clusters in alt.
c
The number of pairs of cells in the same cluster in ref and different clusters in alt.
d
The number of pairs of cells in different clusters in ref but the same cluster in alt.
The Rand index is then computed as a + b divided by a + b + c + d, i.e., the total number of pairs.
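The four counts can be tallied directly by enumerating every pair of cells. The following is a minimal base R sketch on made-up toy data; the vectors and the names n.a to n.d are illustrative assumptions and not part of pairwiseRand itself.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

# Enumerate every unordered pair of cells exactly once.
pairs <- combn(length(ref), 2)
same.ref <- ref[pairs[1, ]] == ref[pairs[2, ]]
same.alt <- alt[pairs[1, ]] == alt[pairs[2, ]]

n.a <- sum(same.ref & same.alt)    # same cluster in ref, same in alt
n.b <- sum(!same.ref & !same.alt)  # different in ref, different in alt
n.c <- sum(same.ref & !same.alt)   # same in ref, different in alt
n.d <- sum(!same.ref & same.alt)   # different in ref, same in alt

# Unadjusted Rand index: agreeing pairs over all pairs.
rand <- (n.a + n.b) / (n.a + n.b + n.c + n.d)
rand
```

Enumerating pairs with combn() scales quadratically with the number of cells, so this is only suitable for small illustrations.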
We can break these numbers down into values for each cluster or pair of clusters in ref.
For each cluster, we compute its value of a, i.e., the number of pairs of cells in that cluster that are also in the same cluster in alt.
Similarly, for each pair of clusters in ref, we compute its value of b, i.e., the number of pairs of cells that have one cell in each of those clusters and also belong in different clusters in alt.
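The breakdown can be sketched in base R by tallying pairs separately for each cluster and cluster pair. This is an illustrative, unadjusted sketch on made-up toy data; the matrix names correct, total and ratio are chosen here for convenience, and the output has not been checked against pairwiseRand with adjusted=FALSE.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

# Enumerate every unordered pair of cells exactly once.
pairs <- combn(length(ref), 2)
r1 <- ref[pairs[1, ]]
r2 <- ref[pairs[2, ]]
same.alt <- alt[pairs[1, ]] == alt[pairs[2, ]]

lev <- sort(unique(ref))
correct <- total <- matrix(NA_real_, length(lev), length(lev),
                           dimnames = list(lev, lev))

for (i in seq_along(lev)) {
  for (j in i:length(lev)) {
    # Pairs with one cell in cluster i and the other in cluster j
    # (both cells in cluster i when i == j).
    involved <- (r1 == lev[i] & r2 == lev[j]) | (r1 == lev[j] & r2 == lev[i])
    # Diagonal: pairs kept together in alt (per-cluster a).
    # Off-diagonal: pairs kept apart in alt (per-cluster-pair b).
    agree <- if (i == j) same.alt else !same.alt
    correct[i, j] <- sum(involved & agree)
    total[i, j] <- sum(involved)
  }
}

# Lower-triangular entries are never filled and stay NA.
ratio <- correct / total
ratio
```

Here the diagonal of correct sums to a and the upper triangle sums to b, so the per-cluster values add back up to the numerator of the overall Rand index.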
This process provides more information about the specific similarities or differences between ref and alt, rather than coalescing all the values into a single statistic.
For example, it is now possible to see which specific clusters from ref are not reproducible in alt, or which specific partitions between pairs of clusters are not reproducible.
Such events can be diagnosed by looking for small (i.e., near-zero or negative) entries in the ratio matrix; on the other hand, large values (i.e., close to 1) indicate that ref is almost perfectly recapitulated by alt.
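One way to look for small entries is to list the clusters and cluster pairs whose ratio falls below a cutoff. The sketch below uses a hand-written matrix in the upper-triangular layout described for mode="ratio"; the values and the 0.5 threshold are arbitrary assumptions chosen for illustration, not output from pairwiseRand or a recommended cutoff.

```r
# Hypothetical ratio matrix for three reference clusters (made-up values).
lev <- c("1", "2", "3")
ratio <- rbind(
  c(0.95, 0.10, 0.98),
  c(NA,   0.90, 0.05),
  c(NA,   NA,   0.99)
)
dimnames(ratio) <- list(lev, lev)

# Arbitrary cutoff for calling an entry "small".
threshold <- 0.5

# which() skips the NA entries in the lower triangle.
low <- which(ratio < threshold, arr.ind = TRUE)

flagged <- data.frame(
  first = rownames(ratio)[low[, "row"]],
  second = colnames(ratio)[low[, "col"]],
  ratio = ratio[low],
  row.names = NULL
)
flagged
```

A flagged row with first equal to second would point to a reference cluster that is split up in alt, while a row with two different clusters points to a separation that alt does not preserve.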
If adjusted=TRUE, we adjust all counts by subtracting their expected values under a model of random permutations.
This accounts for differences in the number and sizes of clusters within and between ref and alt, in a manner that mimics the calculation of the adjusted Rand index (ARI).
We subtract expectations on a per-cluster or per-cluster-pair basis for a and b, respectively;
we also redefine the “total” number of cell pairs for each cluster or cluster pair based on the denominator of the ARI.
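For reference, the overall ARI that this adjustment mimics can be computed from the contingency table of the two clusterings. The sketch below implements the standard permutation-model ARI in base R on made-up toy data; it is not the per-cluster adjustment itself, whose exact arithmetic is not spelled out here, and the variable names are illustrative.

```r
# Toy clusterings (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "B", "B", "C", "C")
alt <- c("x", "x", "y", "y", "y", "z", "z")

tab <- table(ref, alt)
n <- length(ref)

# Pairs placed together in both clusterings (the count a).
same.both <- sum(choose(tab, 2))
# Pairs placed together in ref, and pairs placed together in alt.
same.ref <- sum(choose(rowSums(tab), 2))
same.alt <- sum(choose(colSums(tab), 2))

# Expected value of same.both under random permutation of the labels.
expected <- same.ref * same.alt / choose(n, 2)

# Observed minus expected, scaled by the maximum minus expected.
ari <- (same.both - expected) / ((same.ref + same.alt) / 2 - expected)
ari
```

The denominator here is the "total" referred to above: it replaces the raw number of pairs with the largest attainable agreement given the cluster sizes, minus the same expectation.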
If mode="ratio", a square numeric matrix is returned with number of rows equal to the number of unique levels in ref.
Each diagonal entry is the ratio of the per-cluster a to the total number of pairs of cells in that cluster.
Each off-diagonal entry is the ratio of the per-cluster-pair b to the total number of pairs of cells for that pair of clusters.
Lower-triangular entries are set to NA.
If adjusted=TRUE, counts and totals are both adjusted prior to computing the ratio.
If mode="pairs", a list is returned containing correct and total, both of which are square numeric matrices of the same arrangement as described above.
However, correct contains the actual numbers a (diagonal) and b (off-diagonal) rather than the ratios, while total contains the total number of cell pairs in each cluster or pair of clusters.
If adjusted=TRUE, both matrices are adjusted by subtracting the random expectations from the counts.
If mode="index", a numeric scalar is returned containing the Rand index (or ARI, if adjusted=TRUE).
Aaron Lun
pairwiseModularity, which applies the same breakdown to the cluster modularity.
compareClusterings, which does this for multiple clusterings.
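library(bluster)
# Simulating data and generating two k-means clusterings to compare:
m <- matrix(runif(10000), ncol=10)
clust1 <- kmeans(m,3)$cluster
clust2 <- kmeans(m,5)$cluster
# Getting the per-cluster ratios (default mode, adjusted):
ratio <- pairwiseRand(clust1, clust2)
ratio
# Getting the counts and totals:
pairwiseRand(clust1, clust2, mode="pairs")
# Computing the overall index (the ARI, as adjusted=TRUE by default):
pairwiseRand(clust1, clust2, mode="index")
# Computing the original Rand index:
pairwiseRand(clust1, clust2, mode="index", adjusted=FALSE)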