matchReferences: Match labels from two references
In SingleR: Reference-Based Single-Cell RNA-Seq Annotation


Match labels from a pair of references, corresponding to the same underlying cell type or state but with differences in nomenclature.

matchReferences(ref1, ref2, labels1, labels2, ...)

`ref1, ref2`	Numeric matrices of single-cell (usually log-transformed) expression values where rows are genes and columns are cells. Alternatively, SummarizedExperiment objects containing such matrices.
`labels1, labels2`	Character vectors or factors of known labels for all cells in `ref1` and `ref2`, respectively.
`...`	Further arguments to pass to `SingleR`.

It is often the case that two references contain the same cell types for the same biological system, but the two sets of labels differ in their nomenclature. This makes it difficult to compare results from different references. It also interferes with attempts to combine multiple datasets to create a larger, more comprehensive reference.

The `matchReferences` function helps to match labels across two reference datasets. It does so by using one of the references (say, `ref1`) to assign its labels to the other (`ref2`). For each label X in `labels2`, we compute the probability of assigning a sample of X to each label Y in `labels1`. We also use `ref2` to assign labels to `ref1`, to obtain the probability of assigning a sample of Y to label X.

We then consider the probability of mutual assignment, i.e., assigning a sample of X to Y and a sample of Y to X. This is computed by simply taking the product of the two probabilities mentioned earlier. The output matrix contains mutual assignment probabilities for all pairs of X (rows) and Y (columns).
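The product described above can be illustrated with a small base R sketch. This is not the package's implementation: the labels, the "assigned" vectors and all variable names are made up for illustration, and the assignments are written out by hand rather than produced by `SingleR`. The row and column orientation follows the text above (X in rows, Y in columns).

```r
# Hypothetical toy data: ref1 uses labels "T" and "B", ref2 uses "T_X" and "B_X".
# True labels of ref2 cells, and the ref1 labels assigned to them.
truth2    <- c("T_X", "T_X", "T_X", "B_X", "B_X")
assigned2 <- c("T",   "T",   "B",   "B",   "B")

# True labels of ref1 cells, and the ref2 labels assigned to them.
truth1    <- c("T",   "T",   "B",   "B")
assigned1 <- c("T_X", "T_X", "B_X", "T_X")

lev1 <- c("B", "T")      # labels Y (columns)
lev2 <- c("B_X", "T_X")  # labels X (rows)

# Probability of assigning a sample of X to Y: normalise each row (each X).
tab_x_to_y <- table(factor(truth2, lev2), factor(assigned2, lev1))
p_x_to_y <- unclass(tab_x_to_y / rowSums(tab_x_to_y))

# Probability of assigning a sample of Y to X: same orientation,
# but normalise each column (each Y).
tab_y_to_x <- table(factor(assigned1, lev2), factor(truth1, lev1))
p_y_to_x <- unclass(t(t(tab_y_to_x) / colSums(tab_y_to_x)))

# Mutual assignment probability: element-wise product of the two tables.
mutual <- p_x_to_y * p_y_to_x
print(mutual)
```

In this toy case the "T_X"/"T" and "B_X"/"B" entries are the largest in their rows, while the off-diagonal entries are pulled down because at least one direction of assignment is weak.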

The mutual assignment probabilities are only high if there is a 1:1 mapping between labels. A perfect mapping manifests as probabilities of 1 in the relevant entries of the output matrix. Lower values are expected for ambiguous mappings and near-zero values for labels that are specific to one reference.
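As a rough sketch of how such a matrix might be read, the base R snippet below pulls out the label pairs whose mutual assignment probability exceeds a cutoff. The matrix values, the label names and the cutoff of 0.8 are all hypothetical and chosen only for illustration; they are not outputs or recommendations of the package.

```r
# Hypothetical mutual assignment matrix (values invented for illustration).
mutual <- matrix(c(0.95, 0.01, 0.00,
                   0.02, 0.45, 0.40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A_X", "B_X"), c("A", "B1", "B2")))

threshold <- 0.8  # arbitrary cutoff, an assumption of this sketch

# Row/column indices of all entries at or above the cutoff.
hits <- which(mutual >= threshold, arr.ind = TRUE)

# Collect the corresponding label pairs and their probabilities.
pairs <- data.frame(
    row_label = rownames(mutual)[hits[, "row"]],
    col_label = colnames(mutual)[hits[, "col"]],
    prob = mutual[hits]
)
print(pairs)
```

Here "A_X" and "A" form a clear 1:1 pair, whereas "B_X" is split between "B1" and "B2" and so falls below the cutoff, which is the ambiguous case described above.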

A numeric matrix containing a probability table of mutual assignment. Values close to 1 represent a 1:1 mapping between labels across the two references.

Aaron Lun

`SingleR`, which performs the actual cross-assignment.

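library(SingleR)
# Run the SingleR example quietly; the code below relies on the 'test' and 'ref' objects it defines.
example(SingleR, echo=FALSE)
# Modify the labels in 'test' to mimic a difference in nomenclature.
test$label <- paste0(test$label, "_X")
matchReferences(test, ref, labels1=test$label, labels2=ref$label)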