combineRecomputedResults: Combine SingleR results with recomputation

View source: R/combineRecomputedResults.R

Description

Combine results from multiple runs of classifySingleR (usually against different references) into a single DataFrame. The label from the results with the highest score for each cell is retained. Unlike combineCommonResults, this does not assume that each run of classifySingleR was performed using the same set of common genes, instead recomputing the scores for comparison across references.

Usage

combineRecomputedResults(
  results,
  test,
  trained,
  quantile = 0.8,
  assay.type.test = "logcounts",
  check.missing = TRUE,
  BNPARAM = KmknnParam(),
  BPPARAM = SerialParam()
)

Arguments

results

A list of DataFrame prediction results as returned by classifySingleR when run on each reference separately.

test

A numeric matrix of single-cell expression values where rows are genes and columns are cells. Alternatively, a SummarizedExperiment object containing such a matrix.

trained

A list of Lists containing the trained outputs of multiple references, equivalent to either (i) the output of trainSingleR on multiple references with recompute=TRUE, or (ii) running trainSingleR on each reference separately and manually making a list of the trained outputs.

quantile

A further argument to pass to classifySingleR.

assay.type.test

An integer scalar or string specifying the assay of test containing the relevant expression matrix, if test is a SummarizedExperiment object.

check.missing

Logical scalar indicating whether rows should be checked for missing values (and if found, removed).

BNPARAM

A BiocNeighborParam object specifying the algorithm to use for building nearest neighbor indices.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed, if any.

Details

Here, the strategy is to perform classification separately within each reference, then collate the results to choose the label with the highest score across references. For a given cell in test, we extract its assigned label from results for each reference. We also retrieve the marker genes associated with that label and take the union of markers across all references. This defines a common feature space in which the score for each reference's assigned label is recomputed against that reference; the label from the reference with the top recomputed score is then reported as the combined annotation for that cell.
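The recomputation step described above can be sketched for a single cell in base R. This is a simplified illustration and not the package's internal code: the toy matrices, the marker sets and the helper recompute_score are all hypothetical, and the score is assumed to be a quantile of the Spearman correlations between the cell and the reference samples carrying the assigned label.

```r
# Toy data only; every object below is hypothetical.
genes <- paste0("g", 1:12)
x <- as.numeric(1:12)
cell <- setNames(x, genes)

# Samples carrying the label assigned by each reference (rows are genes).
ref1 <- cbind(s1=x, s2=x^2, s3=sqrt(x))    # rank-concordant with the cell
ref2 <- cbind(s1=rev(x), s2=-x, s3=1/x)    # rank-discordant with the cell
rownames(ref1) <- rownames(ref2) <- genes

# Markers for the label that each reference assigned to this cell.
markers1 <- c("g1", "g2", "g3", "g4")
markers2 <- c("g3", "g4", "g5", "g6")

# Common feature space: union of markers for the assigned labels only.
common <- union(markers1, markers2)

# Assumed scoring rule: quantile of the Spearman correlations between the
# cell and each reference sample, restricted to the common features.
recompute_score <- function(cell, ref, features, q=0.8) {
    rho <- cor(cell[features], ref[features, , drop=FALSE], method="spearman")
    stats::quantile(as.numeric(rho), probs=q, names=FALSE)
}

scores <- c(
    ref1=recompute_score(cell, ref1, common),
    ref2=recompute_score(cell, ref2, common)
)

# The reference with the top recomputed score supplies the combined label.
best <- names(scores)[which.max(scores)]
```

Because both scores are computed over the same set of features, they are directly comparable even though each reference originally used its own markers.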

Unlike combineCommonResults, the union of markers is not used for the within-reference calls. This avoids the inclusion of noise from irrelevant genes in the within-reference assignments. Recomputing the scores makes combineRecomputedResults itself slower, but the within-reference calls are faster as there are fewer genes in the union of markers for assigned labels than in the union of markers across all labels (as required by combineCommonResults), so the net compute time is likely to be lower.
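To make the difference in feature-space size concrete, the following base R sketch compares the two unions for one cell. The marker lists and assigned labels are invented for illustration; real marker sets would come from the trained references.

```r
# Hypothetical markers: one list per reference, one character vector per label.
markers_by_ref <- list(
    ref1=list(A=c("g1", "g2", "g3"), B=c("g3", "g4", "g5")),
    ref2=list(a=c("g2", "g6"), b=c("g7", "g8", "g9"))
)

# Hypothetical label assigned to one cell by each reference.
assigned <- c(ref1="A", ref2="a")

# Union of markers for the assigned labels only (the recomputation space).
recomputed_space <- Reduce(union,
    Map(function(m, lab) m[[lab]], markers_by_ref, assigned))

# Union of markers across all labels in all references.
all_label_space <- unique(unlist(markers_by_ref, use.names=FALSE))

length(recomputed_space)
length(all_label_space)
```

In this toy case the recomputation space holds 4 genes against 9 for the union across all labels; the gap would typically widen as references gain more labels.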

It is strongly recommended that the universe of genes be the same across all references. The intersection of genes across all references and test is used when recomputing scores, and differences in the availability of genes between references may have unpredictable effects.
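One way to check this before combining is to compare the gene sets directly. The sketch below uses made-up gene names in place of the row names of the actual references and test data; in practice each vector would come from something like rownames() on the corresponding object.

```r
# Hypothetical gene universes; substitute the row names of the real objects.
ref_genes <- list(
    ref1=c("g1", "g2", "g3", "g4"),
    ref2=c("g2", "g3", "g4", "g5")
)
test_genes <- c("g1", "g2", "g3", "g4", "g5")

# Genes available in every reference and in the test data.
shared <- Reduce(intersect, c(ref_genes, list(test_genes)))

# TRUE only if every reference has exactly the same set of genes.
same_universe <- all(vapply(ref_genes, setequal, logical(1), y=ref_genes[[1]]))

# Genes in each reference that fall outside the shared set.
dropped <- lapply(ref_genes, setdiff, y=shared)
```

A non-empty entry in dropped flags a reference whose genes differ from the others, which is the situation the recommendation above is meant to avoid.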

Value

A DataFrame is returned containing the annotation statistics for each cell or cluster (row). This mimics the output of classifySingleR and contains the following fields:

It may also contain first.labels and pruned.labels if these were also present in results.

The metadata contains label.origin, a DataFrame specifying the reference of origin for each label in scores. Note that, unlike combineCommonResults, no common.genes is reported as this function does not use a common set of genes across all references.

Author(s)

Aaron Lun

References

Lun A, Bunis D, Andrews J (2020). Thoughts on a more scalable algorithm for multiple references. https://github.com/LTLA/SingleR/issues/94

See Also

SingleR and classifySingleR, for generating predictions to use in results.

combineCommonResults, for another approach to combining predictions.

Examples

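library(SingleR)

# Making up data: split a mock reference into two by alternating columns.
ref <- .mockRefData(nreps=8)
ref1 <- ref[, seq_len(ncol(ref)) %% 2 == 0]
ref2 <- ref[, seq_len(ncol(ref)) %% 2 == 1]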
ref2$label <- tolower(ref2$label)

test <- .mockTestData(ref)

# Performing classification within each reference.
test <- scuttle::logNormCounts(test)

ref1 <- scuttle::logNormCounts(ref1)
train1 <- trainSingleR(ref1, labels=ref1$label)
pred1 <- classifySingleR(test, train1)

ref2 <- scuttle::logNormCounts(ref2)
train2 <- trainSingleR(ref2, labels=ref2$label)
pred2 <- classifySingleR(test, train2)

# Combining results with recomputation of scores.
combined <- combineRecomputedResults(
    results=list(pred1, pred2), 
    test=test,
    trained=list(train1, train2))

combined[,1:5]

SingleR documentation built on Feb. 4, 2021, 2:01 a.m.