View source: R/classifySingleR.R
Assign labels to each cell in a test dataset, using a pre-trained classifier combined with an iterative fine-tuning approach.
test |
A numeric matrix of single-cell expression values where rows are genes and columns are cells. Alternatively, a SummarizedExperiment object containing such a matrix. |
trained |
A List containing the output of the trainSingleR function. |
quantile |
A numeric scalar specifying the quantile of the correlation distribution to use to compute the score for each label. |
fine.tune |
A logical scalar indicating whether fine-tuning should be performed. |
tune.thresh |
A numeric scalar specifying the maximum difference from the maximum correlation to use in fine-tuning. |
sd.thresh |
A numeric scalar specifying the threshold on the standard deviation, for use in gene selection during fine-tuning.
This is only used if |
prune |
A logical scalar indicating whether label pruning should be performed. |
assay.type |
Integer scalar or string specifying the matrix of expression values to use if test is a SummarizedExperiment. |
check.missing |
Logical scalar indicating whether rows should be checked for missing values (and if found, removed). |
BPPARAM |
A BiocParallelParam object specifying the parallelization scheme to use. |
Consider each cell in the test set test and each label in the training set.
We compute Spearman's rank correlations between the test cell and all cells in the training set with the given label, based on the expression profiles of the genes selected by trained.
The score is defined as the quantile of the distribution of correlations, as specified by quantile.
(Technically, we avoid explicitly computing all correlations by using a nearest neighbor search, but the resulting score is the same.)
After repeating this across all labels, the label with the highest score is used as the prediction for that cell.
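The scoring rule described above can be illustrated with a naive base R sketch. This is not the package's implementation: it computes every correlation explicitly instead of using a nearest neighbor search, and the simulated data, the helper label_score() and the value q = 0.8 are all assumptions made for illustration only.

```r
# Naive sketch: score = chosen quantile of the Spearman correlations between
# one test cell and all reference cells carrying a given label.
set.seed(100)
n.genes <- 50

# Hypothetical reference: 20 cells (columns) with labels "A" and "B".
ref.mat <- matrix(rnorm(n.genes * 20), nrow = n.genes)
ref.labels <- rep(c("A", "B"), each = 10)

# Give label "A" a distinct expression pattern so the labels are separable.
pattern <- rnorm(n.genes, sd = 3)
ref.mat[, ref.labels == "A"] <- ref.mat[, ref.labels == "A"] + pattern

# Hypothetical test cell that resembles label "A".
test.cell <- pattern + rnorm(n.genes)

# Hypothetical helper: quantile of the correlations against one label's cells.
label_score <- function(cell, ref, q = 0.8) {
    rho <- cor(cell, ref, method = "spearman") # 1 x ncol(ref) matrix
    unname(quantile(as.numeric(rho), probs = q))
}

# One score per label, then pick the label with the highest score.
scores <- vapply(
    split(seq_along(ref.labels), ref.labels),
    function(idx) label_score(test.cell, ref.mat[, idx, drop = FALSE]),
    numeric(1)
)
predicted <- names(scores)[which.max(scores)]
```

Here scores is a named numeric vector with one entry per label, and predicted holds the name of the top-scoring label for the single simulated cell.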
If fine.tune=TRUE, an additional fine-tuning step is performed for each cell to improve resolution.
We identify all labels with scores that are no more than tune.thresh below the maximum score.
These labels are used to identify a fresh set of marker genes, and the calculation of the score is repeated using only these genes.
The aim is to refine the choice of markers and reduce noise when distinguishing between closely related labels.
The best and next-best scores are reported in the output for use in diagnostics, e.g., pruneScores.
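The candidate-selection step of fine-tuning can be sketched in base R as follows. The scores, label names and threshold value are hypothetical, and the sketch covers only a single round of candidate selection; the re-selection of marker genes and the recalculation of scores are not shown.

```r
# Hypothetical scores for one cell across four labels.
scores <- c(Bcell = 0.62, Tcell.CD4 = 0.60, Tcell.CD8 = 0.59, Monocyte = 0.31)

# Illustrative threshold; not necessarily the function's default.
tune.thresh <- 0.05

# Keep labels scoring no more than 'tune.thresh' below the maximum score.
candidates <- names(scores)[scores >= max(scores) - tune.thresh]

# Best and next-best scores among the retained candidates.
top.two <- sort(scores[candidates], decreasing = TRUE)[1:2]
```

In this toy case the clearly unrelated label is dropped, leaving only the closely scoring labels to be compared on a refined gene set.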
The default assay.type is set to "logcounts" simply for consistency with trainSingleR.
In practice, the raw counts (for UMI data) or the transcript counts (for read count data) can also be used without normalization and log-transformation.
Any monotonic transformation will have no effect on the calculation of the correlation values, other than some minor differences due to numerical precision.
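The invariance of rank correlations to monotonic transformations can be checked directly in base R. The simulated count vectors below are hypothetical stand-ins for two expression profiles.

```r
# Spearman's correlation depends only on ranks, so a strictly increasing
# transformation such as log1p() leaves it unchanged.
set.seed(42)
x <- rpois(200, lambda = 5)      # hypothetical raw counts for one profile
y <- x + rpois(200, lambda = 2)  # hypothetical second, related profile

rho.raw <- cor(x, y, method = "spearman")
rho.log <- cor(log1p(x), log1p(y), method = "spearman")

# TRUE when the two correlations agree up to numerical tolerance.
same <- isTRUE(all.equal(rho.raw, rho.log))
```

The same reasoning applies to scaling a cell's counts by a positive size factor, since that is also a monotonic transformation of that cell's profile.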
If prune=TRUE, label pruning is performed as described in pruneScores with default arguments.
This aims to remove low-quality labels that are ambiguous or correspond to misassigned cells.
However, the default settings can be somewhat aggressive and discard otherwise useful labels in some cases; see ?pruneScores for details.
If trained was generated from multiple references, the per-reference statistics are combined into a single DataFrame of results.
This is done using combineRecomputedResults if recompute=TRUE in trainSingleR, otherwise it is done using combineCommonResults.
A DataFrame where each row corresponds to a cell in test.
In the case of a single reference, this contains:
scores, a numeric matrix of correlations at the specified quantile for each label (column) in each cell (row).
This will contain NAs if multiple references were supplied to trainSingleR with recompute=TRUE.
first.labels, a character vector containing the predicted label before fine-tuning.
Only added if fine.tune=TRUE.
tuned.scores, a DataFrame containing first and second.
These are numeric vectors containing the best and next-best scores at the final round of fine-tuning for each cell.
Only added if fine.tune=TRUE.
labels, a character vector containing the predicted label based on the maximum entry in scores.
pruned.labels, a character vector containing the pruned labels where "low-quality" labels are replaced with NAs.
Only added if prune=TRUE.
The metadata of the DataFrame contains:
common.genes, a character vector of genes used to compute the correlations prior to fine-tuning.
de.genes, a list of lists of genes used to distinguish between each pair of labels.
Only returned if genes="de" when constructing trained; see ?trainSingleR for more details.
In the case of multiple references, the output of combineCommonResults or combineRecomputedResults is returned, depending on whether recompute=TRUE when constructing trained.
This is a DataFrame containing:
scores, a numeric matrix of scores for each cell (row) across all labels in all references (columns).
This will contain NAs if recomputation is performed.
labels, first.labels (if fine.tune=TRUE) and pruned.labels (if prune=TRUE), containing the consolidated labels of varying flavors as described above.
orig.results, a DataFrame of DataFrames containing the results of running classifySingleR against each individual reference.
Each nested DataFrame has the same format as described above.
See ?combineCommonResults and ?combineRecomputedResults for more details.
Aaron Lun, based on the original SingleR code by Dvir Aran.
trainSingleR, to prepare the training set for classification.
pruneScores, to remove low-quality labels based on the scores.
combineCommonResults, to combine results from multiple references.
# Mocking up data with log-normalized expression values:
ref <- .mockRefData()
test <- .mockTestData(ref)
ref <- scuttle::logNormCounts(ref)
test <- scuttle::logNormCounts(test)
# Setting up the training:
trained <- trainSingleR(ref, label=ref$label)
# Performing the classification:
pred <- classifySingleR(test, trained)
table(predicted=pred$labels, truth=test$label)