SingleR: Annotate scRNA-seq data
In LTLA/SingleR: Reference-Based Single-Cell RNA-Seq Annotation

SingleR

R Documentation

Annotate scRNA-seq data

Description

Returns the best annotation for each cell in a test dataset, given a labelled reference dataset in the same feature space.

Usage

SingleR(
  test,
  ref,
  labels,
  method = NULL,
  clusters = NULL,
  genes = "de",
  sd.thresh = 1,
  de.method = "classic",
  de.n = NULL,
  de.args = list(),
  aggr.ref = FALSE,
  aggr.args = list(),
  recompute = TRUE,
  restrict = NULL,
  quantile = 0.8,
  fine.tune = TRUE,
  tune.thresh = 0.05,
  fine.tune.combined = fine.tune,
  prune = TRUE,
  assay.type.test = "logcounts",
  assay.type.ref = "logcounts",
  check.missing.test = FALSE,
  check.missing.ref = check.missing,
  check.missing = TRUE,
  num.threads = bpnworkers(BPPARAM),
  BNPARAM = NULL,
  BPPARAM = SerialParam()
)

Arguments

`test`	A numeric matrix of single-cell expression values where rows are genes and columns are cells. Alternatively, a SummarizedExperiment object containing such a matrix.
`ref`	A numeric matrix of (usually normalized and log-transformed) expression values from a reference dataset, or a SummarizedExperiment object containing such a matrix; see `trainSingleR` for details. Alternatively, a list or List of SummarizedExperiment objects or numeric matrices containing multiple references. Row names may differ across entries, but only their intersection will be used; see Details.
`labels`	A character vector or factor of known labels for all samples in `ref`. Alternatively, if `ref` is a list, `labels` should be a list of the same length. Each element should contain a character vector or factor specifying the labels for the columns of the corresponding element of `ref`.
`method`	Deprecated.
`clusters`	A character vector or factor of cluster identities for each cell in `test`. If set, annotation is performed on the aggregated cluster profiles; otherwise, annotation is performed per cell.
`genes`, `sd.thresh`, `de.method`, `de.n`, `de.args`	Arguments controlling the choice of marker genes used for annotation, see `trainSingleR`.
`aggr.ref`, `aggr.args`	Arguments controlling the aggregation of the references prior to annotation, see `trainSingleR`.
`recompute`	Deprecated and ignored.
`restrict`	A character vector of gene names to use for marker selection. By default, all genes in `ref` are used.
`quantile`, `fine.tune`, `tune.thresh`, `fine.tune.combined`, `prune`	Further arguments to pass to `classifySingleR`.
`assay.type.test`	An integer scalar or string specifying the assay of `test` containing the relevant expression matrix, if `test` is a SummarizedExperiment object.
`assay.type.ref`	An integer scalar or string specifying the assay of `ref` containing the relevant expression matrix, if `ref` is a SummarizedExperiment object (or is a list that contains one or more such objects).
`check.missing.test`	Logical scalar indicating whether rows of `test` should be checked for missing values (and if found, removed).
`check.missing.ref`	Logical scalar indicating whether rows of `ref` should be checked for missing values (and if found, removed).
`check.missing`	Deprecated, use `check.missing.test` and `check.missing.ref` instead.
`num.threads`	Integer scalar specifying the number of threads to use for index building and classification.
`BNPARAM`	Deprecated and ignored.
`BPPARAM`	A BiocParallelParam object specifying how parallelization should be performed in other steps, see `?trainSingleR` and `?classifySingleR` for more details.

Details

This function is a convenience wrapper around `trainSingleR` and `classifySingleR`. It automatically restricts the analysis to the intersection of the genes in `ref` and `test`. If this intersection is empty (e.g., because the two datasets use different gene annotations), an error is raised.
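The gene restriction described above can be pictured with a small base-R sketch. This is only an illustration of the idea, not the package's internal code; the matrices `test_mat` and `ref_mat`, their gene names, and the error message are hypothetical.

```r
# Toy matrices standing in for `test` and `ref` (hypothetical data):
# rows are genes, columns are cells or reference samples.
test_mat <- matrix(
  rpois(12, 5), nrow = 4,
  dimnames = list(c("GeneA", "GeneB", "GeneC", "GeneD"), paste0("cell", 1:3))
)
ref_mat <- matrix(
  rpois(10, 5), nrow = 5,
  dimnames = list(c("GeneB", "GeneC", "GeneD", "GeneE", "GeneF"), paste0("sample", 1:2))
)

# Keep only the genes present in both datasets.
common <- intersect(rownames(test_mat), rownames(ref_mat))

# An empty intersection usually means the two datasets use different
# gene annotations (e.g., symbols versus Ensembl identifiers).
if (length(common) == 0L) {
  stop("no genes shared between 'test' and 'ref'")
}

# Subset both matrices to the shared genes, in the same row order.
test_sub <- test_mat[common, , drop = FALSE]
ref_sub <- ref_mat[common, , drop = FALSE]
```

Checking `intersect(rownames(test), rownames(ref))` by hand before calling `SingleR` is a quick way to diagnose an empty-intersection error.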

If `clusters` is specified, per-cell profiles are summed to obtain per-cluster profiles. Annotation is then performed by running `classifySingleR` on these profiles, yielding a DataFrame with one row per level of `clusters`.
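As a rough picture of what summing per-cell profiles into per-cluster profiles means, the following base-R sketch aggregates a toy matrix with `rowsum`. It is an illustration under assumed data, not the package's own aggregation code; `cell_mat` and `cluster_ids` are hypothetical names.

```r
set.seed(1)

# Toy expression matrix (hypothetical data): 4 genes by 6 cells.
cell_mat <- matrix(
  rpois(24, 5), nrow = 4,
  dimnames = list(paste0("Gene", 1:4), paste0("cell", 1:6))
)

# One cluster identity per cell, in column order.
cluster_ids <- c("c1", "c2", "c1", "c3", "c2", "c1")

# rowsum() sums rows by group, so transpose to put cells in rows,
# aggregate, then transpose back to a genes-by-clusters matrix.
cluster_mat <- t(rowsum(t(cell_mat), group = cluster_ids))

# One column per cluster level, which matches the one-row-per-cluster
# shape of the annotation output when `clusters` is supplied.
dim(cluster_mat)
```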

The default settings of this function are based on the assumption that `ref` contains bulk data. If it contains single-cell data, a different `de.method` is usually required. Read the Note in `?trainSingleR` for more details.
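For a single-cell reference, a call might look like the sketch below. The choice of `de.method = "wilcox"` is an assumption made for illustration; this page does not name an alternative, so the Note in `?trainSingleR` remains the authority on which value suits a given reference. The mock data mirror the Examples section.

```r
library(SingleR)

# Mock reference and test data with log-normalized expression values.
ref <- .mockRefData()
test <- .mockTestData(ref)
ref <- scuttle::logNormCounts(ref)
test <- scuttle::logNormCounts(test)

# Assumption: "wilcox" is used here as one non-default de.method choice
# for a single-cell reference; check ?trainSingleR before relying on it.
pred.sc <- SingleR(test, ref, labels = ref$label, de.method = "wilcox")

# Compare predicted labels against the known labels of the mock test data.
table(predicted = pred.sc$labels, truth = test$label)
```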

Value

A DataFrame is returned containing the annotation statistics for each cell (one cell per row), or for each cluster (one cluster per row) if `clusters` is specified. This is identical to the output of `classifySingleR`.

Author(s)

Aaron Lun, based on code by Dvir Aran.

References

Aran D, Looney AP, Liu L et al. (2019). Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat. Immunology 20, 163–172.

Examples

library(SingleR)

# Mocking up data with log-normalized expression values:
ref <- .mockRefData()
test <- .mockTestData(ref)

ref <- scuttle::logNormCounts(ref)
test <- scuttle::logNormCounts(test)

# Running the classification with different options:
pred <- SingleR(test, ref, labels=ref$label)
table(predicted=pred$labels, truth=test$label)

set.seed(100) # kmeans() uses random starts; fix the seed for reproducibility
k.out <- kmeans(t(assay(test, "logcounts")), centers=5) # mock up a clustering
pred2 <- SingleR(test, ref, labels=ref$label, clusters=k.out$cluster)
table(predicted=pred2$labels, cluster=rownames(pred2))

LTLA/SingleR documentation built on June 15, 2025, 4:13 a.m.