getClassicMarkers: Get classic markers
In LTLA/SingleR: Reference-Based Single-Cell RNA-Seq Annotation

getClassicMarkers

R Documentation

Get classic markers

Description

Find markers between pairs of labels using the “classic” approach, i.e., ranking genes by the log-fold change between the median expression values of the two labels.

Usage

getClassicMarkers(
  ref,
  labels,
  assay.type = "logcounts",
  check.missing = TRUE,
  de.n = NULL,
  num.threads = bpnworkers(BPPARAM),
  BPPARAM = SerialParam()
)

Arguments

`ref`	A numeric matrix of expression values where rows are genes and columns are reference samples (individual cells or bulk samples). Each row should be named with the gene name. In general, the expression values are expected to be normalized and log-transformed, see Details. Alternatively, a SummarizedExperiment object containing such a matrix. Alternatively, a list or List of SummarizedExperiment objects or numeric matrices containing multiple references.
`labels`	A character vector or factor of known labels for all samples in `ref`. Alternatively, if `ref` is a list, `labels` should be a list of the same length. Each element should contain a character vector or factor specifying the labels for the columns of the corresponding element of `ref`.
`assay.type`	An integer scalar or string specifying the assay of `ref` containing the relevant expression matrix, if `ref` is a SummarizedExperiment object (or is a list that contains one or more such objects).
`check.missing`	Logical scalar indicating whether rows should be checked for missing values. If true and any missing values are found, the rows containing these values are silently removed.
`de.n`	An integer scalar specifying the number of DE genes to use. Defaults to `500 * (2/3) ^ log2(N)` where `N` is the number of unique labels.
`num.threads`	Integer scalar specifying the number of threads to use.
`BPPARAM`	A BiocParallelParam object specifying how parallelization should be performed.
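As a rough illustration of the default for `de.n`, the formula above can be evaluated for a few values of `N`. This is only a sketch of the arithmetic: the helper name `default_de_n` is hypothetical, and any rounding applied by the package itself is not shown here.

```r
# Hypothetical helper that evaluates the documented default formula.
# It is not part of SingleR; it only reproduces 500 * (2/3)^log2(N).
default_de_n <- function(n_labels) {
  500 * (2 / 3)^log2(n_labels)
}

# The default shrinks as the number of unique labels grows.
n <- c(2, 4, 8, 16)
data.frame(N = n, de.n = default_de_n(n))
```

Each doubling of the number of labels multiplies the default by 2/3, so references with many labels use fewer genes per pairwise comparison.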

Details

This function implements the classic mode of marker detection in SingleR, based only on the magnitude of the log-fold change between labels. In many respects, this approach may be suboptimal as it does not consider the variance within each label and has limited precision when the expression values are highly discrete. Nonetheless, it is often the only possible approach when dealing with reference datasets that lack replication and thus cannot be used with more advanced marker detection methods.
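The idea described above can be sketched in base R for a single pair of labels. This is a simplified illustration on simulated data, not the package's implementation: the matrix `expr`, the label names and the cut-off of two genes are all assumptions made for the example.

```r
set.seed(1)

# Simulated log-expression matrix: 6 genes (rows) by 10 samples (columns).
expr <- matrix(rnorm(60), nrow = 6,
               dimnames = list(paste0("gene", 1:6), NULL))
labels <- rep(c("A", "B"), each = 5)

# Make gene3 clearly upregulated in label A.
expr["gene3", labels == "A"] <- expr["gene3", labels == "A"] + 5

# Median of each gene within each label (genes x labels matrix).
med <- sapply(split(seq_along(labels), labels), function(idx) {
  apply(expr[, idx, drop = FALSE], 1, median)
})

# Values are already on the log scale, so the log-fold change
# of A over B is a difference of medians.
lfc <- med[, "A"] - med[, "B"]

# Keep genes with a positive log-fold change, largest first.
lfc <- sort(lfc[lfc > 0], decreasing = TRUE)
top <- head(names(lfc), 2)
top
```

Note that the within-label variance never enters the ranking, which is the limitation the paragraph above refers to.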

If multiple references are supplied, log-fold changes are computed within each reference and ranking is performed on their average across references. This avoids comparing expression values across references, where differences can be distorted by batch effects. If a pair of labels does not co-occur in at least one reference, no attempt is made to perform the comparison and the corresponding character vector is left empty in the output.
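The benefit of computing log-fold changes within each reference can be seen in a small base R sketch. The data are simulated, the helper `lfc_within` is hypothetical, and the constant offset standing in for a batch effect is an assumption of the example; it is not how the package is implemented.

```r
set.seed(2)
genes <- paste0("gene", 1:5)
labels <- rep(c("A", "B"), each = 4)

# Simulate a reference in which gene2 is upregulated in label A.
# 'offset' mimics a batch effect that shifts every value in the reference.
make_ref <- function(offset) {
  m <- matrix(rnorm(5 * 8), nrow = 5, dimnames = list(genes, NULL))
  m["gene2", labels == "A"] <- m["gene2", labels == "A"] + 6
  m + offset
}
ref1 <- make_ref(0)
ref2 <- make_ref(10)

# Hypothetical helper: difference of per-label medians within one reference.
lfc_within <- function(expr, labels, first, second) {
  med <- function(lab) apply(expr[, labels == lab, drop = FALSE], 1, median)
  med(first) - med(second)
}

# One column of log-fold changes per reference, then average across references.
per_ref <- sapply(list(ref1, ref2), lfc_within,
                  labels = labels, first = "A", second = "B")
avg_lfc <- rowMeans(per_ref)
names(which.max(avg_lfc))
```

Because the offset is added to both labels in `ref2`, it cancels in the within-reference difference and does not affect the averaged ranking.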

The character vector corresponding to the comparison of a label to itself is always empty.

Value

A list of lists of character vectors, where both the outer and inner lists are named by the unique levels of `labels`. Each character vector contains the names of the top `de.n` genes with the largest positive log-fold changes in one label (the entry of the outer list) against another label (the entry of the inner list).
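To make the nesting concrete, the following sketch builds a small list by hand with the same shape and shows how it might be indexed. The gene and label names are invented for illustration and are not real output of `getClassicMarkers`.

```r
# Hand-built stand-in with the documented shape: outer list = label of
# interest, inner list = label it is compared against.
markers <- list(
  A = list(A = character(0), B = c("gene3", "gene1")),
  B = list(A = c("gene2"), B = character(0))
)

# Genes upregulated in A relative to B.
a_vs_b <- markers[["A"]][["B"]]

# Self-comparisons are empty.
length(markers[["A"]][["A"]])

# Union of all markers across every pairwise comparison.
all_markers <- unique(unlist(markers))
```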

Author(s)

Aaron Lun, based on the original SingleR code by Dvir Aran.

Examples

library(SingleR)

# Simulate a reference dataset and compute log-normalized expression values.
ref <- .mockRefData()
ref <- scuttle::logNormCounts(ref)

# Detect markers for every pair of labels.
out <- getClassicMarkers(ref, labels=ref$label)
str(out)

# Works with multiple references:
ref2 <- .mockRefData()
ref2 <- scuttle::logNormCounts(ref2)
out2 <- getClassicMarkers(list(ref, ref2), labels=list(ref$label, ref2$label))
str(out2)

LTLA/SingleR documentation built on June 15, 2025, 4:13 a.m.