getClassicMarkers: Get classic markers

View source: R/getClassicMarkers.R

getClassicMarkersR Documentation

Get classic markers

Description

Find markers between pairs of labels using the “classic” approach, i.e., based on the log-fold change between the medians of labels.

Usage

getClassicMarkers(
  ref,
  labels,
  assay.type = "logcounts",
  check.missing = TRUE,
  de.n = NULL,
  num.threads = bpnworkers(BPPARAM),
  BPPARAM = SerialParam()
)

Arguments

ref

A numeric matrix of expression values where rows are genes and columns are reference samples (individual cells or bulk samples). Each row should be named with the gene name. In general, the expression values are expected to be log-transformed, see Details.

Alternatively, a SummarizedExperiment object containing such a matrix.

Alternatively, a list or List of SummarizedExperiment objects or numeric matrices containing multiple references, in which case the row names are expected to be the same across all objects.

labels

A character vector or factor of known labels for all samples in ref.

Alternatively, if ref is a list, labels should be a list of the same length. Each element should contain a character vector or factor specifying the label for the corresponding entry of ref.

assay.type

An integer scalar or string specifying the assay of ref containing the relevant expression matrix, if ref is a SummarizedExperiment object (or is a list that contains one or more such objects).

check.missing

Logical scalar indicating whether rows should be checked for missing values (and if found, removed).

de.n

An integer scalar specifying the number of DE genes to use. Defaults to 500 * (2/3) ^ log2(N) where N is the number of unique labels.

num.threads

Integer scalar specifying the number of threads to use.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed.

Details

This function implements the classic mode of marker detection in SingleR, based only on the magnitude of the log-fold change between labels. In many respects, this approach may be suboptimal as it does not consider the variance within each label and has limited precision when the expression values are highly discrete. Nonetheless, it is often the only possible approach when dealing with reference datasets that lack replication and thus cannot be used with more advanced marker detection methods.

If multiple references are supplied, ranking is performed based on the average of the log-fold changes within each reference. This avoids comparison of expression values across references that can be distorted by batch effects. If a pair of labels does not co-occur in at least one reference, no attempt is made to perform the comparison and the corresponding character vector is left empty in the output.

The character vector corresponding to the comparison of a label to itself is always empty.

Value

A list of lists of character vectors, where both the outer and inner lists have names equal to the unique levels of labels. The character vector contains the names of the top de.n genes with the largest positive log-fold changes in one label (entry of the outer list) against another label (entry of the inner list).

Author(s)

Aaron Lun, based on the original SingleR code by Dvir Aran.

See Also

trainSingleR and SingleR, where this function is used when genes="de" and de.method="classic".

Examples

ref <- .mockRefData()
ref <- scuttle::logNormCounts(ref)
out <- getClassicMarkers(ref, labels=ref$label)
str(out)

# Works with multiple references:
ref2 <- .mockRefData()
ref2 <- scuttle::logNormCounts(ref2)
out2 <- getClassicMarkers(list(ref, ref2), labels=list(ref$label, ref2$label))
str(out2)


LTLA/SingleR documentation built on July 30, 2022, 4:11 a.m.