findMarkers: Find marker genes
In scran: Methods for Single-Cell RNA-Seq Data Analysis


Find candidate marker genes for groups of cells (e.g., clusters) by testing for differential expression between pairs of groups.

findMarkers(x, ...)

## S4 method for signature 'ANY'
findMarkers(
  x,
  groups,
  test.type = c("t", "wilcox", "binom"),
  ...,
  pval.type = c("any", "some", "all"),
  min.prop = NULL,
  log.p = FALSE,
  full.stats = FALSE,
  sorted = TRUE,
  row.data = NULL,
  add.summary = FALSE,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
findMarkers(x, ..., assay.type = "logcounts")

## S4 method for signature 'SingleCellExperiment'
findMarkers(x, groups = colLabels(x, onAbsence = "error"), ...)

`x`	A numeric matrix-like object of expression values, where each column corresponds to a cell and each row corresponds to an endogenous gene. This is expected to be normalized log-expression values for most tests - see Details. Alternatively, a SummarizedExperiment or SingleCellExperiment object containing such a matrix.
`...`	For the generic, further arguments to pass to specific methods. For the ANY method: For `test.type="t"`, further arguments to pass to `pairwiseTTests`. For `test.type="wilcox"`, further arguments to pass to `pairwiseWilcox`. For `test.type="binom"`, further arguments to pass to `pairwiseBinom`. Common arguments for all testing functions include `gene.names`, `direction`, `block` and `BPPARAM`. Test-specific arguments are also supported for the appropriate `test.type`. For the SummarizedExperiment method, further arguments to pass to the ANY method. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method.
`groups`	A vector of length equal to `ncol(x)`, specifying the group to which each cell is assigned. If `x` is a SingleCellExperiment, this defaults to `colLabels(x)`, and an error is raised if no column labels are present.
`test.type`	String specifying the type of pairwise test to perform - a t-test with `"t"`, a Wilcoxon rank sum test with `"wilcox"`, or a binomial test with `"binom"`.
`pval.type`	A string specifying how p-values are to be combined across pairwise comparisons for a given group/cluster.
`min.prop`	Numeric scalar specifying the minimum proportion of significant comparisons per gene. Defaults to 0.5 when `pval.type="some"`, otherwise defaults to zero.
`log.p`	A logical scalar indicating if log-transformed p-values/FDRs should be returned.
`full.stats`	A logical scalar indicating whether all statistics from each pairwise comparison (i.e., those in the `de.lists` passed to combineMarkers) should be stored in the output.
`sorted`	Logical scalar indicating whether each output DataFrame should be sorted by a statistic relevant to `pval.type`.
`row.data`	A DataFrame containing additional row metadata for each gene in `x`, to be included in each of the output DataFrames. This should generally have row names identical to those of `x`. Alternatively, a list containing one such DataFrame per level of `groups`, where each DataFrame contains group-specific metadata for each gene to be included in the appropriate output DataFrame.
`add.summary`	Logical scalar indicating whether statistics from `summaryMarkerStats` should be added.
`BPPARAM`	A BiocParallelParam object indicating whether and how parallelization should be performed across genes.
`assay.type`	A string specifying which assay values to use, usually `"logcounts"`.

This function provides a convenience wrapper for marker gene identification between groups of cells, based on running pairwiseTTests or related functions and passing the result to combineMarkers. All of the arguments above are supplied directly to one of these two functions - refer to the relevant function's documentation for more details.
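The two-step structure described above (pairwise tests, then per-group combination) can be sketched in base R. This is an illustration only, not scran's implementation: the hypothetical helper `sketch_find_markers()` substitutes `stats::t.test()` (Welch, two-sided) for pairwiseTTests and Simes' method for combineMarkers with `pval.type="any"`, and it returns plain data.frames rather than DataFrames.

```r
# Hypothetical helper, NOT part of scran: a base-R sketch of
# (1) pairwise tests between groups and (2) per-group combination.
sketch_find_markers <- function(x, groups) {
  levs <- sort(unique(groups))
  out <- lapply(levs, function(target) {
    others <- setdiff(levs, target)
    # Step 1: genes x (other groups) matrix of two-sided Welch t-test p-values.
    pmat <- sapply(others, function(other) {
      apply(x, 1, function(expr) {
        t.test(expr[groups == target], expr[groups == other])$p.value
      })
    })
    pmat <- matrix(pmat, nrow = nrow(x),
                   dimnames = list(rownames(x), as.character(others)))
    # Step 2: Simes' combined p-value per gene (minimum BH-adjusted value),
    # then sort genes so that the strongest candidates come first.
    combined <- apply(pmat, 1, function(p) min(p.adjust(p, method = "BH")))
    ord <- order(combined)
    data.frame(p.value = unname(combined[ord]), pmat[ord, , drop = FALSE],
               row.names = rownames(pmat)[ord], check.names = FALSE)
  })
  names(out) <- as.character(levs)
  out
}

# Toy log-expression data: geneA is high in groups A and C, low in B.
x <- rbind(
  geneA = c(5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.8, 1.1, 4.8, 5.2, 5.1, 4.9),
  geneB = c(2.0, 2.1, 1.9, 2.2, 2.1, 1.8, 2.0, 2.3, 2.2, 1.9, 2.0, 2.1)
)
groups <- rep(c("A", "B", "C"), each = 4)
res <- sketch_find_markers(x, groups)
res$A
```

As with findMarkers, the result is a named list with one table per group; the per-comparison columns loosely mirror what `full.stats=TRUE` retains.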

If x contains log-normalized expression values generated with a pseudo-count of 1, it can be used in any of the pairwise testing procedures. If x is scale-normalized but not log-transformed, it can be used with test.type="wilcox" and test.type="binom". If x contains raw counts, it can only be used with test.type="binom".
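To make the three kinds of input concrete, the sketch below derives scale-normalized and log-normalized values from a small count matrix. It assumes library-size factors centred at a mean of one and a log2 transform with a pseudo-count of 1, which is our understanding of the `scuttle::logNormCounts()` defaults used in the Examples; the helper `log_norm_counts()` is hypothetical and not a verified re-implementation.

```r
# Hypothetical helper: log-normalization with a pseudo-count of 1,
# ASSUMING library-size factors scaled to a mean of 1.
log_norm_counts <- function(counts, pseudo.count = 1) {
  lib.sizes <- colSums(counts)
  size.factors <- lib.sizes / mean(lib.sizes)    # centred at unity
  normed <- sweep(counts, 2, size.factors, "/")  # scale-normalized, not logged
  log2(normed + pseudo.count)                    # log-normalized values
}

counts <- matrix(c(0, 10, 5, 5, 20, 0), nrow = 2,
                 dimnames = list(c("g1", "g2"), paste0("cell", 1:3)))
logs <- log_norm_counts(counts)
logs
```

Following the paragraph above, `logs` would suit any `test.type`, the intermediate scale-normalized matrix only `"wilcox"` or `"binom"`, and `counts` only `"binom"`.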

Note that log.p only affects the combined p-values and FDRs. If full.stats=TRUE, the p-values for each individual pairwise comparison will always be log-transformed, regardless of the value of log.p. Log-transformed p-values and FDRs are reported using the natural base.
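Because the log-transformed values use the natural base, converting them back or to log10 is plain arithmetic. The base-R snippet below illustrates this and also shows why a log scale is useful at all: extremely small p-values underflow to zero in double precision, while their logarithms stay finite. It does not call findMarkers; the vectors are illustrative.

```r
# Natural-log p-values, as reported when log.p=TRUE (and always for the
# per-comparison p-values when full.stats=TRUE).
log.p.values <- log(c(1e-10, 0.01, 0.5))
p.values <- exp(log.p.values)              # back to the probability scale
log10.p.values <- log.p.values / log(10)   # convert to log10 if preferred

# Very small p-values underflow to zero; their logarithms remain finite.
pnorm(-40)                 # underflows to 0
pnorm(-40, log.p = TRUE)   # finite, roughly -804.6
```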

The choice of pval.type determines whether the highly ranked genes are those that are DE between the current group and:

any other group ("any")
all other groups ("all")
some other groups ("some")

See ?combineMarkers for more details.
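The sketch below shows how the three choices can rank the same gene very differently. It assumes Simes' method for `"any"`, an intersection-union test (the maximum p-value) for `"all"`, and, for `"some"`, our reading of a Holm-based rule that takes the `ceiling(min.prop * n)`-th smallest Holm-adjusted p-value. These helpers are hypothetical; ?combineMarkers remains the authoritative description.

```r
# Hypothetical helpers illustrating the combining rules for ONE gene.
combine_any <- function(p) min(p.adjust(p, method = "BH"))  # Simes' p-value
combine_all <- function(p) max(p)                            # intersection-union test
combine_some <- function(p, min.prop = 0.5) {
  # ASSUMED rule: m-th smallest Holm-adjusted p-value, m = ceiling(min.prop * n).
  m <- max(1, ceiling(min.prop * length(p)))
  unname(sort(p.adjust(p, method = "holm"))[m])
}

# One gene, three pairwise comparisons against the current group.
p <- c(vsB = 0.01, vsC = 0.2, vsD = 0.5)
c(any = combine_any(p), some = combine_some(p), all = combine_all(p))
```

Here the gene looks like a strong marker under `"any"` (it separates the group from B), but not under `"all"`, since it fails to separate the group from D.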

A named list of DataFrames, each of which contains a sorted marker gene list for the corresponding group. In each DataFrame, the top genes are chosen to enable separation of that group from all other groups. See ?combineMarkers for more details on the output format.

If row.data is provided, the additional fields are added to the front of the DataFrame for each cluster. If add.summary=TRUE, extra statistics for each cluster are also computed and added.
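As a rough picture of "added to the front", the snippet below prepends annotation columns to an already-sorted table of statistics. It uses base-R data.frames rather than DataFrames, and the gene names, symbols and statistics are made up. Rows are matched by row name here for clarity; the function itself expects `row.data` to have row names identical to those of `x`.

```r
# Made-up, already-sorted statistics for one group.
stats <- data.frame(p.value = c(0.001, 0.3), logFC = c(2.5, 0.1),
                    row.names = c("geneB", "geneA"))
# Hypothetical per-gene annotation, in the original row order of x.
row.data <- data.frame(symbol = c("SYM_A", "SYM_B"),
                       row.names = c("geneA", "geneB"))
# Annotation columns go in front, following the sorted gene order.
annotated <- cbind(row.data[rownames(stats), , drop = FALSE], stats)
annotated
```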

Any log-fold changes are reported as differences in average x between groups (usually in base 2, depending on the transformation applied to x).
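For log2-transformed input, this difference in averages can be computed directly. The base-R sketch below uses toy values and hypothetical gene names.

```r
# Toy log2-expression matrix: 2 genes x 4 cells in groups A and B.
x <- rbind(gene1 = c(3, 5, 1, 1), gene2 = c(2, 2, 2, 4))
groups <- c("A", "A", "B", "B")

# Log-fold change of A over B = difference in mean log-expression.
mean.A <- rowMeans(x[, groups == "A", drop = FALSE])
mean.B <- rowMeans(x[, groups == "B", drop = FALSE])
logFC <- mean.A - mean.B
logFC
```

A value of 3 for `gene1` corresponds to roughly an eight-fold difference. Strictly, it is a ratio of geometric means of the pseudo-count-shifted normalized values, not of arithmetic means.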

Aaron Lun

pairwiseTTests, pairwiseWilcox, pairwiseBinom, for the underlying functions that compute the pairwise DE statistics.

combineMarkers, to combine pairwise statistics into a single marker list per cluster.

summaryMarkerStats, to incorporate additional summary statistics per cluster.

getMarkerEffects, to easily extract a matrix of effect sizes from each DataFrame.

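library(scran)
library(scuttle)

# mockSCE() and kmeans() both use random numbers; fix the seed for reproducibility.
set.seed(100)
sce <- mockSCE()
sce <- logNormCounts(sce)

# Any clustering method will do; k-means is used here only for convenience.
kout <- kmeans(t(logcounts(sce)), centers=4)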

out <- findMarkers(sce, groups=kout$cluster)
names(out)
out[[1]]

# More customization of the tests:
out <- findMarkers(sce, groups=kout$cluster, test.type="wilcox")
out[[1]]

out <- findMarkers(sce, groups=kout$cluster, lfc=1, direction="up")
out[[1]]

out <- findMarkers(sce, groups=kout$cluster, pval.type="all")
out[[1]]