multiMarkerStats: Combine multiple sets of marker statistics
In MarioniLab/scran: Methods for Single-Cell RNA-Seq Data Analysis

multiMarkerStats

R Documentation

Description

Combine multiple sets of marker statistics, typically from different tests, into a single DataFrame for convenient inspection.

Usage

multiMarkerStats(..., repeated = NULL, sorted = TRUE)

Arguments

...

Two or more lists or Lists produced by findMarkers or combineMarkers. Each list should contain DataFrames of results, one for each group/cluster of cells.

The names of each List should be the same; the universe of genes in each DataFrame should be the same; and the same number of columns in each DataFrame should be named. All elements in ... are also expected to be named.

repeated

Character vector of columns that are present in one or more DataFrames but should only be reported once. Typically used to avoid reporting redundant copies of annotation-related columns.

sorted

Logical scalar indicating whether each output DataFrame should be sorted by a combined statistic: the combined Top if it is present, otherwise the combined p.value.

Details

The combined statistics are designed to favor a gene that is highly ranked in each of the individual test results. This is highly conservative and aims to identify robust differential expression (DE) that is significant under all testing schemes.

A combined Top value of T indicates that the gene is among the top T genes of one or more pairwise comparisons in each of the testing schemes. (We can be even more aggressive if the individual results were generated with a larger min.prop value.) In effect, a gene can only achieve a low Top value if it is consistently highly ranked in each test. If sorted=TRUE, this is used to order the genes in the output DataFrame.
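As a rough illustration of how the combined Top behaves, the sketch below takes the per-gene maximum of the Top values from two tests. The gene names and rank values are invented for illustration, and base R's pmax is used as a stand-in; it is not meant to reproduce the function's internal code.

```r
# Hypothetical per-test Top values for four genes (illustrative numbers only).
top_t      <- c(geneA = 1, geneB = 2, geneC = 5, geneD = 10)
top_wilcox <- c(geneA = 2, geneB = 8, geneC = 1, geneD = 3)

# The combined Top is the largest (i.e., worst) rank across the tests,
# so a gene needs a small Top in every test to get a small combined value.
combined_top <- pmax(top_t, top_wilcox)

# Ordering by the combined Top mirrors the behaviour described for sorted=TRUE.
ordered_top <- sort(combined_top)
print(ordered_top)
```

Here geneC has the best rank in the hypothetical Wilcoxon results but only a Top of 5 in the t-test, so its combined Top is 5 and it falls behind geneA, which is ranked well in both.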

The combined p.value is effectively the result of applying an intersection-union test to the per-test results. This will only be low if the gene has a low p-value in each of the test results. If sorted=TRUE and Top is not present, this will be used to order the genes in the output DataFrame.
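The intersection-union idea can be sketched in a few lines of base R: the combined p-value for each gene is the largest of its per-test p-values. The p-values below are invented for illustration, and the variable names are hypothetical.

```r
# Hypothetical per-test p-values for four genes (illustrative numbers only).
p_t      <- c(geneA = 0.001, geneB = 0.20,  geneC = 0.01, geneD = 0.04)
p_wilcox <- c(geneA = 0.004, geneB = 0.002, geneC = 0.03, geneD = 0.50)

# Intersection-union style combination: take the per-gene maximum.
# A gene is only called significant if every test rejects its null.
combined_p <- pmax(p_t, p_wilcox)
print(combined_p)

# Because log() is monotonic, taking the maximum of log-transformed
# p-values gives the log of the same combined p-value.
log_combined_p <- pmax(log(p_t), log(p_wilcox))
same_on_log_scale <- isTRUE(all.equal(log_combined_p, log(combined_p)))
```

Note that geneB has a very low p-value in one hypothetical test but a combined value of 0.20, which shows why a single strong result is not enough under this scheme.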

Value

A named List of DataFrames with one DataFrame per group/cluster. Each DataFrame contains statistics from the corresponding entry of each List in ..., prefixed with the name of the List. In addition, several combined statistics are reported:

Top, the largest rank of each gene across all DataFrames for that group. This is only reported if each list in ... was generated with pval.type="any" in combineMarkers.
p.value, the largest p-value of each gene across all DataFrames for that group. This is replaced by log.p.value if p-values in ... are log-transformed.
FDR, the BH-adjusted value of p.value. This is replaced by log.FDR if p-values in ... are log-transformed.
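The layout of each output DataFrame can be approximated with plain data.frames, as in the sketch below. The input values are invented, data.frame stands in for DataFrame, and the "." separator and column order used here are assumptions made for illustration rather than a guarantee of what multiMarkerStats emits.

```r
# Hypothetical per-test results for a single cluster (illustrative numbers only).
genes <- c("geneA", "geneB", "geneC")
t_res <- data.frame(p.value = c(0.001, 0.20, 0.01), row.names = genes)
w_res <- data.frame(p.value = c(0.004, 0.002, 0.03), row.names = genes)

# Prefix each test's columns with the name it was supplied under.
# Assumption: a "." separator is used between the prefix and the column name.
colnames(t_res) <- paste0("t.", colnames(t_res))
colnames(w_res) <- paste0("wilcox.", colnames(w_res))
out <- cbind(t_res, w_res)

# Combined statistics: largest p-value across tests, then BH adjustment.
out$p.value <- pmax(out$t.p.value, out$wilcox.p.value)
out$FDR <- p.adjust(out$p.value, method = "BH")

# No Top column in this sketch, so order by the combined p-value.
out <- out[order(out$p.value), ]
print(out)
```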

Author(s)

Aaron Lun

Examples

library(scran)
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)

# Any clustering method would do; k-means is used here only for convenience.
kout <- kmeans(t(logcounts(sce)), centers=4)

tout <- findMarkers(sce, groups=kout$cluster, direction="up")
wout <- findMarkers(sce, groups=kout$cluster, direction="up", test="wilcox")

combined <- multiMarkerStats(t=tout, wilcox=wout)
colnames(combined[[1]])

MarioniLab/scran documentation built on Sept. 7, 2024, 6:25 a.m.