summaryMarkerStats: Summary marker statistics

summaryMarkerStatsR Documentation

Summary marker statistics

Description

Compute additional gene-level statistics for each group to assist in identifying marker genes, to complement the formal test statistics generated by findMarkers.

Usage

summaryMarkerStats(x, ...)

## S4 method for signature 'ANY'
summaryMarkerStats(
  x,
  groups,
  row.data = NULL,
  average = "mean",
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
summaryMarkerStats(x, ..., assay.type = "logcounts")

Arguments

x

A numeric matrix-like object of expression values, where each column corresponds to a cell and each row corresponds to an endogenous gene. This is generally expected to be normalized log-expression values unless one knows better.

Alternatively, a SummarizedExperiment or SingleCellExperiment object containing such a matrix.

...

For the generic, further arguments to pass to specific methods.

For the SummarizedExperiment method, further arguments to pass to the ANY method.

groups

A vector of length equal to ncol(x), specifying the group to which each cell is assigned. If x is a SingleCellExperiment, this defaults to colLabels(x) if available.

row.data

A DataFrame containing additional row metadata for each gene in x, to be included in each of the output DataFrames. This should generally have row names identical to those of x.

Alternatively, a list containing one such DataFrame per level of groups, where each DataFrame contains group-specific metadata for each gene to be included in the appropriate output DataFrame.

average

String specifying the type of average, to be passed to sumCountsAcrossCells.

BPPARAM

A BiocParallelParam object indicating whether and how parallelization should be performed across genes.

assay.type

A string specifying which assay values to use, usually "logcounts".

Details

This function only generates descriptive statistics for each gene to assist marker selection. It does not consider blocking factors or covariates that would otherwise be available from comparisons between groups. For the sake of brevity, statistics for the “other” groups are summarized into a single value.

Value

A named List of DataFrames, with one entry per level of groups. Each DataFrame has number of rows corresponding to the rows in x and contains the fields:

  • self.average, the average (log-)expression across all cells in the current group.

  • other.average, the grand average of the average (log-)expression across cells in the other groups.

  • self.detected, the proportion of cells with detected expression in the current group.

  • other.detected, the average proportion of cells with detected expression in the other groups.

Author(s)

Aaron Lun

See Also

findMarkers, where the output of this function can be used in row.data=.

Examples

library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)

# Any clustering method is okay.
kout <- kmeans(t(logcounts(sce)), centers=3)
sum.out <- summaryMarkerStats(sce, kout$cluster)
sum.out[["1"]]

# Add extra rowData if you like.
rd <- DataFrame(Symbol=sample(LETTERS, nrow(sce), replace=TRUE), 
    row.names=rownames(sce))
sum.out <- summaryMarkerStats(sce, kout$cluster, row.data=rd)
sum.out[["1"]]


MarioniLab/scran documentation built on March 7, 2024, 1:45 p.m.