summarizeAssayByGroup: Summarize an assay by group

summarizeAssayByGroup    R Documentation

Summarize an assay by group

Description

From an assay matrix, compute summary statistics for groups of cells. A typical example would be to compute various summary statistics for clusters.

Usage

summarizeAssayByGroup(x, ...)

## S4 method for signature 'ANY'
summarizeAssayByGroup(
  x,
  ids,
  subset.row = NULL,
  subset.col = NULL,
  statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
  store.number = "ncells",
  threshold = 0,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
summarizeAssayByGroup(x, ..., assay.type = "counts")

Arguments

x

A numeric matrix containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a matrix.

...

For the generic, further arguments to be passed to specific methods.

For the SummarizedExperiment method, further arguments to be passed to the ANY method.

ids

A factor (or vector coercible into a factor) specifying the group to which each cell in x belongs. Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a group.

subset.row

An integer, logical or character vector specifying the features to use. If NULL, defaults to all features.

subset.col

An integer, logical or character vector specifying the cells to use. If NULL, defaults to all cells with non-NA entries of ids.

statistics

Character vector specifying the type of statistics to be computed, see Details.

store.number

String specifying the field of the output colData to store the number of cells in each group. If NULL, nothing is stored.

threshold

A numeric scalar specifying the threshold above which a gene is considered to be detected.

BPPARAM

A BiocParallelParam object specifying whether summation should be parallelized.

assay.type

A string or integer scalar specifying the assay of x containing the values to be summarized.

Details

This function provides a convenient way to summarize expression values across multiple columns for each feature, for example by summing or averaging. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.

For each feature, the chosen assay can be aggregated by:

  • "sum", the sum of all values in each group. This makes the most sense for raw counts, to allow models to account for the mean-variance relationship.

  • "mean", the mean of all values in each group. This makes the most sense for normalized and/or transformed assays.

  • "median", the median of all values in each group. This makes the most sense for normalized and/or transformed assays, usually generated from large counts where discreteness is less of an issue.

  • "num.detected" and "prop.detected", the number and proportion of values in each group that are above threshold (non-zero with the default threshold of 0). This makes the most sense for raw counts or sparsity-preserving transformations.
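
The statistics listed above can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy matrix, the grouping factor and the helper by_group() are hypothetical and exist only to show what each statistic means for one feature within one group.

```r
# Toy matrix: 3 features (rows) x 6 cells (columns); values are made up.
x <- matrix(c(0, 2, 4, 0, 1, 3,
              1, 1, 0, 0, 5, 0,
              0, 0, 0, 2, 2, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
ids <- factor(c("A", "A", "A", "B", "B", "B"))
threshold <- 0

# Hypothetical helper: apply a per-feature summary within each group of columns.
# Returns a features-by-groups matrix.
by_group <- function(FUN) {
    sapply(levels(ids), function(g) {
        apply(x[, ids == g, drop = FALSE], 1, FUN)
    })
}

group_sum     <- by_group(sum)
group_mean    <- by_group(mean)
group_median  <- by_group(median)
num_detected  <- by_group(function(v) sum(v > threshold))   # count above threshold
prop_detected <- by_group(function(v) mean(v > threshold))  # proportion above threshold

group_sum
prop_detected
```

Raising threshold in this sketch makes detection stricter, which only affects the two detection statistics.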

Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset.col.

If ids is a factor and contains unused levels, they will not be represented as columns in the output.
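
The two ways of excluding cells, and the handling of unused levels, can be sketched as follows. This is an illustrative example only: it assumes the scuttle package is installed, and the object names (sce, ids_full, ids_na) and the extra "unused" level are made up for the demonstration.

```r
library(scuttle)

set.seed(100)
sce <- mockSCE()

# Group labels with one level that never occurs in the data.
ids_full <- factor(sample(c("A", "B", "C"), ncol(sce), replace = TRUE),
                   levels = c("A", "B", "C", "unused"))

# Option 1: drop the first 10 cells by setting their labels to NA.
ids_na <- ids_full
ids_na[1:10] <- NA
out_na <- summarizeAssayByGroup(sce, ids_na, statistics = "sum")

# Option 2: keep the labels intact and select cells with subset.col.
out_sub <- summarizeAssayByGroup(sce, ids_full, statistics = "sum",
                                 subset.col = 11:ncol(sce))

ncol(out_na)          # number of groups actually present
sum(out_na$ncells)    # number of cells that contributed
all.equal(assay(out_na, "sum"), assay(out_sub, "sum"))
```

Both calls should summarize the same set of cells, so the resulting sums are expected to agree, and the "unused" level should not appear as a column.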

Value

A SummarizedExperiment is returned with one column per level of ids that is present in the data. It contains one assay per requested statistic, where each entry holds that statistic computed across all cells in a given group (column) for a given feature (row). Columns are ordered by levels(ids), and the number of cells per group is reported in the column metadata field named by store.number ("ncells" by default). For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).
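
As a rough sketch of how the returned object might be inspected, the following assumes the scuttle package is installed and uses made-up cluster labels ("c1" to "c3"); the variable names are illustrative only.

```r
library(scuttle)

set.seed(42)
sce <- mockSCE()
clusters <- sample(c("c1", "c2", "c3"), ncol(sce), replace = TRUE)

out <- summarizeAssayByGroup(sce, clusters,
                             statistics = c("sum", "prop.detected"))

assayNames(out)                    # names of the computed statistics
pseudo_bulk <- assay(out, "sum")   # features x groups matrix of summed counts
dim(pseudo_bulk)
colData(out)                       # group identity and number of cells per group
```

The "sum" assay extracted here is the kind of pseudo-bulk matrix that could be passed on to a downstream differential expression workflow.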

Author(s)

Aaron Lun

See Also

aggregateAcrossCells, which also combines information in the colData of x.

Examples

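library(scuttle)

# Mock dataset with randomly assigned group labels.
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Summarize the "counts" assay by group (one column per label).
out <- summarizeAssayByGroup(example_sce, ids)
out

# Group by each unique combination of label and batch.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce,
      DataFrame(label=ids, batch=batches))
head(out2)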

LTLA/scuttle documentation built on March 9, 2024, 11:16 a.m.