summarizeAssayByGroup: Summarize an assay by group
In scuttle: Single-Cell RNA-Seq Analysis Utilities


From an assay matrix, compute summary statistics for groups of cells. A typical example would be to compute various summary statistics for clusters.

summarizeAssayByGroup(x, ...)

## S4 method for signature 'ANY'
summarizeAssayByGroup(
  x,
  ids,
  subset.row = NULL,
  subset.col = NULL,
  statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
  store.number = "ncells",
  threshold = 0,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
summarizeAssayByGroup(x, ..., assay.type = "counts")

`x`	A numeric matrix containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a matrix.
`...`	For the generics, further arguments to be passed to specific methods. For the SummarizedExperiment method, further arguments to be passed to the ANY method.
`ids`	A factor (or vector coercible into a factor) specifying the group to which each cell in `x` belongs. Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a group.
`subset.row`	An integer, logical or character vector specifying the features to use. If `NULL`, defaults to all features.
`subset.col`	An integer, logical or character vector specifying the cells to use. Defaults to all cells with non-`NA` entries of `ids`.
`statistics`	Character vector specifying the type of statistics to be computed, see Details.
`store.number`	String specifying the field of the output `colData` to store the number of cells in each group. If `NULL`, nothing is stored.
`threshold`	A numeric scalar specifying the threshold above which a gene is considered to be detected.
`BPPARAM`	A BiocParallelParam object specifying whether summation should be parallelized.
`assay.type`	A string or integer scalar specifying which assay of `x` is to be summarized.

This function provides a convenient way to sum or average expression values across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.

For each feature, the chosen assay can be aggregated by:

"sum", the sum of all values in each group. This makes the most sense for raw counts, to allow models to account for the mean-variance relationship.
"mean", the mean of all values in each group. This makes the most sense for normalized and/or transformed assays.
"median", the median of all values in each group. This makes the most sense for normalized and/or transformed assays, usually generated from large counts where discreteness is less of an issue.
"num.detected" and "prop.detected", the number and proportion of values in each group that are non-zero (more precisely, greater than threshold, which defaults to 0). This makes the most sense for raw counts or sparsity-preserving transformations.
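
As a rough illustration of what these statistics mean, the following base R sketch computes them by hand on a small made-up matrix. It is not how the package implements them; the matrix, the grouping and the helper by_group() are hypothetical and exist only for this example.

```r
# Toy matrix: 3 features (rows) x 6 cells (columns); the values are made up.
counts <- matrix(
    c(0, 2, 5,
      1, 0, 3,
      0, 0, 4,
      2, 1, 0,
      0, 3, 1,
      4, 0, 0),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)
ids <- factor(c("A", "A", "A", "B", "B", "B"))
threshold <- 0

# Hypothetical helper: apply FUN to each feature within each group,
# returning a features-by-groups matrix.
by_group <- function(FUN) {
    sapply(levels(ids), function(g) {
        apply(counts[, ids == g, drop = FALSE], 1, FUN)
    })
}

group_sum    <- by_group(sum)
group_mean   <- by_group(mean)
group_median <- by_group(median)
num_detected <- by_group(function(v) sum(v > threshold))
prop_detected <- by_group(function(v) mean(v > threshold))

group_sum
num_detected
```

Each result has one row per feature and one column per group, which mirrors the layout of the assays in the returned object.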

Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset.col.
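
The two approaches can be compared with a short sketch. It assumes the scuttle package is installed; the seed, the number of groups and the choice to drop the first 10 cells are arbitrary and only serve the illustration.

```r
library(scuttle)

set.seed(100)  # arbitrary seed, for reproducibility only
sce <- mockSCE()
ids <- sample(LETTERS[1:3], ncol(sce), replace = TRUE)

# Option 1: mark unwanted cells (here, arbitrarily, the first 10) as NA.
ids.na <- ids
ids.na[1:10] <- NA
out.na <- summarizeAssayByGroup(sce, ids.na, statistics = "sum")

# Option 2: leave ids intact and select the cells to use explicitly.
keep <- seq_len(ncol(sce))[-(1:10)]
out.sub <- summarizeAssayByGroup(sce, ids, subset.col = keep,
    statistics = "sum")

# Both calls should aggregate the same set of cells.
all.equal(assay(out.na, "sum"), assay(out.sub, "sum"))
```

Setting entries to NA is convenient when the filter is already encoded in the grouping, while subset.col keeps the original ids untouched for later reuse.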

If ids is a factor and contains unused levels, they will not be represented as columns in the output.
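
A minimal sketch of this behaviour, assuming the scuttle package is installed; the level "C" is deliberately declared but never used, and the seed is arbitrary.

```r
library(scuttle)

set.seed(101)  # arbitrary seed, for reproducibility only
sce <- mockSCE()

# Factor with three declared levels, of which only "A" and "B" occur.
ids.f <- factor(
    sample(c("A", "B"), ncol(sce), replace = TRUE),
    levels = c("A", "B", "C")
)

out <- summarizeAssayByGroup(sce, ids.f, statistics = "mean")

# Groups actually present in the output; "C" is expected to be absent.
as.character(out$ids)
ncol(out)
```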

A SummarizedExperiment is returned with one column per level of ids and one assay per requested entry of statistics. Each entry of an assay contains the corresponding statistic (e.g., the sum or mean) across all cells in a given group (column) for a given feature (row). Columns are ordered by levels(ids) and the number of cells per level is reported in the column metadata field named by store.number ("ncells" by default). For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).
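
The returned object can be inspected with the usual SummarizedExperiment accessors. The sketch below assumes the scuttle package is installed; the labels, batches and seed are made up for the illustration.

```r
library(scuttle)

set.seed(102)  # arbitrary seed, for reproducibility only
sce <- mockSCE()
labels <- sample(LETTERS[1:3], ncol(sce), replace = TRUE)
batches <- sample(1:2, ncol(sce), replace = TRUE)

out <- summarizeAssayByGroup(
    sce,
    DataFrame(label = labels, batch = batches),
    statistics = c("sum", "prop.detected")
)

# One assay per requested statistic.
assayNames(out)

# One row of column metadata per label/batch combination,
# including the number of cells that went into each group.
colData(out)

# Every cell has a non-NA group here, so the counts add up to ncol(sce).
sum(out$ncells) == ncol(sce)
```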

Aaron Lun

aggregateAcrossCells, which also combines information in the colData of x.

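library(scuttle)

# Mock dataset with randomly assigned group labels.
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Summarize the "counts" assay by group, using the default statistics.
out <- summarizeAssayByGroup(example_sce, ids)
out

# Group by each unique combination of label and batch.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce, 
      DataFrame(label=ids, batch=batches))
head(out2)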