summarizeAssayByGroup: Summarize an assay by group

Description

From an assay matrix, compute summary statistics for groups of cells. A typical example would be to compute various summary statistics for clusters.

Usage

summarizeAssayByGroup(x, ...)

## S4 method for signature 'ANY'
summarizeAssayByGroup(
  x,
  ids,
  subset.row = NULL,
  subset.col = NULL,
  statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
  store.number = "ncells",
  threshold = 0,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
summarizeAssayByGroup(x, ..., assay.type = "counts")

Arguments

x

A numeric matrix containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a matrix.

...

For the generic, further arguments to be passed to specific methods.

For the SummarizedExperiment method, further arguments to be passed to the ANY method.

ids

A factor (or vector coercible into a factor) specifying the group to which each cell in x belongs. Alternatively, a DataFrame of such vectors or factors, in which case each unique combination of levels defines a group.

subset.row

An integer, logical or character vector specifying the features to use. If NULL, defaults to all features.

subset.col

An integer, logical or character vector specifying the cells to use. Defaults to all cells with non-NA entries of ids.

statistics

Character vector specifying the type of statistics to be computed, see Details.

store.number

String specifying the field of the output colData to store the number of cells in each group. If NULL, nothing is stored.

threshold

A numeric scalar specifying the threshold above which a gene is considered to be detected.

BPPARAM

A BiocParallelParam object specifying whether summation should be parallelized.

assay.type

A string or integer scalar specifying the assay of x to be summarized.

Details

This function provides a convenient way to sum or average expression values across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.
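The group-wise statistics can be sketched in base R to show what is being computed. This is an illustration of the idea only, not the scuttle implementation; the toy matrix, the variable names and the strict "greater than threshold" rule for detection are assumptions made for the example.

```r
# Toy count matrix: 3 features (rows) by 6 cells (columns).
# matrix() fills column by column, so each triple below is one cell.
x <- matrix(c(0, 2, 1,
              4, 0, 1,
              1, 1, 0,
              0, 3, 2,
              5, 0, 0,
              2, 2, 2),
            nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
ids <- factor(c("A", "A", "B", "B", "B", "A"))
threshold <- 0

# rowsum() sums rows by group, so transpose to group the cells,
# then transpose back to get features in rows and groups in columns.
group.sum <- t(rowsum(t(x), ids))

# Number of cells per group, in the same order as the columns above.
ncells <- as.vector(table(ids))
group.mean <- sweep(group.sum, 2, ncells, "/")

# Detection statistics: count values strictly above the threshold (assumed).
num.detected <- t(rowsum(t((x > threshold) + 0), ids))
prop.detected <- sweep(num.detected, 2, ncells, "/")

group.sum
prop.detected
```

Summing raw counts in this way yields the pseudo-bulk profiles described above, with one column per group.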

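For each feature, the chosen assay can be aggregated by any of the options accepted by statistics: "mean", "sum", "num.detected", "prop.detected" and "median". The "sum", "mean" and "median" options compute the corresponding statistic across all cells in each group, while "num.detected" and "prop.detected" report the number and proportion of cells in each group with values above threshold.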

Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset.col.

If ids is a factor and contains unused levels, they will not be represented as columns in the output.
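The handling of NA entries and unused levels described above can be mimicked in base R. The following is a minimal sketch under assumed toy data (the matrix, the factor and the helper variables keep and used are hypothetical), and it does not reproduce the scuttle internals.

```r
# Toy matrix: 2 features by 6 cells.
x <- matrix(1:12, nrow = 2,
            dimnames = list(c("gene1", "gene2"), paste0("cell", 1:6)))

# "C" is declared as a level but never used; cell3 has no group (NA).
ids <- factor(c("A", "B", NA, "A", "B", "A"), levels = c("A", "B", "C"))

# Cells with NA in 'ids' are left out of every group.
keep <- !is.na(ids)

# Unused levels are discarded so that they do not become empty columns.
used <- droplevels(ids[keep])

# Group-wise sums over the retained cells only.
group.sum <- t(rowsum(t(x[, keep, drop = FALSE]), used))

colnames(group.sum)
```

Only the groups that actually contain cells appear as columns, and cell3 contributes to none of them.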

Value

A SummarizedExperiment is returned with one column per level of ids that contains at least one cell. Each assay holds one of the requested statistics, and each entry contains that statistic computed across all cells in a given group (column) for a given feature (row). Columns are ordered by levels(ids), and the number of cells per level is reported in the column metadata field named by store.number ("ncells" by default). For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).
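To illustrate how several grouping factors define groups through their unique combinations, here is a small base R sketch. It uses a plain data.frame as a stand-in for a DataFrame, the toy labels and the helper names combos and group are hypothetical, and the ordering chosen here is for readability rather than a statement about the ordering used by the function.

```r
# Two grouping factors for 6 cells.
label <- c("A", "A", "B", "B", "A", "B")
batch <- c(1, 2, 1, 1, 2, 1)
ids <- data.frame(label = label, batch = batch)

# Each distinct row of 'ids' is one group; only observed combinations
# appear, so the (B, 2) combination is absent here.
combos <- unique(ids)
combos <- combos[order(combos$label, combos$batch), ]
rownames(combos) <- NULL

# Map every cell to the index of its combination.
group <- match(paste(ids$label, ids$batch),
               paste(combos$label, combos$batch))

# Number of cells in each combination, analogous to the "ncells" field.
combos$ncells <- tabulate(group, nbins = nrow(combos))

combos
```

Each row of combos plays the role of one output column, with the combination of levels and the cell count stored alongside it.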

Author(s)

Aaron Lun

See Also

aggregateAcrossCells, which also combines information in the colData of x.

Examples

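library(scuttle)

# Mock dataset with randomly assigned group labels.
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Summarize the "counts" assay, giving one output column per group.
out <- summarizeAssayByGroup(example_sce, ids)
out

# Group by each observed combination of label and batch.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce,
    DataFrame(label=ids, batch=batches))
head(out2)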

scuttle documentation built on Dec. 19, 2020, 2 a.m.