summarizeAssayByGroup | R Documentation |
From an assay matrix, compute summary statistics for groups of cells. A typical example would be to compute various summary statistics for clusters.
summarizeAssayByGroup(x, ...)
## S4 method for signature 'ANY'
summarizeAssayByGroup(
x,
ids,
subset.row = NULL,
subset.col = NULL,
statistics = c("mean", "sum", "num.detected", "prop.detected", "median"),
store.number = "ncells",
threshold = 0,
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
summarizeAssayByGroup(x, ..., assay.type = "counts")
x |
A numeric matrix containing features in rows and cells in columns. Alternatively, a SummarizedExperiment object containing such a matrix. |
... |
For the generic, further arguments to be passed to specific methods. For the SummarizedExperiment method, further arguments to be passed to the ANY method. |
ids |
A factor (or vector coercible into a factor) specifying the group to which each cell in |
subset.row |
An integer, logical or character vector specifying the features to use.
If |
subset.col |
An integer, logical or character vector specifying the cells to use.
Defaults to all cells with non- |
statistics |
Character vector specifying the type of statistics to be computed, see Details. |
store.number |
String specifying the field of the output |
threshold |
A numeric scalar specifying the threshold above which a gene is considered to be detected. |
BPPARAM |
A BiocParallelParam object specifying whether summation should be parallelized. |
assay.type |
A string or integer scalar specifying the assay of |
These functions provide a convenient method for summing or averaging expression values across multiple columns for each feature. A typical application would be to sum counts across all cells in each cluster to obtain “pseudo-bulk” samples for further analyses, e.g., differential expression analyses between conditions.
For each feature, the chosen assay can be aggregated by:
"sum", the sum of all values in each group. This makes the most sense for raw counts, to allow models to account for the mean-variance relationship.
"mean", the mean of all values in each group. This makes the most sense for normalized and/or transformed assays.
"median", the median of all values in each group. This makes the most sense for normalized and/or transformed assays, usually generated from large counts where discreteness is less of an issue.
"num.detected" and "prop.detected", the number and proportion of values in each group that are above threshold (i.e., non-zero with the default threshold of 0). This makes the most sense for raw counts or sparsity-preserving transformations.
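The statistics listed above can be illustrated with a small base R sketch. This is not the package's implementation; the toy matrix, the helper by_group and all variable names are hypothetical, and the code only shows what each statistic means for one feature within one group.

```r
# Toy matrix: 3 features (rows) x 6 cells (columns); the values are made up.
x <- matrix(c(0, 2, 5,
              1, 0, 3,
              4, 0, 0,
              0, 0, 7,
              2, 1, 0,
              0, 3, 1),
            nrow = 3,
            dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
ids <- factor(c("A", "A", "B", "B", "B", "A"))
threshold <- 0

# Hypothetical helper: apply a summary function to each feature within each
# group, returning a features-by-groups matrix.
by_group <- function(mat, groups, fun) {
  sapply(levels(groups), function(g) {
    apply(mat[, groups == g, drop = FALSE], 1, fun)
  })
}

group_sum     <- by_group(x, ids, sum)
group_mean    <- by_group(x, ids, mean)
group_median  <- by_group(x, ids, median)
num_detected  <- by_group(x, ids, function(v) sum(v > threshold))
prop_detected <- by_group(x, ids, function(v) mean(v > threshold))

group_sum
```

Each result has one row per feature and one column per group, which mirrors the layout of the assays described in the Value section.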
Any NA values in ids are implicitly ignored and will not be considered during summation. This may be useful for removing undesirable cells by setting their entries in ids to NA. Alternatively, we can explicitly select the cells of interest with subset.col.
If ids is a factor and contains unused levels, they will not be represented as columns in the output.
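The handling of NA entries and unused levels described above can be sketched in base R. This is only an illustration of the stated behaviour under assumed toy inputs, not the function's internal code; the names keep and used are hypothetical.

```r
# Five cells; two are marked NA to exclude them, and level "C" is never used.
ids <- factor(c("A", NA, "B", "A", NA), levels = c("A", "B", "C"))

# Cells with NA ids are ignored.
keep <- !is.na(ids)

# Unused levels do not give rise to output columns.
used <- droplevels(ids[keep])

levels(used)  # "A" "B"
table(used)   # number of retained cells per group
```

Only groups "A" and "B" would appear as columns in this scenario, with the two NA cells contributing to neither.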
A SummarizedExperiment is returned with one column per level of ids.
Each entry of an assay contains the chosen summary statistic (e.g., the sum or average) across all cells in a given group (column) for a given feature (row).
Columns are ordered by levels(ids) and the number of cells per level is reported in the "ncells" column metadata.
For DataFrame ids, each column corresponds to a unique combination of levels (recorded in the colData).
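For the case of several grouping factors, the idea of one output column per unique combination of levels can be sketched in base R. This is an illustration only: it uses a plain data.frame in place of a DataFrame, the ordering of combinations shown here is an assumption rather than documented behaviour, and all names are hypothetical.

```r
# Two grouping factors for five cells; the values are made up.
ids_df <- data.frame(label = c("A", "A", "B", "B", "A"),
                     batch = c(1, 2, 1, 1, 1))

# One row per unique combination, i.e., one output column per row here.
combos <- unique(ids_df)
combos <- combos[order(combos$label, combos$batch), ]

# Count the cells that fall into each combination.
key <- paste(ids_df$label, ids_df$batch)
combos$ncells <- as.integer(table(key)[paste(combos$label, combos$batch)])

combos
```

The resulting table plays the role of the column metadata: each row describes one combination of levels together with the number of cells that were summarized.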
Aaron Lun
aggregateAcrossCells, which also combines information in the colData of x.
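library(scuttle)

# Mock dataset with random cluster assignments.
example_sce <- mockSCE()
ids <- sample(LETTERS[1:5], ncol(example_sce), replace=TRUE)

# Summarize the default "counts" assay for each cluster.
out <- summarizeAssayByGroup(example_sce, ids)
out

# Group by several factors at once: one column per cluster/batch combination.
batches <- sample(1:3, ncol(example_sce), replace=TRUE)
out2 <- summarizeAssayByGroup(example_sce,
    DataFrame(label=ids, batch=batches))
head(out2)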