aggregateData: Aggregation of single-cell to pseudobulk data
In muscat: Multi-sample multi-group scRNA-seq data analysis tools


...

aggregateData(
  x,
  assay = NULL,
  by = c("cluster_id", "sample_id"),
  fun = c("sum", "mean", "median"),
  scale = FALSE
)

`x`	a `SingleCellExperiment`.
`assay`	character string specifying the assay slot to use as input data. Defaults to the 1st available (`assayNames(x)[1]`).
`by`	character vector specifying which `colData(x)` columns to summarize by (at most 2!).
`fun`	a character string. Specifies the function to use as summary statistic.
`scale`	logical. Should pseudobulks be scaled with the effective library size and multiplied by 1M?
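To make the `fun` choices concrete, here is a minimal base-R sketch of summarizing cells per group with each of the three statistics. It is only an illustration of the idea, not muscat's implementation: the helper `summarise_cells()` and the toy matrix are hypothetical, and the real function operates on a `SingleCellExperiment`.

```r
# Hypothetical helper (not part of muscat): summarize a genes x cells
# matrix per group, using one of the three statistics named by `fun`.
summarise_cells <- function(counts, group, fun = c("sum", "mean", "median")) {
  fun <- match.arg(fun)  # defaults to the first choice, "sum"
  stat <- switch(fun, sum = sum, mean = mean, median = stats::median)
  groups <- sort(unique(group))
  # one column per group, one row per gene
  vapply(groups, function(g) {
    apply(counts[, group == g, drop = FALSE], 1, stat)
  }, numeric(nrow(counts)))
}

# Toy data (made up): 2 genes x 4 cells, cells 1-3 in group A, cell 4 in B
counts <- matrix(
  c(1, 2,  3, 4,  11, 6,  9, 0),
  nrow = 2,
  dimnames = list(c("gene1", "gene2"), paste0("cell", 1:4))
)
group <- c("A", "A", "A", "B")

summarise_cells(counts, group, "sum")
summarise_cells(counts, group, "mean")
summarise_cells(counts, group, "median")
```

With skewed values such as `gene1` in group A (1, 3, 11), the mean and median differ, which is the practical reason to pick one over the other.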

a SingleCellExperiment.

If length(by) == 2, each sheet (assay) contains pseudobulks for one level of by[1], e.g., one cluster when by[1] = "cluster_id". Rows correspond to genes, columns to the levels of by[2], e.g., samples when by[2] = "sample_id".
If length(by) == 1, the returned SCE will contain only a single assay with rows = genes and columns = the levels of by.
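The two-level layout can be sketched in base R as a list with one genes x samples matrix per cluster. This is a rough illustration under stated assumptions, not the package's code: `pseudobulk_sum()` and the toy data are hypothetical, sums are used as the summary statistic, and filling samples that have no cells in a cluster with zeros is a choice made for this sketch.

```r
# Hypothetical helper (not part of muscat): sum counts per sample
# within each cluster, returning one genes x samples matrix per cluster.
pseudobulk_sum <- function(counts, cluster_id, sample_id) {
  samples <- sort(unique(sample_id))
  lapply(split(seq_len(ncol(counts)), cluster_id), function(idx) {
    # rowsum() groups rows, so transpose to put cells in rows
    pb <- t(rowsum(t(counts[, idx, drop = FALSE]), group = sample_id[idx]))
    # assumption of this sketch: samples absent from a cluster get zeros
    out <- matrix(0, nrow(counts), length(samples),
                  dimnames = list(rownames(counts), samples))
    out[, colnames(pb)] <- pb
    out
  })
}

# Toy data (made up): 3 genes x 6 cells, filled column by column
counts <- matrix(
  c(1, 0, 2,  3, 1, 0,  0, 2, 2,  4, 0, 1,  1, 1, 1,  2, 3, 0),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)
cluster_id <- c("A", "A", "B", "B", "A", "B")
sample_id  <- c("s1", "s2", "s1", "s2", "s1", "s2")

pb <- pseudobulk_sum(counts, cluster_id, sample_id)
names(pb)  # one "sheet" per cluster
pb$A       # n_genes x n_samples
```

With a single grouping variable, the same idea collapses to one matrix with genes in rows and groups in columns.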

Aggregation parameters (assay, by, fun, scaled) are stored in metadata()$agg_pars, and the number of cells that were aggregated is accessible in metadata()$n_cells.

Helena L Crowell & Mark D Robinson

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2019). doi: https://doi.org/10.1101/713412

library(muscat)
library(SingleCellExperiment)
data(sce)

# pseudobulk counts by cluster-sample
pb <- aggregateData(sce)

assayNames(pb)  # one sheet per cluster
head(assay(pb)) # n_genes x n_samples

# scaled CPM
assays(sce)$cpm <- edgeR::cpm(assay(sce))
pb <- aggregateData(sce, assay = "cpm", scale = TRUE)
head(assay(pb)) 

# aggregate by cluster only
pb <- aggregateData(sce, by = "cluster_id")
length(assays(pb)) # single assay
head(assay(pb))    # n_genes x n_clusters