aggregateData: Aggregation of single-cell to pseudobulk data
In HelenaLC/muscat: Multi-sample multi-group scRNA-seq data analysis tools

View source: R/aggregateData.R

aggregateData

R Documentation

Aggregation of single-cell to pseudobulk data

Description

...

Usage

aggregateData(
  x,
  assay = NULL,
  by = c("cluster_id", "sample_id"),
  fun = c("sum", "mean", "median", "prop.detected", "num.detected"),
  scale = FALSE,
  verbose = TRUE,
  BPPARAM = SerialParam(progressbar = verbose)
)

Arguments

`x`	a `SingleCellExperiment`.
`assay`	character string specifying the assay slot to use as input data. Defaults to the 1st available (`assayNames(x)[1]`).
`by`	character vector specifying which `colData(x)` columns to summarize by (at most 2!).
`fun`	a character string. Specifies the function to use as summary statistic. Passed to `summarizeAssayByGroup`.
`scale`	logical. Should pseudo-bulks be scaled with the effective library size & multiplied by 1M?
`verbose`	logical. Should information on progress be reported?
`BPPARAM`	a `BiocParallelParam` object specifying how aggregation should be parallelized.

Value

a SingleCellExperiment.

If length(by) == 2, each sheet (assay) contains pseudobulks for each of by[1], e.g., for each cluster when by = "cluster_id". Rows correspond to genes, columns to by[2], e.g., samples when by = "sample_id".
If length(by) == 1, the returned SCE will contain only a single assay with rows = genes and colums = by.

Aggregation parameters (assay, by, fun, scaled) are stored in metadata()$agg_pars, and the number of cells that were aggregated are accessible in int_colData()$n_cells.

Author(s)

Helena L Crowell & Mark D Robinson

References

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2018). doi: https://doi.org/10.1101/713412

Examples

# pseudobulk counts by cluster-sample
data(example_sce)
pb <- aggregateData(example_sce)

library(SingleCellExperiment)
assayNames(example_sce)  # one sheet per cluster
head(assay(example_sce)) # n_genes x n_samples

# scaled CPM
cpm <- edgeR::cpm(assay(example_sce))
assays(example_sce)$cpm <- cpm
pb <- aggregateData(example_sce, assay = "cpm", scale = TRUE)
head(assay(pb)) 

# aggregate by cluster only
pb <- aggregateData(example_sce, by = "cluster_id")
length(assays(pb)) # single assay
head(assay(pb))    # n_genes x n_clusters

HelenaLC/muscat documentation built on June 9, 2025, 9:33 a.m.