aggregateData: Aggregation of single-cell to pseudobulk data

View source: R/aggregateData.R

aggregateData    R Documentation

Aggregation of single-cell to pseudobulk data

Description

...

Usage

aggregateData(
  x,
  assay = NULL,
  by = c("cluster_id", "sample_id"),
  fun = c("sum", "mean", "median", "prop.detected", "num.detected"),
  scale = FALSE,
  verbose = TRUE,
  BPPARAM = SerialParam(progressbar = verbose)
)

Arguments

x

a SingleCellExperiment.

assay

character string specifying the assay slot to use as input data. Defaults to the first available assay (assayNames(x)[1]).

by

character vector of at most two colData(x) column names to summarize by.

fun

character string specifying the summary statistic to compute: one of "sum", "mean", "median", "prop.detected" or "num.detected". Passed to summarizeAssayByGroup.

scale

logical. Should pseudobulks be scaled by the effective library size and multiplied by 1M?

verbose

logical. Should information on progress be reported?

BPPARAM

a BiocParallelParam object specifying how aggregation should be parallelized.
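To make the fun and scale arguments more concrete, the following is a minimal base-R sketch of what the five summary statistics amount to when cells are grouped by a single factor. It is not the package's implementation: the toy matrix, the helper summarize_by, and the choice of "detected" as a count greater than zero are assumptions made for illustration, and the scaling step assumes the library size is simply the column total of the summed pseudobulks.

```r
# Toy data: 3 genes x 6 cells (hypothetical values, not from example_sce)
counts <- matrix(
  c(0, 2, 1, 0, 4, 3,
    5, 0, 0, 1, 0, 2,
    1, 1, 0, 0, 0, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
cluster_id <- factor(c("A", "A", "A", "B", "B", "B"))

# Hypothetical helper: apply a per-gene summary to the cells of each group;
# returns a genes x groups matrix
summarize_by <- function(mat, group, stat) {
  sapply(split(seq_len(ncol(mat)), group), function(idx)
    apply(mat[, idx, drop = FALSE], 1, stat))
}

pb_sum    <- summarize_by(counts, cluster_id, sum)
pb_mean   <- summarize_by(counts, cluster_id, mean)
pb_median <- summarize_by(counts, cluster_id, median)
# Assumption: a gene counts as "detected" in a cell when its value is > 0
num_det   <- summarize_by(counts, cluster_id, function(v) sum(v > 0))
prop_det  <- summarize_by(counts, cluster_id, function(v) mean(v > 0))

# Assumed form of scaling: divide each pseudobulk column by its
# library size (column total) and multiply by 1e6
lib_size <- colSums(pb_sum)
pb_cpm   <- sweep(pb_sum, 2, lib_size, "/") * 1e6
```

In this sketch every column of pb_cpm sums to one million, which is the sense in which scaled pseudobulks are comparable across groups of different sequencing depth.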

Value

a SingleCellExperiment.

  • If length(by) == 2, the result contains one sheet (assay) per level of by[1], e.g., one per cluster when by[1] = "cluster_id". Within each sheet, rows correspond to genes and columns to the levels of by[2], e.g., samples when by[2] = "sample_id".

  • If length(by) == 1, the returned SCE contains only a single assay with rows = genes and columns = levels of by.

Aggregation parameters (assay, by, fun, scale) are stored in metadata()$agg_pars, and the number of cells that were aggregated is accessible in int_colData()$n_cells.
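The layout described above for length(by) == 2 can be pictured with a small base-R sketch. It only mimics the shape of the result with a plain list of matrices and a contingency table; it does not use SingleCellExperiment, and the toy counts and object names are hypothetical.

```r
# Toy data: 2 genes x 8 cells (hypothetical values)
counts <- matrix(1:16, nrow = 2,
  dimnames = list(c("gene1", "gene2"), paste0("cell", 1:8)))
cluster_id <- factor(rep(c("A", "B"), each = 4))   # plays the role of by[1]
sample_id  <- factor(rep(c("s1", "s2"), times = 4)) # plays the role of by[2]

# One matrix ("sheet") per cluster, each with genes as rows and
# samples as columns; entries are summed counts
pb <- lapply(split(seq_len(ncol(counts)), cluster_id), function(cells) {
  sapply(split(cells, sample_id[cells]), function(idx)
    rowSums(counts[, idx, drop = FALSE]))
})

# Number of cells that went into each cluster-sample pseudobulk
n_cells <- table(cluster_id, sample_id)

names(pb)  # one sheet per cluster
pb$A       # n_genes x n_samples for cluster "A"
```

With a single grouping variable, the same idea collapses to one genes-by-groups matrix instead of a list of sheets.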

Author(s)

Helena L Crowell & Mark D Robinson

References

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2019). doi: https://doi.org/10.1101/713412

Examples

# pseudobulk counts by cluster-sample
data(example_sce)
pb <- aggregateData(example_sce)

library(SingleCellExperiment)
assayNames(pb)  # one sheet per cluster
head(assay(pb)) # n_genes x n_samples

# scaled CPM
cpm <- edgeR::cpm(assay(example_sce))
assays(example_sce)$cpm <- cpm
pb <- aggregateData(example_sce, assay = "cpm", scale = TRUE)
head(assay(pb)) # first sheet: n_genes x n_samples

# aggregate by cluster only
pb <- aggregateData(example_sce, by = "cluster_id")
length(assays(pb)) # single assay
head(assay(pb))    # n_genes x n_clusters


HelenaLC/ddSingleCell documentation built on Feb. 21, 2023, 4:31 p.m.