pbDS: pseudobulk DS analysis
In muscat: Multi-sample multi-group scRNA-seq data analysis tools


pbDS tests for differential state (DS) after aggregating single-cell measurements to pseudobulk data, by applying bulk RNA-seq differential expression (DE) methods such as edgeR, DESeq2, and limma.
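To make the aggregation step concrete, the following is a minimal base-R sketch of summing single-cell counts into one pseudobulk column per cluster-sample combination. It is not muscat's implementation: the toy matrix, the `cluster_id` and `sample_id` vectors, and the threshold of 2 are all made up for illustration, and the final filter only mimics the idea described for the `min_cells` argument.

```r
# Toy count matrix: 3 genes x 6 cells (values are made up for illustration).
set.seed(1)
counts <- matrix(rpois(18, lambda = 5), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))

# Hypothetical per-cell annotations: cluster and sample of origin.
cluster_id <- c("A", "A", "B", "A", "B", "B")
sample_id  <- c("s1", "s2", "s1", "s1", "s2", "s2")

# Sum counts over all cells that share a cluster-sample combination.
# rowsum() groups rows, so transpose to put cells on rows, then transpose back.
group <- paste(cluster_id, sample_id, sep = ".")
pseudobulk <- t(rowsum(t(counts), group = group))

# Number of cells behind each pseudobulk column.
n_cells <- table(group)

# Keep only cluster-sample combinations with enough cells
# (assumed threshold; mirrors the idea behind `min_cells`).
min_cells <- 2
keep <- names(n_cells)[n_cells >= min_cells]
pseudobulk_kept <- pseudobulk[, keep, drop = FALSE]
```

In practice `aggregateData` performs the aggregation and `pbDS` applies the filtering; the sketch only shows the arithmetic involved.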

pbDS(
  pb,
  method = c("edgeR", "DESeq2", "limma-trend", "limma-voom"),
  design = NULL,
  coef = NULL,
  contrast = NULL,
  min_cells = 10,
  filter = c("both", "genes", "samples", "none"),
  treat = FALSE,
  verbose = TRUE
)

`pb`	a `SingleCellExperiment` containing pseudobulks as returned by `aggregateData`.
`method`	a character string specifying the DE method to apply; one of `"edgeR"`, `"DESeq2"`, `"limma-trend"` or `"limma-voom"`.
`design`	For methods `"edgeR"` and `"limma-x"`, a design matrix with row and column names (required) created with `model.matrix`; for `"DESeq2"`, a formula with variables in `colData(pb)`. Defaults to `~ group_id` or the corresponding `model.matrix`.
`coef`	passed to `glmQLFTest`, `contrasts.fit`, `results` for `method = "edgeR", "limma-x", "DESeq2"`, respectively. Can be a list for multiple, independent comparisons.
`contrast`	a matrix of contrasts to test, created with `makeContrasts`.
`min_cells`	a numeric. Specifies the minimum number of cells in a given cluster-sample required to consider the sample for differential testing.
`filter`	a character string specifying whether to filter on genes, samples, both or neither.
`treat`	logical specifying whether empirical Bayes moderated-t p-values should be computed relative to a minimum fold-change threshold. Only applicable for methods `"limma-x"` (`treat`) and `"edgeR"` (`glmTreat`), and ignored otherwise.
`verbose`	logical. Should information on progress be reported?
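As an illustration of the `design` and `contrast` arguments, here is a minimal base-R sketch that builds a named design matrix and a matching contrast matrix. The sample names and the two-group layout are hypothetical. The contrast is written by hand to keep the sketch dependency-free; `limma::makeContrasts(contrasts = "stim-ctrl", levels = design)` should build an equivalent matrix.

```r
# Hypothetical experiment: 4 samples in two groups.
sample_id <- c("ctrl1", "ctrl2", "stim1", "stim2")
group_id  <- factor(c("ctrl", "ctrl", "stim", "stim"))

# Group-means design (no intercept), with explicit row and column names.
design <- model.matrix(~ 0 + group_id)
dimnames(design) <- list(sample_id, levels(group_id))

# Contrast "stim - ctrl", written by hand; its rows follow the design's columns.
contrast <- matrix(c(-1, 1), ncol = 1,
                   dimnames = list(Levels = colnames(design),
                                   Contrasts = "stim-ctrl"))
```

Assuming the row names of `design` match the sample identifiers in `pb`, these objects could then be supplied as `pbDS(pb, design = design, contrast = contrast)`.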

a list containing

a data.frame with differential testing results,
a DGEList object of length nb.-clusters, and
the design matrix, and contrast or coef used.

Helena L Crowell & Mark D Robinson

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2019). doi: https://doi.org/10.1101/713412

# load muscat & the example dataset
library(muscat)
data(sce)
    
# compute pseudobulk sum-counts & run DS analysis
pb <- aggregateData(sce)
res <- pbDS(pb, method = "limma-trend")

names(res)
names(res$table)
head(res$table$`stim-ctrl`$`B cells`)

# count nb. of DE genes by cluster
vapply(res$table$`stim-ctrl`, function(u) 
  sum(u$p_adj.loc < 0.05), numeric(1))

# get top 5 hits for ea. cluster w/ abs(logFC) > 1
library(dplyr)
lapply(res$table$`stim-ctrl`, function(u)
  filter(u, abs(logFC) > 1) %>% 
    arrange(p_adj.loc) %>% 
    slice(seq_len(5)))