prepSim: SCE preparation for 'simData'

Description

prepSim prepares an input SingleCellExperiment (SCE) for simulation with muscat's simData function in three steps:

  1. basic filtering of genes and cells

  2. (optional) filtering of subpopulation-sample instances

  3. estimation of cell parameters (library sizes) and gene parameters (dispersions and sample-specific means).

Usage

prepSim(
  x,
  min_count = 1,
  min_cells = 10,
  min_genes = 100,
  min_size = 100,
  group_keep = NULL,
  verbose = TRUE
)

Arguments

x

a SingleCellExperiment.

min_count, min_cells

used for filtering genes; only genes with a count > min_count in >= min_cells cells will be retained.

min_genes

used for filtering cells; only cells with a count > 0 in >= min_genes genes will be retained.

min_size

used for filtering subpopulation-sample combinations; only instances with >= min_size cells will be retained. Specifying min_size = NULL skips this step.

group_keep

character string; if nlevels(x$group_id) > 1, specifies which group of samples to keep (see details). The default NULL retains samples from levels(x$group_id)[1]. If colData(x)$group_id is not specified, all samples will be kept.

verbose

logical; should information on progress be reported?
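
The filtering criteria described by min_count, min_cells, min_genes and min_size can be sketched in base R on a toy count matrix. This is an illustration of the stated thresholds only, not prepSim's actual implementation: the matrix, the scaled-down threshold values, the cluster and sample labels, and the choice to evaluate the gene and cell filters on the unfiltered matrix are all assumptions made for the example.

```r
# Toy count matrix: 6 genes x 8 cells (hand-picked values, purely illustrative)
counts <- matrix(
  c(0, 0, 0, 0, 0, 0, 0, 0,
    2, 3, 0, 5, 2, 4, 0, 2,
    1, 1, 1, 1, 1, 1, 1, 1,
    4, 0, 6, 2, 3, 0, 2, 5,
    0, 2, 0, 0, 3, 0, 0, 0,
    3, 2, 4, 0, 2, 6, 0, 3),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))

# Hypothetical subpopulation and sample labels, one per cell
cluster <- c("A", "A", "A", "A", "B", "B", "B", "B")
sample  <- c("s1", "s1", "s1", "s2", "s1", "s1", "s1", "s2")

# Thresholds scaled down to fit the toy data (not the function defaults)
min_count <- 1; min_cells <- 4; min_genes <- 3; min_size <- 3

# Genes: count > min_count in >= min_cells cells
keep_genes <- rowSums(counts > min_count) >= min_cells

# Cells: count > 0 in >= min_genes genes
# (assumption: evaluated on the unfiltered matrix)
keep_cells <- colSums(counts > 0) >= min_genes

# Subpopulation-sample instances: >= min_size remaining cells
combo <- paste(cluster, sample, sep = ".")
combo_size <- table(combo[keep_cells])
big_enough <- names(combo_size)[as.vector(combo_size) >= min_size]
keep_combo <- combo %in% big_enough

filtered <- counts[keep_genes, keep_cells & keep_combo, drop = FALSE]
dim(filtered)
```

With these toy values, three genes and the three cells of the single sufficiently large subpopulation-sample instance (A in s1) remain.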

Details

For each gene g, prepSim fits a model to estimate sample-specific means β_g^s, for each sample s, and dispersion parameters φ_g using edgeR's estimateDisp function with default parameters. Thus, the reference count data are modeled as negative binomial (NB) distributed:

Y_{gc} \sim NB(μ_{gc}, φ_g)

for gene g and cell c, where the mean μ_{gc} = \exp(β_{g}^{s(c)}) \cdot λ_c. Here, s(c) denotes the sample that cell c belongs to, β_{g}^{s(c)} is the (log-scale) relative abundance of gene g in sample s(c), λ_c is the library size (total number of counts) of cell c, and φ_g is the dispersion of gene g.
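
To make the model concrete, the following base R sketch draws counts for a single gene from NB(μ_{gc}, φ_g) with μ_{gc} = \exp(β_{g}^{s(c)}) \cdot λ_c. All parameter values are hypothetical and chosen for illustration. It assumes the usual edgeR-style parameterization, in which the variance is μ + φ·μ^2, so that the dispersion φ maps to size = 1/φ in rnbinom. It is not how simData itself generates data.

```r
set.seed(1)

# Hypothetical parameters for one gene g measured in two samples
beta <- log(c(s1 = 1e-4, s2 = 5e-4))  # sample-specific log relative abundances
phi  <- 0.2                           # gene-wise dispersion

# 5000 cells per sample, each with the same (hypothetical) library size
sample_of_cell <- rep(c("s1", "s2"), each = 5000)
lambda <- rep(2000, length(sample_of_cell))

# Cell-specific NB means: mu_gc = exp(beta_g^{s(c)}) * lambda_c
mu <- exp(beta[sample_of_cell]) * lambda

# NB draws; rnbinom's variance is mu + mu^2 / size, hence size = 1 / phi
y <- rnbinom(length(mu), mu = mu, size = 1 / phi)

# Empirical means per sample should be close to the model means (0.2 and 1)
tapply(y, sample_of_cell, mean)
```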

Value

a SingleCellExperiment containing, for each cell, library size (colData(x)$offset) and, for each gene, dispersion and sample-specific mean estimates (rowData(x)$dispersion and $beta.sample_id, respectively).

Author(s)

Helena L Crowell

References

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2019). doi: https://doi.org/10.1101/713412

Examples

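library(muscat)
library(SingleCellExperiment)

# load the example SCE used throughout the muscat documentation
data(sce)

# filter genes/cells & estimate simulation parameters
ref <- prepSim(sce)

# nb. of genes/cells before vs. after
ns <- cbind(before = dim(sce), after = dim(ref))
rownames(ns) <- c("#genes", "#cells")
ns

head(rowData(ref)) # gene parameters
head(colData(ref)) # cell parameters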

muscat documentation built on Nov. 8, 2020, 7:47 p.m.