prepSim: SCE preparation for 'simData'


prepSim R Documentation

SCE preparation for simData

Description

prepSim prepares an input SCE for simulation with muscat's simData function by

  1. basic filtering of genes and cells

  2. (optional) filtering of subpopulation-sample instances

  3. estimation of cell parameters (library sizes) and gene parameters (dispersions and sample-specific means).
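
As an illustration of step 1, the basic gene and cell filtering can be sketched in base R on a toy count matrix. This is a minimal sketch, not prepSim's internal code: the matrix values are made up, the thresholds are local variables named after prepSim's min_count, min_cells and min_genes arguments, and computing both filters on the unfiltered matrix is an assumption.

```r
# toy count matrix: 4 genes x 5 cells (values are made up for illustration)
y <- matrix(c(
  0, 2, 3, 0, 5,
  1, 1, 0, 0, 0,
  4, 0, 2, 6, 2,
  0, 0, 0, 0, 9),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))

# thresholds named after prepSim's arguments (small values to suit the toy data)
min_count <- 1
min_cells <- 3
min_genes <- 2

# keep genes with a count > min_count in >= min_cells cells
keep_genes <- rowSums(y > min_count) >= min_cells

# keep cells with a count > 0 in >= min_genes genes
# (assumption: evaluated on the unfiltered matrix)
keep_cells <- colSums(y > 0) >= min_genes

y_filtered <- y[keep_genes, keep_cells, drop = FALSE]
dim(y_filtered)
```

Here gene2 and gene4 fail the gene filter and cell4 fails the cell filter, so two genes and four cells remain.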

Usage

prepSim(
  x,
  min_count = 1,
  min_cells = 10,
  min_genes = 100,
  min_size = 100,
  group_keep = NULL,
  verbose = TRUE
)

Arguments

x

a SingleCellExperiment.

min_count, min_cells

used for filtering genes; only genes with a count > min_count in >= min_cells cells will be retained.

min_genes

used for filtering cells; only cells with a count > 0 in >= min_genes genes will be retained.

min_size

used for filtering subpopulation-sample combinations; only instances with >= min_size cells will be retained. Specifying min_size = NULL skips this step.

group_keep

character string; if nlevels(x$group_id) > 1, specifies which group of samples to keep (see details). The default NULL retains samples from levels(x$group_id)[1]. If colData(x)$group_id is not specified, all samples are kept.

verbose

logical; should information on progress be reported?
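
The min_size filter can be pictured as a lookup in a subpopulation-by-sample table of cell counts. The following is a simplified sketch under stated assumptions, not prepSim's actual implementation: the column name cluster_id is assumed (it does not appear in this page), the cell annotations are made up, and each cell is simply kept or dropped according to the size of its own subpopulation-sample instance.

```r
# hypothetical cell metadata: one row per cell
cd <- data.frame(
  cluster_id = c("A", "A", "A", "B", "B", "A", "B", "B", "B"),
  sample_id  = c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2", "s2"))

# threshold named after prepSim's argument (small value to suit the toy data)
min_size <- 3

# number of cells per subpopulation-sample instance
n_cells <- table(cd$cluster_id, cd$sample_id)

# look up each cell's instance size via a two-column character index
idx <- cbind(cd$cluster_id, cd$sample_id)
keep <- as.vector(n_cells[idx] >= min_size)

# retain only cells from instances with >= min_size cells
cd_filtered <- cd[keep, , drop = FALSE]
table(cd_filtered$cluster_id, cd_filtered$sample_id)
```

In this toy data only the instances A-s1 (3 cells) and B-s2 (4 cells) reach min_size, so 7 of the 9 cells are retained.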

Details

For each gene g, prepSim fits a model to estimate sample-specific means β_g^s for each sample s and a dispersion parameter φ_g, using edgeR's estimateDisp function with default parameters. The reference count data are thus modeled as negative binomial (NB) distributed:

Y_{gc} \sim NB(μ_{gc}, φ_g)

for gene g and cell c, where the mean μ_{gc} = \exp(β_{g}^{s(c)}) \cdot λ_c. Here, β_{g}^{s(c)} is the relative abundance of gene g in sample s(c), λ_c is the library size (total number of counts), and φ_g is the dispersion.
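
To make the model concrete, counts can be drawn from it with base R's rnbinom. This is a sketch with made-up parameter values for two genes and three cells from a single sample, not output of prepSim. It assumes the edgeR convention that the variance is μ + φ·μ², which corresponds to size = 1/φ in rnbinom's mean parameterization.

```r
set.seed(1)

# hypothetical parameters (not estimated from any data)
beta   <- log(c(1e-3, 5e-4))  # log relative abundance per gene, single sample
phi    <- c(0.1, 0.5)         # dispersion per gene
lambda <- c(1000, 2000, 1500) # library size per cell

# mean matrix (genes x cells): mu_gc = exp(beta_g) * lambda_c
mu <- exp(beta) %o% lambda

# draw NB counts; the matrix is traversed column by column, so the
# length-2 vector 1 / phi recycles as gene 1, gene 2, gene 1, ...
y <- matrix(
  rnbinom(length(mu), size = 1 / phi, mu = mu),
  nrow = nrow(mu),
  dimnames = list(paste0("gene", 1:2), paste0("cell", 1:3)))
y
```

With several samples, β would become a gene-by-sample matrix and each cell c would use the column of its sample s(c).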

Value

a SingleCellExperiment containing, for each cell, library size (colData(x)$offset) and, for each gene, dispersion and sample-specific mean estimates (rowData(x)$dispersion and $beta.sample_id, respectively).

Author(s)

Helena L Crowell

References

Crowell, HL, Soneson, C, Germain, P-L, Calini, D, Collin, L, Raposo, C, Malhotra, D & Robinson, MD: On the discovery of population-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data. bioRxiv 713412 (2019). doi: https://doi.org/10.1101/713412

Examples

# load muscat (provides prepSim and example_sce) and SingleCellExperiment
library(muscat)
library(SingleCellExperiment)

# estimate simulation parameters
data(example_sce)
ref <- prepSim(example_sce)

# tabulate number of genes/cells before vs. after
ns <- cbind(
  before = dim(example_sce),
  after = dim(ref))
rownames(ns) <- c("#genes", "#cells")
ns

head(rowData(ref)) # gene parameters
head(colData(ref)) # cell parameters


HelenaLC/muscat documentation built on June 25, 2022, 8:20 a.m.