scde: Single Cell Differential Expression

scde.error.models

R Documentation

Fit single-cell error/regression models

Description

Fit error models given a set of single-cell data (counts) and an optional grouping factor (groups). The cells (within each group) are first cross-compared to determine a subset of genes showing consistent expression. The set of genes is then used to fit a mixture model (Poisson-NB mixture, with expression-dependent concomitant).
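To make the shape of such a mixture concrete, the following base-R sketch evaluates a two-component density for a single cell: a low-rate Poisson "failure" component and a negative binomial "amplified" component, mixed by a weight that depends on the expected expression magnitude. It is an illustration under stated assumptions, not the package's fitting code. The function name mixture_density, the logistic form of the concomitant, and the coefficients conc_a and conc_b are all hypothetical.

```r
# Illustrative only: density of a Poisson / negative binomial mixture whose
# mixing weight depends on the expected expression magnitude.
# All names and the logistic concomitant are assumptions made for this sketch.
mixture_density <- function(x, expected, conc_a = -1, conc_b = 2,
                            zero_lambda = 0.1, theta = 1) {
  # Probability that the gene failed to be detected (drop-out); with these
  # coefficients it decreases as the expected magnitude grows.
  p_fail <- plogis(conc_b + conc_a * log10(expected + 1))
  # Failure component: Poisson with a small rate (compare zero.lambda).
  d_fail <- dpois(x, lambda = zero_lambda)
  # Amplified component: negative binomial centered on the expected magnitude,
  # with theta as the NB size parameter.
  d_amp <- dnbinom(x, size = theta, mu = expected)
  p_fail * d_fail + (1 - p_fail) * d_amp
}

# Density of observing a zero count for a lowly and a highly expressed gene.
mixture_density(0, expected = c(1, 1000))
```

Under these assumed coefficients a zero count is much more plausible for the lowly expressed gene, which is the behavior the expression-dependent concomitant is meant to capture.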

Usage

scde.error.models(counts, groups = NULL, min.nonfailed = 3,
  threshold.segmentation = TRUE, min.count.threshold = 4,
  zero.count.threshold = min.count.threshold, zero.lambda = 0.1,
  save.crossfit.plots = FALSE, save.model.plots = TRUE, n.cores = 12,
  min.size.entries = 2000, max.pairs = 5000, min.pairs.per.cell = 10,
  verbose = 0, linear.fit = TRUE, local.theta.fit = linear.fit,
  theta.fit.range = c(0.01, 100))

Arguments

`counts`	read count matrix. The rows correspond to genes (should be named), columns correspond to individual cells. The matrix should contain integer counts.
`groups`	an optional factor describing grouping of different cells. If provided, the cross-fits and the expected expression magnitudes will be determined separately within each group. The factor should have the same length as ncol(counts).
`min.nonfailed`	minimal number of non-failed observations required for a gene to be used in the final model fitting
`threshold.segmentation`	use a fast threshold-based segmentation during cross-fit (default: TRUE)
`min.count.threshold`	the number of reads to use to guess which genes may have "failed" to be detected in a given measurement during cross-cell comparison (default: 4)
`zero.count.threshold`	threshold to guess the initial value (failed/non-failed) during error model fitting procedure (defaults to the min.count.threshold value)
`zero.lambda`	the rate of the Poisson (failure) component (default: 0.1)
`save.crossfit.plots`	whether png files showing cross-fit segmentations should be written out (default: FALSE)
`save.model.plots`	whether pdf files showing model fits should be written out (default: TRUE)
`n.cores`	number of cores to use (default: 12)
`min.size.entries`	minimum number of genes to use when determining expected expression magnitude during model fitting
`max.pairs`	maximum number of cross-fit comparisons that should be performed per group (default: 5000)
`min.pairs.per.cell`	minimum number of pairs that each cell should be cross-compared with
`verbose`	verbosity level; set to 1 for increased output (default: 0)
`linear.fit`	logical; whether to use a linear fit in the regression (default: TRUE)
`local.theta.fit`	logical; whether to fit the overdispersion parameter theta, i.e. the negative binomial size parameter, based on local regression (default: same value as linear.fit)
`theta.fit.range`	range of valid values for the overdispersion parameter theta, i.e. the negative binomial size parameter (default: c(0.01, 100))
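The constraints on counts and groups described above can be checked before starting a long-running fit. The helper below is a minimal sketch in base R. check_inputs is a hypothetical name and is not part of the scde package, and the toy matrix is made up for illustration.

```r
# Hypothetical pre-flight check; not part of the scde package.
check_inputs <- function(counts, groups = NULL) {
  counts <- as.matrix(counts)
  # Genes (rows) should be named.
  stopifnot(!is.null(rownames(counts)))
  # Counts should be non-negative whole numbers.
  stopifnot(all(counts >= 0), all(counts == round(counts)))
  if (!is.null(groups)) {
    # The grouping factor needs one entry per cell (column).
    stopifnot(is.factor(groups), length(groups) == ncol(counts))
  }
  invisible(TRUE)
}

# Toy data: 3 genes x 4 cells, split into two groups.
toy <- matrix(c(0L, 5L, 2L, 7L, 0L, 1L, 3L, 0L, 9L, 4L, 6L, 0L), nrow = 3,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
check_inputs(toy, groups = factor(c("A", "A", "B", "B")))
```

The check stops with an error if, for example, the grouping factor is shorter than the number of columns in the count matrix.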

Details

Note: the default implementation has been changed to use a linear-scale fit with an expression-dependent NB size (overdispersion) fit. This represents an iterative improvement on the originally published model. Set linear.fit = FALSE to revert to the original fitting procedure.

Value

a model matrix, with rows corresponding to different cells, and columns representing different parameters of the determined models

Examples

data(es.mef.small)
cd <- clean.counts(es.mef.small, min.lib.size=1000, min.reads = 1, min.detected = 1)
sg <- factor(gsub("(MEF|ESC).*", "\\1", colnames(cd)), levels = c("ESC", "MEF"))
names(sg) <- colnames(cd)

o.ifm <- scde.error.models(counts = cd, groups = sg, n.cores = 10, threshold.segmentation = TRUE)

hms-dbmi/scde documentation built on April 19, 2023, 10:21 p.m.