fitTrendCV2: Fit a trend to the CV2

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/fitTrendCV2.R

Description

Fit a mean-dependent trend to the squared coefficient of variation, computed from count data after size factor normalization.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
fitTrendCV2(
  means,
  cv2,
  ncells,
  min.mean = 0.1,
  nls.args = list(),
  simplified = TRUE,
  nmads = 6,
  max.iter = 50
)

Arguments

means

A numeric vector containing mean normalized expression values for all genes.

cv2

A numeric vector containing the squared coefficient of variation computed from normalized expression values for all genes.

ncells

Integer scalar specifying the number of cells used to compute cv2 and means.

min.mean

Numeric scalar specifying the minimum mean to use for trend fitting.

nls.args

A list of parameters to pass to nls.

simplified

Logical scalar indicating whether the function can automatically use a simpler trend if errors are encountered for the usual paramterization.

nmads

Numeric scalar specifying the number of MADs to use to compute the tricube bandwidth during robustification.

max.iter

Integer scalar specifying the maximum number of robustness iterations to perform.

Details

This function fits a mean-dependent trend to the CV2 of normalized expression values for the selected features. Specifically, it fits a trend of the form

y = A + B/x

using an iteratively reweighted least-squares approach implemented via nls. This trend is based on a similar formulation from DESeq2 and generally captures the mean-CV2 trend well.

Trend fitting is performed after weighting each observation according to the inverse of the density of observations at the same mean. This avoids problems with differences in the distribution of means that would otherwise favor good fits in highly dense intervals at the expense of sparser intervals. Low-abundance genes with means below min.mean are also removed prior to fitting, to avoid problems with discreteness and the upper bound on the CV2 at low counts.

Robustness iterations are also performed to protect against outliers. An initial fit is performed and each observation is weighted using tricube-transformed standardized residuals (in addition to the existing inverse-density weights). The bandwidth of the tricube scheme is defined as nmads multiplied by the median standardized residual. Iterations are performed until convergence or max.iters is reached.

Occasionally, there are not enough high-abundance points to uniquely determine the A parameter. In such cases, the function collapses back to fitting a simpler trend

y = B/x

to avoid errors about singular gradients in nls. If simplified=FALSE, this simplification is not allowed and the error is directly reported.

Value

A named list is returned containing:

trend:

A function that returns the fitted value of the trend at any value of the mean.

std.dev:

A numeric scalar containing the robust standard deviation of the ratio of var to the fitted value of the trend across all features used for trend fitting.

Author(s)

Aaron Lun

References

Brennecke P, Anders S, Kim JK et al. (2013). Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10:1093-95

See Also

modelGeneCV2 and modelGeneCV2WithSpikes, where this function is used.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
library(scuttle)
sce <- mockSCE()
normcounts <- normalizeCounts(sce, log=FALSE)

# Fitting a trend:
library(DelayedMatrixStats)
means <- rowMeans(normcounts)
cv2 <- rowVars(normcounts)/means^2
fit <- fitTrendCV2(means, cv2, ncol(sce))

# Examining the trend fit:
plot(means, cv2, pch=16, cex=0.5,
    xlab="Mean", ylab="CV2", log="xy")
curve(fit$trend(x), add=TRUE, col="dodgerblue", lwd=3)

scran documentation built on April 17, 2021, 6:09 p.m.