acosh_transform: Delta method-based variance stabilizing transformation

View source: R/delta_method_transform.R

acosh_transformR Documentation

Delta method-based variance stabilizing transformation

Description

Delta method-based variance stabilizing transformation

Usage

acosh_transform(
  data,
  overdispersion = 0.05,
  size_factors = TRUE,
  ...,
  on_disk = NULL,
  verbose = FALSE
)

shifted_log_transform(
  data,
  overdispersion = 0.05,
  pseudo_count = 1/(4 * overdispersion),
  size_factors = TRUE,
  minimum_overdispersion = 0.001,
  ...,
  on_disk = NULL,
  verbose = FALSE
)

Arguments

data

any matrix-like object (e.g. matrix, dgCMatrix, DelayedArray, HDF5Matrix) with one column per sample and row per gene. It can also be an object of type glmGamPoi, in which case it is directly used to calculate the variance-stabilized values.

overdispersion

the simplest count model is the Poisson model. However, the Poisson model assumes that variance = mean. For many applications this is too rigid and the Gamma-Poisson allows a more flexible mean-variance relation (variance = mean + mean^2 * overdispersion).
overdispersion can either be

  • a single boolean that indicates if an overdispersion is estimated for each gene.

  • a numeric vector of length nrow(data) fixing the overdispersion to those values.

  • the string "global" to indicate that one dispersion is fit across all genes.

Note that overdispersion = 0 and overdispersion = FALSE are equivalent and both reduce the Gamma-Poisson to the classical Poisson model. Default: 0.05 which is roughly the overdispersion observed on ostensibly homogeneous cell lines.

size_factors

in large scale experiments, each sample is typically of different size (for example different sequencing depths). A size factor is an internal mechanism of GLMs to correct for this effect.
size_factors is either a numeric vector with positive entries that has the same lengths as columns in the data that specifies the size factors that are used. Or it can be a string that species the method that is used to estimate the size factors (one of c("normed_sum", "deconvolution", "poscounts")). Note that "normed_sum" and "poscounts" are fairly simple methods and can lead to suboptimal results. For the best performance, I recommend to use size_factors = "deconvolution" which calls scran::calculateSumFactors(). However, you need to separately install the scran package from Bioconductor for this method to work. Also note that size_factors = 1 and size_factors = FALSE are equivalent. If only a single gene is given, no size factor is estimated (ie. size_factors = 1). Default: "normed_sum".

...

additional parameters for glmGamPoi::glm_gp() which is called in case overdispersion = TRUE.

on_disk

a boolean that indicates if the dataset is loaded into memory or if it is kept on disk to reduce the memory usage. Processing in memory can be significantly faster than on disk. Default: NULL which means that the data is only processed in memory if data is an in-memory data structure.

verbose

boolean that decides if information about the individual steps are printed. Default: FALSE

pseudo_count

instead of specifying the overdispersion, the shifted_log_transform is commonly parameterized with a pseudo-count (pseudo-count = 1/(4 * overdispersion)). If both the pseudo-count and overdispersion is specified, the overdispersion is ignored. Default: 1/(4 * overdispersion)

minimum_overdispersion

the acosh_transform converges against 2 * sqrt(x) for overdispersion == 0. However, the shifted_log_transform would just become 0, thus here we apply the minimum_overdispersion to avoid this behavior.

Value

a matrix (or a vector if the input is a vector) with the transformed values.

Functions

  • acosh_transform: 1/sqrt(alpha) acosh(2 * alpha * x + 1)

  • shifted_log_transform: 1/sqrt(alpha) log(4 * alpha * x + 1)

References

Ahlmann-Eltze, Constantin, and Wolfgang Huber. "Transformation and Preprocessing of Single-Cell RNA-Seq Data." bioRxiv (2021).

Ahlmann-Eltze, Constantin, and Wolfgang Huber. "glmGamPoi: Fitting Gamma-Poisson Generalized Linear Models on Single Cell Count Data." Bioinformatics (2020)

Dunn, Peter K., and Gordon K. Smyth. "Randomized quantile residuals." Journal of Computational and Graphical Statistics 5.3 (1996): 236-244.

Hafemeister, Christoph, and Rahul Satija. "Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression." Genome biology 20.1 (2019): 1-15.

Hafemeister, Christoph, and Rahul Satija. "Analyzing scRNA-seq data with the sctransform and offset models." (2020)

Lause, Jan, Philipp Berens, and Dmitry Kobak. "Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data." Genome Biology (2021).

See Also

acosh_transform, shifted_log_transform, and residual_transform

Examples

  # Load a single cell dataset
  sce <- TENxPBMCData::TENxPBMCData("pbmc4k")
  # Reduce size for this example
  set.seed(1)
  sce_red <- sce[sample(which(rowSums2(counts(sce)) > 0), 1000),
                 sample(ncol(sce), 200)]

  assay(sce_red, "acosh") <- acosh_transform(sce_red)
  assay(sce_red, "shifted_log") <- shifted_log_transform(sce_red)
  plot(rowMeans2(assay(sce_red, "acosh")), rowVars(assay(sce_red, "acosh")), log = "x")
  points(rowMeans2(assay(sce_red, "shifted_log")), rowVars(assay(sce_red, "shifted_log")),
         col = "red")

  # Sqrt transformation
  sqrt_dat <- acosh_transform(sce_red, overdispersion = 0, size_factor = 1)
  plot(2 * sqrt(assay(sce_red))[,1], sqrt_dat[,1]); abline(0,1)


const-ae/transformGamPoi documentation built on April 14, 2023, 11:33 p.m.