vst: Variance Stabilizing Transformation

Description

Perform a Variance Stabilizing Transformation (VST) of a matrix of count data.

Usage

vst(
  x,
  method = "anscombe.nb",
  lib.size = NULL,
  cpm = FALSE,
  dispersion = NULL,
  design = NULL
)

Arguments

x

A matrix of counts. Rows are genes (or other features), and columns are samples.

method

VST to use; see Details.

lib.size

Optional. Library sizes; estimated if not given.

cpm

Should the output be in log2 Counts Per Million (CPM), rather than simply log2?

dispersion

Optional. Dispersion parameter of the negative binomial distribution of the data; estimated if not given.

design

Optional. If dispersion isn't given, a design matrix to use when estimating dispersion.

Details

Several methods are available. "anscombe.nb" is recommended.

Methods:

"anscombe.nb": Default, asinh(sqrt((x+3/8)/(1/dispersion-3/4))). Anscombe's VST for the negative binomial distribution.

"anscombe.nb.simple": log(x+0.5/dispersion), a simplified VST also given by Anscombe.

"anscombe.poisson": sqrt(x+3/8). Anscombe's VST for the Poisson distribution. Only appropriate if you know there is no biological noise.

"naive.nb": asinh(sqrt(x*dispersion)). Resultant variance is slightly inflated at low counts.

"naive.poisson": sqrt(x). Resultant variance is slightly inflated at low counts.

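The expressions above can be written out directly in base R to see how they behave. This is only a sketch: the helper names are hypothetical and are not part of varistran, and the cpm argument's description suggests that vst() reports values on a log2-like scale, so the package's output will likely differ from these raw expressions by scaling. The naive negative binomial form is written with x * dispersion, which is what the Anscombe expression reduces to when the 3/8 and 3/4 corrections are dropped.

```r
# Illustrative helpers only; these names are hypothetical, not varistran functions.
anscombe_nb        <- function(x, dispersion) asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
anscombe_nb_simple <- function(x, dispersion) log(x + 0.5 / dispersion)
anscombe_poisson   <- function(x) sqrt(x + 3/8)
naive_nb           <- function(x, dispersion) asinh(sqrt(x * dispersion))
naive_poisson      <- function(x) sqrt(x)

# Compare the two negative binomial forms over a range of counts.
x <- c(0, 1, 10, 100, 1000)
dispersion <- 0.01
round(cbind(
  x,
  anscombe_nb = anscombe_nb(x, dispersion),
  naive_nb    = naive_nb(x, dispersion)
), 3)
```

The two columns should agree closely at high counts and differ mainly near zero, where the Anscombe corrections matter.
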
Dispersion:

edgeR's estimate of the common dispersion of the count matrix would be a reasonable choice of dispersion. However, Poisson noise in RNA-Seq data may be over-dispersed, in which case a slightly smaller dispersion may work better. I recommend not providing a dispersion and letting varistran pick an appropriate value.

If "dispersion" is not given, it is chosen so as to minimize sd(residual s.d.)/mean(residual s.d.), that is, the coefficient of variation of the residual standard deviations. Residuals are calculated from the linear model specified by the parameter "design".
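The selection criterion can be sketched in base R as a one-dimensional search. This is a minimal illustration under stated assumptions, not varistran's implementation: the helper names are hypothetical, residual standard deviations are assumed to be computed per row, library size normalization is ignored, and the search interval passed to optimize() is an arbitrary choice.

```r
# Hypothetical helpers; a sketch of the criterion, not varistran's own code.
anscombe_nb <- function(x, dispersion) {
  asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
}

# Score a candidate dispersion by sd(residual s.d.) / mean(residual s.d.).
dispersion_score <- function(dispersion, counts, design) {
  y <- anscombe_nb(counts, dispersion)
  design_qr <- qr(design)
  # Least-squares residuals for each row after fitting the design.
  resid <- t(qr.resid(design_qr, t(y)))
  df <- ncol(counts) - design_qr$rank
  resid_sd <- sqrt(rowSums(resid^2) / df)
  sd(resid_sd) / mean(resid_sd)
}

# Simulated counts: 200 features, 10 samples.
set.seed(1)
means <- runif(200, min = 10, max = 1000)
counts <- matrix(rnbinom(2000, size = 1/0.05, mu = rep(means, 10)), ncol = 10)

# Intercept-only design, one row per sample.
design <- matrix(1, nrow = ncol(counts), ncol = 1)

fit <- optimize(dispersion_score, interval = c(1e-4, 1),
                counts = counts, design = design)
fit$minimum  # candidate dispersion with the flattest residual s.d.
```
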

If "design" also isn't given, a linear model containing only an intercept term is used. This may lead to an over-estimate of the dispersion, so do give a design if possible.
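As a sketch of supplying a design, the following builds a design matrix with model.matrix() for a hypothetical two-group experiment and passes it to vst(). The group labels and sample layout are assumptions made up for illustration; the call to vst() is guarded so the snippet still runs where varistran is not installed.

```r
# Hypothetical experiment: 5 control and 5 treated samples.
group <- factor(rep(c("control", "treated"), each = 5))
design <- model.matrix(~ group)

# Simulated counts with one column per sample, as in the Examples section.
means <- runif(100, min = 0, max = 1000)
counts <- matrix(rnbinom(1000, size = 1/0.01, mu = rep(means, 10)), ncol = 10)

# The design has one row per sample (column of counts).
if (requireNamespace("varistran", quietly = TRUE)) {
  y <- varistran::vst(counts, design = design)
}
```
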

Value

A transformed matrix.

Author(s)

Paul Harrison

References

Anscombe, F.J. (1948) "The transformation of Poisson, binomial, and negative-binomial data", Biometrika 35 (3-4): 246-254

Examples

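# Generate some random data: 100 genes (rows) by 10 samples (columns),
# negative binomial with dispersion 0.01.
means <- runif(100, min = 0, max = 1000)
counts <- matrix(rnbinom(1000, size = 1/0.01, mu = rep(means, 10)), ncol = 10)

# Transform with the default method, letting varistran estimate the dispersion.
y <- varistran::vst(counts)

# Information about the transformation
varistran::vst_advice(y)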

MonashBioinformaticsPlatform/varistran documentation built on March 21, 2020, 3:20 p.m.