vst | R Documentation |
Perform a Variance Stabilizing Transformation (VST) of a matrix of count data.
vst(
x,
method = "anscombe.nb",
lib.size = NULL,
cpm = FALSE,
dispersion = NULL,
design = NULL
)
x |
A matrix of counts. Rows are genes (or other features), and columns are samples. |
method |
VST to use; see Details. |
lib.size |
Optional library sizes. Estimated if not given. |
cpm |
Should the output be in log2 Counts Per Million, rather than simply log2? |
dispersion |
Optional, estimated if not given. Dispersion parameter of the negative binomial distribution of the data. |
design |
Optional. If dispersion isn't given, a design matrix to use when estimating dispersion. |
Several methods are available. "anscombe.nb" is recommended.
Methods:
"anscombe.nb": Default, asinh(sqrt((x+3/8)/(1/dispersion-3/4))). Anscombe's VST for the negative binomial distribution.
"anscombe.nb.simple": log(x+0.5/dispersion), a simplified VST also given by Anscombe.
"anscombe.poisson": sqrt(x+3/8). Anscombe's VST for the Poisson distribution. Only appropriate if you know there is no biological noise.
"naive.nb": asinh(sqrt(x*dispersion)). Resultant variance is slightly inflated at low counts.
"naive.poisson": sqrt(x). Resultant variance is slightly inflated at low counts.
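As a rough illustration of what "variance stabilizing" means here, the sketch below applies the "anscombe.nb" formula from the list above to simulated negative binomial counts using base R only. It is not varistran's implementation: the helper name anscombe_nb, the dispersion of 0.1, and the chosen means are assumptions made for the example, and the raw formula is used without any rescaling that vst() may apply to its output.

```r
set.seed(1)
dispersion <- 0.1                  # assumed value, for illustration only
means <- c(20, 100, 500, 2000)     # assumed range of expression levels

# Anscombe's negative binomial VST, copied from the formula in the method list.
# anscombe_nb is a hypothetical helper, not a varistran function.
anscombe_nb <- function(x, dispersion) {
  asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
}

# Simulate counts at each mean and record the spread before and after transforming
sd_raw <- numeric(length(means))
sd_vst <- numeric(length(means))
for (i in seq_along(means)) {
  x <- rnbinom(10000, size = 1/dispersion, mu = means[i])
  sd_raw[i] <- sd(x)
  sd_vst[i] <- sd(anscombe_nb(x, dispersion))
}

print(data.frame(mean = means, sd_raw = sd_raw, sd_vst = sd_vst))
```

The standard deviation of the raw counts grows strongly with the mean, while the standard deviation of the transformed values should stay roughly constant across the four means.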
Dispersion:
edgeR's estimate of the common dispersion of the count matrix would be a reasonable choice of dispersion. However, the Poisson noise in RNA-Seq data may be over-dispersed, in which case a slightly smaller dispersion may work better. I recommend not providing a dispersion and letting varistran pick an appropriate value.
If "dispersion" is not given, it is chosen so as to minimize sd(residual s.d.)/mean(residual s.d.). Residuals are calculated from the linear model specified by the parameter "design".
If "design" also isn't given, a linear model containing only an intercept term is used. This may lead to an over-estimate of the dispersion, so do give a design if possible.
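The selection rule described above can be sketched in base R as follows. This is only an illustration of the criterion sd(residual s.d.)/mean(residual s.d.), not varistran's actual code: the helper names, the simulated data, the use of optimize(), and the search interval are all assumptions, and the residual s.d. is assumed to be computed per gene.

```r
set.seed(1)
n_genes <- 500
n_samples <- 10
true_dispersion <- 0.05            # assumed, used only to simulate data

# Simulated counts: each gene has its own mean, shared by all samples
means <- exp(runif(n_genes, log(5), log(5000)))
counts <- matrix(
  rnbinom(n_genes * n_samples, size = 1/true_dispersion, mu = rep(means, n_samples)),
  ncol = n_samples
)

# Intercept-only design, as used when no design is given
design <- matrix(1, nrow = n_samples, ncol = 1)

# Hypothetical helper: the "anscombe.nb" formula from the method list
anscombe_nb <- function(x, dispersion) {
  asinh(sqrt((x + 3/8) / (1/dispersion - 3/4)))
}

# Hypothetical helper: sd(residual s.d.) / mean(residual s.d.) across genes
criterion <- function(dispersion) {
  y <- anscombe_nb(counts, dispersion)
  res <- lm.fit(design, t(y))$residuals      # samples in rows, genes in columns
  resid_sd <- sqrt(colSums(res^2) / (n_samples - ncol(design)))
  sd(resid_sd) / mean(resid_sd)
}

# Search on the log scale over an assumed interval of candidate dispersions
fit <- optimize(function(log_d) criterion(exp(log_d)), interval = log(c(1e-3, 1)))
best_dispersion <- exp(fit$minimum)
print(best_dispersion)
```

With real data, replacing the intercept-only matrix with a design from model.matrix() removes known group differences from the residuals, which is why supplying a design helps avoid over-estimating the dispersion.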
A transformed matrix.
Paul Harrison
Anscombe, F.J. (1948) "The transformation of Poisson, binomial, and negative-binomial data", Biometrika 35 (3-4): 246-254
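# Generate some random data: 100 genes (rows) by 10 samples (columns),
# negative binomial with a dispersion of 0.01.
means <- runif(100, min=0, max=1000)
counts <- matrix(rnbinom(1000, size=1/0.01, mu=rep(means,10)), ncol=10)

# Transform, letting varistran estimate the dispersion.
y <- varistran::vst(counts)

# Information about the transformation
varistran::vst_advice(y)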
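A further sketch, assuming a simple two-group experiment, shows how a design matrix might be passed through the "design" argument so that group differences are not mistaken for noise when the dispersion is estimated. The grouping, the simulated fold change, and the variable names are hypothetical. The call to vst() is guarded so the snippet still runs where varistran is not installed.

```r
set.seed(1)
n_genes <- 100
n_samples <- 10

# Hypothetical grouping: 5 control samples followed by 5 treated samples
group <- factor(rep(c("control", "treated"), each = 5))
design <- model.matrix(~ group)

# Hypothetical effect: the first 20 genes double in the treated group
means <- runif(n_genes, min = 10, max = 1000)
fold <- matrix(1, nrow = n_genes, ncol = n_samples)
fold[1:20, group == "treated"] <- 2
mu <- means * fold                  # per-gene means recycled down each column

counts <- matrix(rnbinom(n_genes * n_samples, size = 1/0.01, mu = mu),
                 ncol = n_samples)

# Only call varistran if it is available
if (requireNamespace("varistran", quietly = TRUE)) {
  y <- varistran::vst(counts, design = design)
}
```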