getVarianceStabilizedData: Apply a variance stabilizing transformation (VST) to the...


View source: R/methods.R

Description

This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by dividing each sample's counts by its size factor), yielding a matrix of values that are approximately homoskedastic. This is useful as input to statistical analyses that require homoskedasticity.

Usage

varianceStabilizingTransformation( cds )
getVarianceStabilizedData( cds )

Arguments

cds

a CountDataSet which also contains the fitted dispersion-mean relation

Details

For each sample (i.e., each column of counts(cds)), the full variance function is calculated from the raw variance by scaling it according to the sample's size factor and adding the shot noise. The function requires a blind estimate of the variance function, i.e., one that ignores the conditions. This is usually obtained by calling estimateDispersions with method="blind" before calling getVarianceStabilizedData. A typical workflow is shown in the section "Variance stabilizing transformation" of the package vignette.
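The per-sample variance function described above can be sketched in base R as follows. This is an illustration under stated assumptions, not DESeq's internal code: the raw variance is assumed to be alpha(q) * q^2 for a dispersion-mean relation alpha, it is scaled by the squared size factor s, and the shot noise s * q is added, giving the variance of the raw counts of a sample whose normalized mean is q. The names sample_variance_fun and disp_fun, and the dispersion curve itself, are hypothetical.

```r
# Sketch only: full variance function for one sample.
# `disp_fun` is a hypothetical stand-in for the fitted dispersion-mean
# relation; the raw variance is taken to be alpha(q) * q^2 (an assumption).
sample_variance_fun <- function(disp_fun, size_factor) {
  force(disp_fun)
  force(size_factor)
  function(q) {
    raw_var <- disp_fun(q) * q^2                   # raw variance at normalized mean q
    size_factor^2 * raw_var + size_factor * q      # scale by size factor, add shot noise
  }
}

# Hypothetical parametric dispersion fit alpha(q) = 0.05 + 0.5 / q
disp_fun <- function(q) 0.05 + 0.5 / q
size_factors <- c(A1 = 0.8, A2 = 1.25)
var_funs <- lapply(size_factors, function(s) sample_variance_fun(disp_fun, s))

# Variance at three mean levels (rows) for each sample (columns)
sapply(var_funs, function(v) v(c(10, 100, 1000)))
```

Dividing each variance by the squared size factor expresses it on the normalized scale, where it becomes alpha(q) * q^2 + q / s.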

If estimateDispersions was called with fitType="parametric", a closed-form expression for the variance stabilizing transformation is applied to the normalized count data. The expression can be found in the file ‘vst.pdf’, which is distributed with the vignette.
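A minimal R sketch of such a closed form, assuming the parametric dispersion fit has the shape alpha(q) = a1 + a0 / q and that the variance of a normalized count is v(q) = q + alpha(q) * q^2 (shot noise with a unit size factor). Integrating 1 / sqrt(v(q)) analytically and rescaling the result gives the function below. The names vst_parametric, a0 and a1 are our own; the authoritative expression is the one in ‘vst.pdf’.

```r
# Closed-form VST for a parametric dispersion fit alpha(q) = a1 + a0 / q.
# Assumption: v(q) = q + alpha(q) * q^2 = (1 + a0) * q + a1 * q^2.
# The result is an antiderivative of 1 / sqrt(v(q)), shifted and scaled
# so that it approaches log2(q) for large q.
vst_parametric <- function(q, a0, a1) {
  stopifnot(a1 > 0)
  log2((1 + a0 + 2 * a1 * q + 2 * sqrt(a1 * q * (1 + a0 + a1 * q))) / (4 * a1))
}

# Apply it to a small count matrix (rows = genes, columns = samples)
counts <- matrix(c(5, 50, 500, 8, 40, 650), ncol = 2,
                 dimnames = list(paste0("gene", 1:3), c("A1", "A2")))
size_factors <- c(A1 = 0.9, A2 = 1.1)                 # hypothetical values
ncounts <- sweep(counts, 2, size_factors, "/")        # divide each column by its size factor
vsd <- vst_parametric(ncounts, a0 = 0.5, a1 = 0.05)   # same transformation for all samples
vsd
```

Because R's arithmetic is vectorized, the transformation keeps the matrix dimensions, mirroring the documented return value of getVarianceStabilizedData.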

If estimateDispersions was called with fitType="locfit", the reciprocal of the square root of the variance of the normalized counts, as derived from the dispersion fit, is then numerically integrated, and the integral (approximated by a spline function) is evaluated for each count value in the column, yielding a transformed value.
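The numerical route can be sketched in base R: integrate 1 / sqrt(v(q)) over a grid, approximate the cumulative integral with a spline, then rescale. This is our own illustration, not DESeq's code. The variance function var_fun stands in for the one derived from a locfit dispersion fit, and the helper names, the asinh-spaced grid, and the two log2 anchor points h1 and h2 are all hypothetical choices.

```r
# Sketch of the numerical VST (illustration only, not DESeq's code).
make_numeric_vst <- function(var_fun, max_q, n_grid = 200) {
  # Grid equally spaced on the asinh scale: dense at low, sparse at high counts
  grid <- sinh(seq(0, asinh(max_q), length.out = n_grid))
  integrand <- function(q) 1 / sqrt(var_fun(q))
  # Integral over each grid interval; integrate() uses interior quadrature
  # nodes, so the integrable 1/sqrt(q) singularity at q = 0 is not evaluated
  pieces <- vapply(seq_len(n_grid - 1), function(i)
    integrate(integrand, grid[i], grid[i + 1])$value, numeric(1))
  cum_int <- c(0, cumsum(pieces))
  # Monotone interpolating spline of the cumulative integral, indexed by asinh(q)
  spl <- splinefun(asinh(grid), cum_int, method = "monoH.FC")
  function(q) spl(asinh(q))
}

# Affine rescaling so that the transform equals log2 at two large
# reference counts h1 < h2 (our choice of anchors)
rescale_to_log2 <- function(vst_fun, h1, h2) {
  eta <- (log2(h2) - log2(h1)) / (vst_fun(h2) - vst_fun(h1))
  xi <- log2(h1) - eta * vst_fun(h1)
  function(q) eta * vst_fun(q) + xi
}

# Hypothetical variance of normalized counts: q + alpha(q) * q^2 with
# alpha(q) = 0.05 + 0.5 / q, written out to avoid 0 / 0 at q = 0
var_fun <- function(q) 1.5 * q + 0.05 * q^2
vst_raw <- make_numeric_vst(var_fun, max_q = 1e4)
vst <- rescale_to_log2(vst_raw, h1 = 1e3, h2 = 1e4)
vst(c(0, 10, 100, 1000))
```

The grid must cover the largest normalized count (max_q), since spline values outside the grid are extrapolations.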

In both cases, the transformation is scaled such that, for large counts, it becomes asymptotically equal to the logarithm to base 2.
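For the parametric case, the log2 asymptotics can be checked directly. Assume, as a sketch, that the variance of normalized counts is v(q) = q + alpha(q) q^2 with alpha(q) = a_1 + a_0/q, where a_0 and a_1 are our names for the two fitted coefficients. Integrating 1/sqrt(v(q)) and rescaling gives the f(q) below, and expanding the square root for large q shows that f(q) differs from log2 q by a term that vanishes like 1/q:

```latex
\begin{aligned}
f(q) &= \log_2 \frac{1 + a_0 + 2 a_1 q + 2\sqrt{a_1 q\,(1 + a_0 + a_1 q)}}{4 a_1},\\
\sqrt{a_1 q\,(1 + a_0 + a_1 q)} &= a_1 q + \frac{1 + a_0}{2} + O(q^{-1}),\\
f(q) &= \log_2\!\left(q + \frac{1 + a_0}{2 a_1} + O(q^{-1})\right) = \log_2 q + O(q^{-1}).
\end{aligned}
```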

Limitations: In order to preserve normalization, the same transformation has to be used for all samples. As a result, the variance stabilization is only approximate: the more the size factors differ, the more residual dependence of the variance on the mean you will find in the transformed data. As shown in the vignette, you can use the function meanSdPlot from the package vsn to check whether this is a problem for your data.
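If vsn is not at hand, a rough numeric version of this check can be sketched in base R: bin the rows by the rank of their mean and report the median standard deviation per bin. A flat profile suggests little residual mean-variance dependence. The helper mean_sd_trend is hypothetical and much simpler than vsn::meanSdPlot, and the simulated counts below are made up for illustration.

```r
# Simplified base-R stand-in for a mean-sd plot (not vsn::meanSdPlot):
# bin rows by the rank of their mean and report the median row sd per bin.
mean_sd_trend <- function(x, n_bins = 10) {
  row_means <- rowMeans(x)
  row_sds <- apply(x, 1, sd)
  ranks <- rank(row_means, ties.method = "first")
  bins <- cut(ranks, breaks = n_bins, labels = FALSE)  # equal-width rank bins
  tapply(row_sds, bins, median)
}

# Usage on simulated negative-binomial counts (hypothetical data):
# with a plain log2(x + 1), low-count bins typically show a larger sd.
set.seed(1)
mu <- 2^seq(1, 10, length.out = 500)
sim <- matrix(rnbinom(500 * 4, size = 20, mu = mu), ncol = 4)
round(mean_sd_trend(log2(sim + 1), n_bins = 5), 2)
```

Applying the same helper to the output of getVarianceStabilizedData should give a visibly flatter profile when the stabilization works well.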

Value

For varianceStabilizingTransformation, an ExpressionSet.

For getVarianceStabilizedData, a matrix of the same dimension as the count data, containing the transformed values.

Author(s)

Simon Anders <sanders@fs.tum.de>

Examples

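library( DESeq )   # provides makeExampleCountDataSet, estimateDispersions, etc.
# Simulated example count data
cds <- makeExampleCountDataSet()
cds <- estimateSizeFactors( cds )
# Blind dispersion estimate (ignoring conditions), as the VST requires
cds <- estimateDispersions( cds, method="blind" )
vsd <- getVarianceStabilizedData( cds )
# Within condition "A", plot per-gene variance against the rank of the mean;
# after a good VST the variance should show little trend
colsA <- conditions(cds) == "A"
plot( rank( rowMeans( vsd[,colsA] ) ), genefilter::rowVars( vsd[,colsA] ) )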

Example output

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Loading required package: lattice
    Welcome to 'DESeq'. For improved performance, usability and
    functionality, please consider migrating to 'DESeq2'.
Warning messages:
1: glm.fit: algorithm did not converge 
2: In parametricDispersionFit(means, disps) :
  Dispersion fit did not converge.

DESeq documentation built on April 28, 2020, 6:37 p.m.