voom: Transform RNA-Seq Data Ready for Linear Modelling

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/voom.R

Description

Transform count data to log2-counts per million (logCPM), estimate the mean-variance relationship and use this to compute appropriate observation-level weights. The data are then ready for linear modelling.

Usage

1
2
3
voom(counts, design = NULL, lib.size = NULL, normalize.method = "none",
     block = NULL, correlation = NULL, weights = NULL,
     span = 0.5, plot = FALSE, save.plot = FALSE)

Arguments

counts

a numeric matrix containing raw counts, or an ExpressionSet containing raw counts, or a DGEList object. Counts must be non-negative and NAs are not permitted.

design

design matrix with rows corresponding to samples and columns to coefficients to be estimated. Defaults to the unit vector meaning that samples are treated as replicates.

lib.size

numeric vector containing total library sizes for each sample. Defaults to the normalized (effective) library sizes in counts if counts is a DGEList or to the columnwise count totals if counts is a matrix.

normalize.method

the microarray-style normalization method to be applied to the logCPM values (if any). Choices are as for the method argument of normalizeBetweenArrays when the data is single-channel. Any normalization factors found in counts will still be used even if normalize.method="none".

block

vector or factor specifying a blocking variable on the samples. Has length equal to the number of ncol(counts).

correlation

the intrablock correlation.

weights

prior weights. Can be a numeric matrix of individual weights of same dimensions as the counts, or a numeric vector of sample weights with length equal to ncol(counts), or a numeric vector of gene weights with length equal to nrow(counts).

span

width of the smoothing window used for the lowess mean-variance trend. Expressed as a proportion between 0 and 1.

plot

logical, should a plot of the mean-variance trend be displayed?

save.plot

logical, should the coordinates and line of the plot be saved in the output?

Details

This function is intended to process RNA-seq or ChIP-seq data prior to linear modelling in limma.

voom is an acronym for mean-variance modelling at the observational level. The idea is to estimate the mean-variance relationship in the data, then use this to compute an appropriate precision weight for each observation. Count data always show marked mean-variance relationships. Raw counts show increasing variance with increasing count size, while log-counts typically show a decreasing mean-variance trend. This function estimates the mean-variance trend for log-counts, then assigns a weight to each observation based on its predicted variance. The weights are then used in the linear modelling process to adjust for heteroscedasticity.

voom performs the following specific calculations. First, the counts are converted to logCPM values, adding 0.5 to all the counts to avoid taking the logarithm of zero. The matrix of logCPM values is then optionally normalized. The lmFit function is used to fit row-wise linear models. The lowess function is then used to fit a trend to the square-root-standard-deviations as a function of an average log-count measure. The trend line is then used to predict the variance of each logCPM value as a function of its fitted value on the count scale, and the inverse variances become the estimated precision weights.

The optional arguments block, correlation and weights are passed to lmFit in the above calling sequence, so they influence the row-wise standard deviations to which the mean-variance trend is fitted. The arguments block and correlation have the same meaning as for lmFit. Most users will not need to specify the weights argument but, if it is included, then the output weights are taken to modify the input prior weights in a multiplicative fashion.

For good results, the counts matrix should be filtered to remove remove rows with very low counts before running voom(). The filterByExpr function in the edgeR package can be used for that purpose.

If counts is a DGEList object from the edgeR package, then voom will use the normalization factors found in the object when computing the logCPM values. In other words, the logCPM values are computed from the effective library sizes rather than the raw library sizes. If the DGEList object has been scale-normalized in edgeR, then it is usual to leave normalize.method="none" in voom, i.e., the logCPM values should not usually be re-normalized in the voom call.

The voom method is similar in purpose to the limma-trend method, which uses eBayes or treat with trend=TRUE. The voom method incorporates the mean-variance trend into the precision weights, whereas limma-trend incorporates the trend into the empirical Bayes moderation. The voom method takes into account the sequencing depths (library sizes) of the individual columns of counts and applies the mean-variance trend on an individual observation basis. limma-trend, on the other hand, assumes that the library sizes are not wildly different and applies the mean-variance trend on a genewise basis. As noted by Law et al (2014), voom should be more powerful than limma-trend if the library sizes are very different but, otherwise, the two methods should give similar results.

Note that edgeR::voomLmFit is now recommended over voom for sparse counts with a medium to high proportion of zeros.

Value

An EList object with the following components:

E

numeric matrix of normalized expression values on the log2 scale

weights

numeric matrix of inverse variance weights

design

design matrix

lib.size

numeric vector of total normalized library sizes

genes

dataframe of gene annotation extracted from counts

voom.xy

if save.plot, list containing x and y coordinates for points in mean-variance plot

voom.line

if save.plot, list containing coordinates of loess line in the mean-variance plot

Note

voom is designed to accept counts. Usually these will be sequence read counts, but counts of species abundance or other biological quantities might also be appropriate. Estimated counts are also acceptable provided that the column sums are representative of the total library size (total number of reads) for that sample. voom can analyse scaled counts provided that the column sums remain proportional to the total library sizes. voom is designed to take account of sample-specific library sizes and hence voom should not be used to analyse quantities that have been normalized for library size such as RPKM, transcripts per million (TPM) or counts per million (CPM). Such quantities prevent voom from infering the correct library sizes and hence the correct precision with which each value was measured.

Author(s)

Charity Law and Gordon Smyth

References

Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29. See also the Preprint Version at http://www.statsci.org/smyth/pubs/VoomPreprint.pdf incorporating some notational corrections.

Law, CW, Alhamdoosh, M, Su, S, Smyth, GK, Ritchie, ME (2016). RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. F1000Research 5, 1408. https://f1000research.com/articles/5-1408

Law, CW, Alhamdoosh, M, Su, S, Dong, X, Tian, L, Smyth, GK, Ritchie, ME (2018). RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR. Bioconductor Workflow Package. https://www.bioconductor.org/packages/RNAseq123/

See Also

eBayes, voomWithQualityWeights. vooma is similar to voom but for microarrays instead of RNA-seq.

voomLmFit in the edgeR package is a further developed version of voom particularly for sparse data.

A summary of functions for RNA-seq analysis is given in 11.RNAseq.

Examples

1
2
3
4
5
6
## Not run: 
keep <- filterByExpr(counts, design)
v <- voom(counts[keep,], design, plot=TRUE)
fit <- lmFit(v, design)
fit <- eBayes(fit, robust=TRUE)
## End(Not run)

limma documentation built on Nov. 8, 2020, 8:28 p.m.