RUVr: Remove Unwanted Variation Using Residuals
In RUVSeq: Remove Unwanted Variation from RNA-Seq Data


This function implements the RUVr method of Risso et al. (2014).

RUVr(x, cIdx, k, residuals, center=TRUE, round=TRUE, epsilon=1, tolerance=1e-8, isLog=FALSE)

`x`	Either a genes-by-samples numeric matrix or a SeqExpressionSet object containing the read counts.
`cIdx`	A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation.
`k`	The number of factors of unwanted variation to be estimated from the data.
`residuals`	A genes-by-samples matrix of residuals obtained from a first-pass regression of the counts on the covariates of interest, usually the negative binomial deviance residuals obtained from edgeR with the `residuals` method.
`center`	If `TRUE`, the residuals are centered, for each gene, to have mean zero across samples.
`round`	If `TRUE`, the normalized measures are rounded to form pseudo-counts.
`epsilon`	A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0).
`tolerance`	Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than `tolerance` to be considered positive.
`isLog`	Set to `TRUE` if the input matrix is already log-transformed.
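To make the roles of `epsilon`, `round`, and `tolerance` concrete, the following base-R sketch may help. It does not call RUVSeq. The toy data and variable names (`counts`, `pseudo`, `E`, `n_positive`) are illustrative assumptions, not part of the package.

```r
# Toy illustration only; nothing here calls RUVSeq.

# epsilon: zero counts give log(0) = -Inf, so a small offset is added first.
counts <- c(0, 1, 10, 100)
epsilon <- 1
log_counts <- log(counts + epsilon)
stopifnot(all(is.finite(log_counts)))

# round: after back-transforming, the offset is removed and the values are
# rounded to pseudo-counts.
pseudo <- round(exp(log_counts) - epsilon)

# tolerance: centering residuals across samples costs one rank, so the
# smallest singular value is numerically zero and should not count as positive.
set.seed(2)
E <- scale(matrix(rnorm(40), nrow = 4), center = TRUE, scale = FALSE)  # 4 samples x 10 genes
d <- svd(E)$d
n_positive <- sum(d > 1e-8)  # number of singular values treated as positive
```

With four samples, the centered matrix has at most three non-zero singular values, so `k` cannot usefully exceed that number in this toy setting.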

The RUVr procedure performs factor analysis on residuals, such as deviance residuals from a first-pass GLM regression of the counts on the covariates of interest using edgeR. The counts may be either unnormalized or normalized with a method such as upper-quartile (UQ) normalization.
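The procedure described above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: `ruvr_sketch` is a hypothetical helper, the residuals are random noise standing in for GLM deviance residuals, and all genes are treated as negative controls.

```r
# Hypothetical helper, not part of RUVSeq: a bare-bones illustration of
# factor analysis on residuals followed by removal of the estimated factors.
ruvr_sketch <- function(counts, residuals, controls, k = 1, epsilon = 1) {
  # Work on the log scale, with samples in rows.
  Y <- t(log(counts + epsilon))
  # Center each gene's residuals across samples (samples in rows).
  E <- scale(t(residuals), center = TRUE, scale = FALSE)
  # Factor analysis via SVD of the control-gene residuals.
  sv <- svd(E[, controls, drop = FALSE])
  W <- sv$u[, seq_len(k), drop = FALSE]
  colnames(W) <- paste0("W_", seq_len(k))
  # Regress the log counts on W by least squares and remove the fitted part.
  alpha <- solve(crossprod(W), crossprod(W, Y))
  corrected <- Y - W %*% alpha
  # Back-transform to pseudo-counts and clamp negatives at zero.
  norm <- round(exp(corrected) - epsilon)
  norm[norm < 0] <- 0
  list(W = W, normalizedCounts = t(norm))
}

# Toy data: 10 genes by 6 samples (assumed values, for illustration only).
set.seed(1)
counts <- matrix(rpois(60, lambda = 50), nrow = 10,
                 dimnames = list(paste0("g", 1:10), paste0("s", 1:6)))
res <- matrix(rnorm(60), nrow = 10, dimnames = dimnames(counts))

out <- ruvr_sketch(counts, res, controls = rownames(counts), k = 1)
dim(out$W)                 # samples by factors
dim(out$normalizedCounts)  # genes by samples
```

The sketch omits the input checks and the `tolerance`, `center`, `round`, and `isLog` options of `RUVr`; in practice the package function should be used.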

signature(x = "matrix", cIdx = "ANY", k = "numeric", residuals = "matrix")

It returns a list with

A samples-by-factors matrix with the estimated factors of unwanted variation (W).
The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts (normalizedCounts).

signature(x = "SeqExpressionSet", cIdx = "character", k="numeric", residuals = "matrix")

It returns a SeqExpressionSet with

The normalized counts in the normalizedCounts slot.
The estimated factors of unwanted variation as additional columns of the phenoData slot.

Davide Risso

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 32(9):896-902, 2014.

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta, editors, Statistical Analysis of Next Generation Sequence Data. Springer, 2014. (In press).

RUVg, RUVs, residuals.

library(RUVSeq)
library(edgeR)
library(zebrafishRNASeq)
data(zfGenes)

## run on a subset of genes for time reasons
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))

# Residuals from negative binomial GLM regression of UQ-normalized
# counts on covariates of interest, with edgeR
x <- as.factor(rep(c("Ctl", "Trt"), each=3))
design <- model.matrix(~x)
y <- DGEList(counts=counts(seq), group=x)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)

fit <- glmFit(y, design)
res <- residuals(fit, type="deviance")

# RUVr normalization (after UQ)
seqUQ <- betweenLaneNormalization(seq, which="upper")
# here all genes are passed as negative controls (cIdx)
controls <- rownames(seq)
seqRUVr <- RUVr(seqUQ, controls, k=1, res)

# estimated factors of unwanted variation, stored as phenoData columns
pData(seqRUVr)
# normalized counts
head(normCounts(seqRUVr))