RUVs: Remove Unwanted Variation Using Replicate/Negative Control...
In RUVSeq: Remove Unwanted Variation from RNA-Seq Data

Description Usage Arguments Details Methods Author(s) References See Also Examples

This function implements the RUVs method of Risso et al. (2014).

1	RUVs(x, cIdx, k, scIdx, round=TRUE, epsilon=1, tolerance=1e-8, isLog=FALSE)

`x`	Either a genes-by-samples numeric matrix or a SeqExpressionSet object containing the read counts.
`cIdx`	A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation.
`k`	The number of factors of unwanted variation to be estimated from the data.
`scIdx`	A numeric matrix specifying the replicate samples for which to compute the count differences used to estimate the factors of unwanted variation (see details).
`round`	If `TRUE`, the normalized measures are rounded to form pseudo-counts.
`epsilon`	A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0).
`tolerance`	Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than `tolerance` to be considered positive.
`isLog`	Set to `TRUE` if the input matrix is already log-transformed.

The RUVs procedure performs factor analysis on a matrix of count differences for replicate/negative control samples, for which the biological covariates of interest are constant.

Each row of scIdx should correspond to a set of replicate samples. The number of columns is the size of the largest set of replicates; rows for smaller sets are padded with -1 values.

For example, if the sets of replicate samples are (1,11,21),(2,3),(4,5),(6,7,8), then scIdx should be

1 11 21
2 3 -1
4 5 -1
6 7 8

signature(x = "matrix", cIdx = "ANY", k = "numeric", scIdx = "matrix")

It returns a list with

A samples-by-factors matrix with the estimated factors of unwanted variation (W).
The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts (normalizedCounts).

signature(x = "SeqExpressionSet", cIdx = "character", k="numeric", scIdx = "matrix")

It returns a SeqExpressionSet with

The normalized counts in the normalizedCounts slot.
The estimated factors of unwanted variation as additional columns of the phenoData slot.

Davide Risso (building on a previous version by Laurent Jacob).

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 2014. (In press).

D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta, editors, Statistical Analysis of Next Generation Sequence Data. Springer, 2014. (In press).

RUVg, RUVr.

library(zebrafishRNASeq)
data(zfGenes)

## run on a subset of genesfor time reasons
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))

# RUVs normalization
controls <- rownames(seq)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(seq, controls, k=1, differences)

pData(seqRUVs)
head(normCounts(seqRUVs))