RUVg-methods | R Documentation |
This function implements the RUVg method of Risso et al. (2014).
RUVg(x, cIdx, k, drop=0, center=TRUE, round=TRUE, epsilon=1, tolerance=1e-8, isLog=FALSE)
x |
Either a genes-by-samples numeric matrix or a SeqExpressionSet object containing the read counts. |
cIdx |
A character, logical, or numeric vector indicating the subset of genes to be used as negative controls in the estimation of the factors of unwanted variation. |
k |
The number of factors of unwanted variation to be estimated from the data. |
drop |
The number of singular values to drop in the estimation of the factors of unwanted variation. This number is usually zero, but might be set to one if the first singular value captures the effect of interest. It must be less than k. |
center |
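If TRUE, the counts are centered, for each gene, to have mean zero across samples. This is important to ensure that the first singular value does not capture the average gene expression. |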
round |
If TRUE, the normalized measures are rounded to form pseudo-counts. |
epsilon |
A small constant (usually no larger than one) to be added to the counts prior to the log transformation to avoid problems with log(0). |
tolerance |
Tolerance in the selection of the number of positive singular values, i.e., a singular value must be larger than tolerance to be considered positive. |
isLog |
Set to TRUE if the input matrix is already log-transformed. |
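As a small illustration of the three accepted forms of cIdx, the base-R sketch below selects the same control rows by name, by logical mask, and by position. The toy matrix and its gene names are made up for the example and are not part of the package.

```r
# Toy genes-by-samples count matrix; names and values are hypothetical.
counts <- matrix(c(10, 12, 11,
                    0,  3,  5,
                   50, 48, 52,
                    7,  9,  8),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "ERCC-1", "ERCC-2"),
                                 c("s1", "s2", "s3")))

# Three equivalent ways to designate the spike-ins as negative controls.
ctl_chr <- c("ERCC-1", "ERCC-2")              # character: row names
ctl_lgl <- grepl("^ERCC", rownames(counts))   # logical: one flag per gene
ctl_num <- which(ctl_lgl)                     # numeric: row positions

# Each form picks out the same submatrix of control genes.
identical(counts[ctl_chr, ], counts[ctl_lgl, ])
identical(counts[ctl_chr, ], counts[ctl_num, ])
```

Character indices require the matrix to carry row names; the SeqExpressionSet method is documented with a character cIdx.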
The RUVg procedure performs factor analysis of the read counts based on a suitably chosen subset of negative control genes known a priori not to be differentially expressed (DE) between the samples under consideration.
Several types of controls can be used, including housekeeping genes, spike-in sequences (e.g., ERCC), or “in-silico” empirical controls (e.g., least significantly DE genes based on a DE analysis performed prior to RUV normalization).
Note that one can relax the negative control gene assumption by requiring instead the identification of a set of positive or negative controls, with a priori known expression fold-changes between samples. RUVg can then simply be applied to control-centered log counts, as detailed in the vignette.
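The idea of factor analysis on negative controls can be sketched in a few lines of base R. This is a simplified illustration, not the RUVSeq source code: the function name ruvg_sketch, the simulated data, and the choice of the first 50 genes as controls are all assumptions, and the sketch omits the drop, tolerance, round, and isLog handling.

```r
# Simplified base-R sketch of the idea; NOT the RUVSeq implementation.
ruvg_sketch <- function(counts, ctl, k = 1, epsilon = 1) {
  Y  <- t(log(counts + epsilon))                  # samples-by-genes log counts
  Yc <- scale(Y, center = TRUE, scale = FALSE)    # center each gene across samples
  sv <- svd(Yc[, ctl, drop = FALSE])              # SVD restricted to control genes
  W  <- sv$u[, seq_len(k), drop = FALSE]          # estimated unwanted factors
  alpha <- solve(crossprod(W), crossprod(W, Y))   # regress all log counts on W
  list(W = W, normalized = t(exp(Y - W %*% alpha) - epsilon))
}

# Simulated data with a hypothetical nuisance factor affecting every gene.
set.seed(1)
n_genes <- 200
n_samples <- 6
batch <- rep(c(0, 1), each = 3)
mu <- matrix(100, n_genes, n_samples) * exp(outer(runif(n_genes, 0.5, 1.5), batch))
counts <- matrix(rpois(n_genes * n_samples, mu), n_genes, n_samples)
ctl <- 1:50                                       # pretend these are negative controls

fit <- ruvg_sketch(counts, ctl, k = 1)
dim(fit$W)                    # 6 samples by 1 factor
abs(cor(fit$W[, 1], batch))   # how closely W tracks the simulated nuisance factor
```

Because the factors are estimated from the control genes alone, variation shared by the controls is attributed to unwanted effects and regressed out of every gene.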
signature(x = "matrix", cIdx = "ANY", k = "numeric")
It returns a list with:
A samples-by-factors matrix with the estimated factors of unwanted variation (W).
The genes-by-samples matrix of normalized expression measures (possibly rounded) obtained by removing the factors of unwanted variation from the original read counts (normalizedCounts).
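A minimal sketch of calling the matrix method and reading the two list elements follows. The simulated counts, the gene and sample names, and the choice of controls are hypothetical; the call is guarded so the snippet still runs when RUVSeq is not installed.

```r
# Simulated counts; all values and names here are hypothetical.
set.seed(42)
counts <- matrix(rpois(100 * 6, lambda = 50), nrow = 100,
                 dimnames = list(paste0("g", 1:100), paste0("s", 1:6)))
controls <- rownames(counts)[1:20]   # pretend the first 20 genes are controls

res <- NULL
if (requireNamespace("RUVSeq", quietly = TRUE)) {
  res <- RUVSeq::RUVg(counts, controls, k = 1)
  str(res$W)                  # samples-by-factors matrix
  dim(res$normalizedCounts)   # genes-by-samples, same shape as the input
}
```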
signature(x = "SeqExpressionSet", cIdx = "character", k="numeric")
It returns a SeqExpressionSet with:
The normalized counts in the normalizedCounts slot.
The estimated factors of unwanted variation as additional columns of the phenoData slot.
Davide Risso
D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology, 2014. (In press).
D. Risso, J. Ngai, T. P. Speed, and S. Dudoit. The role of spike-in standards in the normalization of RNA-Seq. In D. Nettleton and S. Datta, editors, Statistical Analysis of Next Generation Sequence Data. Springer, 2014. (In press).
RUVr, RUVs.
library(RUVSeq)  # provides RUVg; attaches EDASeq for newSeqExpressionSet and plotRLE
library(zebrafishRNASeq)
data(zfGenes)

## run on a subset of genes for time reasons
## (real analyses should be performed on all genes)
genes <- rownames(zfGenes)[grep("^ENS", rownames(zfGenes))]
spikes <- rownames(zfGenes)[grep("^ERCC", rownames(zfGenes))]
set.seed(123)
idx <- c(sample(genes, 1000), spikes)
seq <- newSeqExpressionSet(as.matrix(zfGenes[idx,]))

# RUVg normalization
seqRUVg <- RUVg(seq, spikes, k=1)

pData(seqRUVg)
head(normCounts(seqRUVg))

plotRLE(seq, outline=FALSE, ylim=c(-3, 3))
plotRLE(seqRUVg, outline=FALSE, ylim=c(-3, 3))

barplot(as.matrix(pData(seqRUVg)), beside=TRUE)