cqn: CQN (conditional quantile normalization) for RNA-Seq data
In cqn: Conditional quantile normalization


This function implements CQN (conditional quantile normalization) for RNA-Seq data.

cqn(counts, x, lengths, sizeFactors = NULL, subindex = NULL, tau = 0.5, sqn = TRUE,
    lengthMethod = c("smooth", "fixed"), verbose = FALSE)
## S3 method for class 'cqn'
print(x, ...)

`counts`	An object that can be coerced to a `matrix` of region by sample counts. Ought to have integer values.
`x`	This is a covariate whose systematic influence on the counts will be removed. Typically the GC content. Has to have the same length as the number of rows of counts.
`lengths`	The lengths (in bp) of the regions in counts. Has to have the same length as the number of rows of counts.
`sizeFactors`	An optional vector of size factors, i.e., the sequencing effort of each sample. If `NULL`, this is calculated as the column sums of `counts`.
`subindex`	An optional vector of indices into the rows of `counts`. If not given, this defaults to the indices of genes whose row means of `counts` are greater than 50.
`tau`	This argument is passed to `rq` and indicates which quantile is fit. The default should only be changed by expert users.
`sqn`	This argument indicates whether the residuals from the systematic fit are (subset) quantile normalized. The default should only be changed by expert users.
`lengthMethod`	Whether length enters the model as a smooth function (`"smooth"`) or as a fixed offset (`"fixed"`).
`verbose`	Is the function verbose?
`...`	Not used.
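
As a rough illustration of the two documented defaults, the base R sketch below computes the column sums used for `sizeFactors` and the rows with mean count above 50 used for `subindex`. The matrix `toy_counts` and the variable names are invented for this example. The code only mirrors the argument descriptions above and is not taken from the package source.

```r
# Toy region-by-sample count matrix (hypothetical data, for illustration only).
toy_counts <- matrix(c(120, 80, 100,
                         3,  0,   5,
                        60, 70,  65,
                        10, 20,  15),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Default described for sizeFactors: the column sums of counts.
default_size_factors <- colSums(toy_counts)

# Default described for subindex: rows whose mean count is greater than 50.
default_subindex <- which(rowMeans(toy_counts) > 50)

print(default_size_factors)
print(default_subindex)
```

Passing either argument explicitly to `cqn` replaces the corresponding default.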

These functions implement CQN (conditional quantile normalization) for RNA-Seq data. They remove a single systematic effect, contained in the argument x, which will typically be GC content. The effect of lengths is modelled either as a smooth function (which we recommend), with lengthMethod = "smooth", or as an offset (equivalent to modelling using RPKMs), with lengthMethod = "fixed". Length can be completely removed from the model by using lengthMethod = "fixed" and setting all lengths to 1000.
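
To see why setting all lengths to 1000 removes length under the RPKM-style offset, the base R sketch below computes log2 RPKM by hand. The helper `log2_rpkm` and the toy inputs are hypothetical and written only for this illustration. They are not part of the cqn package, and the sketch shows the arithmetic of an RPKM offset rather than the internals of `cqn`.

```r
# Hypothetical toy inputs (not from the package data sets).
toy_counts  <- matrix(c(200, 400,
                         50, 100),
                      nrow = 2, byrow = TRUE)
toy_lengths <- c(2000, 500)          # region lengths in bp
toy_size    <- colSums(toy_counts)   # sequencing effort per sample

# log2 RPKM: reads per kilobase of region per million reads.
log2_rpkm <- function(counts, lengths, size) {
  per_million <- sweep(counts, 2, size / 1e6, "/")           # scale each sample (column)
  per_kb      <- sweep(per_million, 1, lengths / 1000, "/")  # scale each region (row)
  log2(per_kb)
}

with_length    <- log2_rpkm(toy_counts, toy_lengths, toy_size)
without_length <- log2_rpkm(toy_counts, rep(1000, 2), toy_size)

# With every length set to 1000 the length term is log2(1) = 0,
# so the result is plain log2 reads per million.
log2_rpm <- log2(sweep(toy_counts, 2, toy_size / 1e6, "/"))
print(all.equal(without_length, log2_rpm))
```

In this toy setting a 2000 bp region is shifted down by one log2 unit and a 500 bp region up by one, while a length of 1000 leaves the values unchanged.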

Final corrected values are equal to value$y + value$offset.
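
The sum can be sketched with a stand-in for the returned list. The object `fit` below is a hand-built mock with made-up numbers, not real `cqn` output. With a real result, the same expression would be applied to the object returned by `cqn`.

```r
# Mock of the two relevant components of a cqn result (made-up numbers).
fit <- list(
  y      = matrix(c(5.0, 6.0, 2.5, 3.0), nrow = 2,
                  dimnames = list(c("gene1", "gene2"), c("s1", "s2"))),
  offset = matrix(c(0.2, -0.1, 0.4, 0.0), nrow = 2,
                  dimnames = list(c("gene1", "gene2"), c("s1", "s2")))
)

# Final corrected values: element-wise sum of y and offset.
corrected <- fit$y + fit$offset
print(corrected)
```

Both components are region by sample matrices, so the sum keeps the dimensions and dimnames of `y`.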

A list with the following components:

`counts`	The value of argument `counts`.
`x`	The value of argument `x`.
`lengths`	The value of argument `lengths`.
`sizeFactors`	The value of argument `sizeFactors`. In case the argument was `NULL`, this is the value used internally.
`subindex`	The value of argument `subindex`. In case the argument was `NULL`, this is the value used internally.
`y`	The dependent value used in the systematic effect fit. Equal to log2 transformed reads per million.
`offset`	The estimated offset.
`offset0`	A single number used internally for identifiability.
`glm.offset`	An offset useful for supplying to a GLM type model function. It is on the natural log scale and includes correcting for sizeFactors.
`func1`	The estimated effect of function 1 (argument `x`). This is a matrix of function values on a grid. Columns are samples and rows are grid points.
`grid1`	The grid points on which function 1 (argument `x`) was evaluated.
`knots1`	The knots used for function 1 (argument `x`).
`func2`	The estimated effect of function 2 (lengths). This is a matrix of function values on a grid. Columns are samples and rows are grid points.
`grid2`	The grid points on which function 2 (lengths) was evaluated.
`knots2`	The knots used for function 2 (lengths).
`call`	The call.

Internally, the function uses a custom implementation of subset quantile normalization, contained in the (not exported) SQN2 function.

Kasper Daniel Hansen, Zhijin Wu

KD Hansen, RA Irizarry, and Z Wu, Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 2012 vol. 13(2) pp. 204-216.

The package vignette.

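library(cqn)

# Load the example data sets shipped with the package
data(montgomery.subset)
data(sizeFactors.subset)
data(uCovar)

# Fit CQN with GC content as the covariate and the default lengthMethod = "smooth"
cqn.subset <- cqn(montgomery.subset, lengths = uCovar$length,
                  x = uCovar$gccontent, sizeFactors = sizeFactors.subset,
                  verbose = TRUE)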

RQ fit ..........
SQN Using 'sigma' instead 'sig2' (= sigma^2) is preferred now
.