normalizeQuantileSpline: Normalizes the empirical distribution of one or more samples...
In aroma.light: Light-Weight Methods for Normalization and Visualization of Microarray Data using Only Basic R Data Types

Description Usage Arguments Value Missing values Author(s) References See Also Examples

Normalizes the empirical distribution of one or more samples to a target distribution. After normalization, all samples have the same average empirical density distribution.

## S3 method for class 'numeric'
normalizeQuantileSpline(x, w=NULL, xTarget, sortTarget=TRUE, robust=TRUE, ...)
## S3 method for class 'matrix'
normalizeQuantileSpline(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)
## S3 method for class 'list'
normalizeQuantileSpline(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)

`x, X`	A single (K=1) `numeric` `vector` of length N, a `numeric` NxK `matrix`, or a `list` of length K with `numeric` `vector`s, where K represents the number of samples and N the number of data points.
`w`	An optional `numeric` `vector` of length N of weights specific to each data point.
`xTarget`	The target empirical distribution as a sorted `numeric` `vector` of length M. If `NULL` and `X` is a `list`, then the target distribution is calculated as the average empirical distribution of the samples.
`sortTarget`	If `TRUE`, argument `xTarget` will be sorted, otherwise it is assumed to be already sorted.
`robust`	If `TRUE`, the normalization function is estimated robustly.
`...`	Arguments passed to (`smooth.spline` or `robustSmoothSpline`).

Returns an object of the same type and dimensions as the input.

Both argument X and xTarget may contain non-finite values. These values do not affect the estimation of the normalization function. Missing values and other non-finite values in X, remain in the output as is. No new missing values are introduced.

Henrik Bengtsson

[1] H. Bengtsson, R. Irizarry, B. Carvalho, and T. Speed, Estimation and assessment of raw copy numbers at the single locus level, Bioinformatics, 2008.

The target distribution can be calculated as the average using averageQuantile().

Internally either robustSmoothSpline (robust=TRUE) or smooth.spline (robust=FALSE) is used.

An alternative normalization method that is also normalizing the empirical densities of samples is normalizeQuantileRank(). Contrary to this method, that method requires that all samples are based on the exact same set of data points and it is also more likely to over-correct in the tails of the distributions.

# Simulate three samples with on average 20% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
           rnorm(N, mean=4, sd=2),
           rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA

# Plot the data
layout(matrix(c(1,0,2:5), ncol=2, byrow=TRUE))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")

Xn <- normalizeQuantile(X)
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn, xlim=xlim, main="The three normalized distributions")

Xn2 <- normalizeQuantileSpline(X, xTarget=Xn[,1], spar=0.99)
plotDensity(Xn2, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn2, xlim=xlim, main="The three normalized distributions")