Normalizes the empirical distribution of one or more samples to a target distribution

Description

Normalizes the empirical distribution of one or more samples to a target distribution. After normalization, all samples have the same average empirical density distribution.

Usage

## S3 method for class 'numeric'
normalizeQuantileSpline(x, w=NULL, xTarget, sortTarget=TRUE, robust=TRUE, ...)
## S3 method for class 'matrix'
normalizeQuantileSpline(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)
## S3 method for class 'list'
normalizeQuantileSpline(X, w=NULL, xTarget=NULL, sortTarget=TRUE, robust=TRUE, ...)

Arguments

x, X

A single (K=1) numeric vector of length N, a numeric NxK matrix, or a list of length K with numeric vectors, where K represents the number of samples and N the number of data points.

w

An optional numeric vector of length N of weights specific to each data point.

xTarget

The target empirical distribution as a sorted numeric vector of length M. If NULL and X is a list, then the target distribution is calculated as the average empirical distribution of the samples.

sortTarget

If TRUE, argument xTarget will be sorted, otherwise it is assumed to be already sorted.

robust

If TRUE, the normalization function is estimated robustly.

...

Arguments passed to smooth.spline (robust=FALSE) or robustSmoothSpline (robust=TRUE).

Value

Returns an object of the same type and dimensions as the input.

Missing values

Both arguments X and xTarget may contain non-finite values. These values do not affect the estimation of the normalization function. Missing values and other non-finite values in X remain in the output as is. No new missing values are introduced.
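
The behavior described above can be illustrated with a minimal base R sketch for a single sample. This is not the package's implementation: normalizeOneSample() is a hypothetical helper, it is unweighted and non-robust (plain smooth.spline), and matching the sample and target quantiles on a common probability grid is an assumption made here for illustration.

```r
# Sketch only: normalizeOneSample() is a hypothetical helper,
# not part of any package.
normalizeOneSample <- function(x, xTarget, ...) {
  ok <- is.finite(x)                        # non-finite values are left untouched
  xs <- sort(x[ok])                         # finite values of the sample
  xt <- sort(xTarget[is.finite(xTarget)])   # finite values of the target

  # Evaluate both empirical quantile functions on a common probability
  # grid, so the sample and the target may differ in length.
  p  <- ppoints(min(length(xs), length(xt)))
  qs <- quantile(xs, probs = p, names = FALSE)
  qt <- quantile(xt, probs = p, names = FALSE)

  # Smooth function mapping sample quantiles to target quantiles
  fit <- smooth.spline(x = qs, y = qt, ...)

  # Apply the fitted function to the finite values only
  xn <- x
  xn[ok] <- predict(fit, x = x[ok])$y
  xn
}

set.seed(1)
x <- rgamma(2000, shape = 2, rate = 1)
x[sample(length(x), size = 100)] <- NA
xTarget <- rnorm(5000, mean = 3, sd = 1)

xn <- normalizeOneSample(x, xTarget, spar = 0.99)

# The missing-value pattern of the input is preserved
stopifnot(identical(is.na(xn), is.na(x)))
```

Because the fitted function is only evaluated at the finite values, the positions of the NAs in the output match those in the input.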

Author(s)

Henrik Bengtsson

References

[1] H. Bengtsson, R. Irizarry, B. Carvalho, and T. Speed, Estimation and assessment of raw copy numbers at the single locus level, Bioinformatics, 2008.

See Also

The target distribution can be calculated as the average using averageQuantile().
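
As a rough illustration of what an "average empirical distribution" can look like when samples differ in length or contain missing values, here is a base R sketch. averageTarget() is a hypothetical helper and not the implementation of averageQuantile(); the choice of a common quantile grid with as many points as the largest sample is an assumption.

```r
# Sketch only: averageTarget() is a hypothetical helper,
# not averageQuantile() itself.
averageTarget <- function(X) {
  # Keep only the finite values of each sample
  X <- lapply(X, function(x) x[is.finite(x)])
  # Use the largest number of finite values as the target length
  M <- max(lengths(X))
  p <- seq(0, 1, length.out = M)
  # Quantiles of every sample on the common grid, one column per sample
  Q <- sapply(X, quantile, probs = p, names = FALSE)
  # Average across samples at each quantile
  rowMeans(Q)
}

set.seed(42)
X <- list(a = rnorm(1000, mean = 3, sd = 1),
          b = rnorm(1500, mean = 4, sd = 2))
X$a[1:50] <- NA

xTarget <- averageTarget(X)
```

The result is a sorted numeric vector, which is the form that argument xTarget expects.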

Internally either robustSmoothSpline (robust=TRUE) or smooth.spline (robust=FALSE) is used.

An alternative normalization method that also normalizes the empirical densities of samples is normalizeQuantileRank(). In contrast to this method, normalizeQuantileRank() requires that all samples are based on the exact same set of data points, and it is more likely to over-correct in the tails of the distributions.
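
To make the contrast concrete, the following base R sketch shows the classic rank-based idea for a complete matrix. rankNormalize() is a hypothetical helper, not the code of normalizeQuantileRank(), and it assumes no missing values and equally many data points per sample.

```r
# Sketch only: rankNormalize() is a hypothetical helper for a complete
# numeric matrix, not normalizeQuantileRank() itself.
rankNormalize <- function(X) {
  stopifnot(is.matrix(X), !anyNA(X))
  # Target: the average of the sorted columns
  target <- rowMeans(apply(X, 2, sort))
  # Replace every value by the target value that has the same rank
  apply(X, 2, function(x) target[rank(x, ties.method = "first")])
}

set.seed(7)
X <- cbind(rnorm(500, mean = 3, sd = 1),
           rgamma(500, shape = 2, rate = 1))
Xn <- rankNormalize(X)
```

After this step every column holds exactly the same set of values, including the extremes, whereas a spline-based approach maps values through a smooth function instead.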

Examples

# Load the package that provides the functions used below
library("aroma.light")

# Simulate three samples with on average 20% missing values
N <- 10000
X <- cbind(rnorm(N, mean=3, sd=1),
           rnorm(N, mean=4, sd=2),
           rgamma(N, shape=2, rate=1))
X[sample(3*N, size=0.20*3*N)] <- NA

# Plot the data
layout(matrix(c(1,0,2:5), ncol=2, byrow=TRUE))
xlim <- range(X, na.rm=TRUE)
plotDensity(X, lwd=2, xlim=xlim, main="The three original distributions")

# Normalize with normalizeQuantile() and plot the result
Xn <- normalizeQuantile(X)
plotDensity(Xn, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn, xlim=xlim, main="The three normalized distributions")

# Normalize with normalizeQuantileSpline(), using the first normalized
# sample as the target; 'spar' is passed on to the spline smoother
Xn2 <- normalizeQuantileSpline(X, xTarget=Xn[,1], spar=0.99)
plotDensity(Xn2, lwd=2, xlim=xlim, main="The three normalized distributions")
plotXYCurve(X, Xn2, xlim=xlim, main="The three normalized distributions")
