# normalizeCurveFit: Weighted curve-fit normalization between a pair of channels In aroma.light: Light-Weight Methods for Normalization and Visualization of Microarray Data using Only Basic R Data Types

## Description

Weighted curve-fit normalization between a pair of channels.

This method will estimate a smooth function of the dependency between the log-ratios and the log-intensity of the two channels and then correct the log-ratios (only) in order to remove the dependency. This is method is also known as intensity-dependent or lowess normalization.

The curve-fit methods are by nature limited to paired-channel data. There exist at least one method trying to overcome this limitation, namely the cyclic-lowess [1], which applies the paired curve-fit method iteratively over all pairs of channels/arrays. Cyclic-lowess is not implented here.

We recommend that affine normalization [2] is used instead of curve-fit normalization.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12``` ```## S3 method for class 'matrix' normalizeCurveFit(X, weights=NULL, typeOfWeights=c("datapoint"), method=c("loess", "lowess", "spline", "robustSpline"), bandwidth=NULL, satSignal=2^16 - 1, ...) ## S3 method for class 'matrix' normalizeLoess(X, ...) ## S3 method for class 'matrix' normalizeLowess(X, ...) ## S3 method for class 'matrix' normalizeSpline(X, ...) ## S3 method for class 'matrix' normalizeRobustSpline(X, ...) ```

## Arguments

 `X` An Nx2 `matrix` where the columns represent the two channels to be normalized. `weights` If `NULL`, non-weighted normalization is done. If data-point weights are used, this should be a `vector` of length N of data point weights used when estimating the normalization function. `typeOfWeights` A `character` string specifying the type of weights given in argument `weights`. `method` `character` string specifying which method to use when fitting the intensity-dependent function. Supported methods: `"loess"` (better than lowess), `"lowess"` (classic; supports only zero-one weights), `"spline"` (more robust than lowess at lower and upper intensities; supports only zero-one weights), `"robustSpline"` (better than spline). `bandwidth` A `double` value specifying the bandwidth of the estimator used. `satSignal` Signals equal to or above this threshold will not be used in the fitting. `...` Not used.

## Details

A smooth function c(A) is fitted throught data in (A,M), where M=log_2(y_2/y_1) and A=1/2*log_2(y_2*y_1). Data is normalized by M <- M - c(A).

Loess is by far the slowest method of the four, then lowess, and then robust spline, which iteratively calls the spline method.

## Value

A Nx2 `matrix` of the normalized two channels. The fitted model is returned as attribute `modelFit`.

## Negative, non-positive, and saturated values

Non-positive values are set to not-a-number (`NaN`). Data points that are saturated in one or more channels are not used to estimate the normalization function, but they are normalized.

## Missing values

The estimation of the normalization function will only be made based on complete non-saturated observations, i.e. observations that contains no `NA` values nor saturated values as defined by `satSignal`.

## Weighted normalization

Each data point, that is, each row in `X`, which is a vector of length 2, can be assigned a weight in [0,1] specifying how much it should affect the fitting of the normalization function. Weights are given by argument `weights`, which should be a `numeric` `vector` of length N. Regardless of weights, all data points are normalized based on the fitted normalization function.

Note that the lowess and the spline method only support zero-one {0,1} weights. For such methods, all weights that are less than a half are set to zero.

## Details on loess

For `loess`, the arguments `family="symmetric"`, `degree=1`, `span=3/4`, `control=loess.control(trace.hat="approximate"`, `iterations=5`, `surface="direct")` are used.

Henrik Bengtsson

## References

[1] M. Åstrand, Contrast Normalization of Oligonucleotide Arrays, Journal Computational Biology, 2003, 10, 95-102.
[2] Henrik Bengtsson and Ola Hössjer, Methodological Study of Affine Transformations of Gene Expression Data, Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method, BMC Bioinformatics, 2006, 7:100.

`normalizeAffine`().
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104``` ``` pathname <- system.file("data-ex", "PMT-RGData.dat", package="aroma.light") rg <- read.table(pathname, header=TRUE, sep="\t") nbrOfScans <- max(rg\$slide) rg <- as.list(rg) for (field in c("R", "G")) rg[[field]] <- matrix(as.double(rg[[field]]), ncol=nbrOfScans) rg\$slide <- rg\$spot <- NULL rg <- as.matrix(as.data.frame(rg)) colnames(rg) <- rep(c("R", "G"), each=nbrOfScans) layout(matrix(c(1,2,0,3,4,0,5,6,7), ncol=3, byrow=TRUE)) rgC <- rg for (channel in c("R", "G")) { sidx <- which(colnames(rg) == channel) channelColor <- switch(channel, R="red", G="green") # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # The raw data # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - plotMvsAPairs(rg[,sidx]) title(main=paste("Observed", channel)) box(col=channelColor) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # The calibrated data # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - rgC[,sidx] <- calibrateMultiscan(rg[,sidx], average=NULL) plotMvsAPairs(rgC[,sidx]) title(main=paste("Calibrated", channel)) box(col=channelColor) } # for (channel ...) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # The average calibrated data # # Note how the red signals are weaker than the green. The reason # for this can be that the scale factor in the green channel is # greater than in the red channel, but it can also be that there # is a remaining relative difference in bias between the green # and the red channel, a bias that precedes the scanning. # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - rgCA <- rg for (channel in c("R", "G")) { sidx <- which(colnames(rg) == channel) rgCA[,sidx] <- calibrateMultiscan(rg[,sidx]) } rgCAavg <- matrix(NA_real_, nrow=nrow(rgCA), ncol=2) colnames(rgCAavg) <- c("R", "G") for (channel in c("R", "G")) { sidx <- which(colnames(rg) == channel) rgCAavg[,channel] <- apply(rgCA[,sidx], MARGIN=1, FUN=median, na.rm=TRUE) } # Add some "fake" outliers outliers <- 1:600 rgCAavg[outliers,"G"] <- 50000 plotMvsA(rgCAavg) title(main="Average calibrated (AC)") # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Normalize data # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Weight-down outliers when normalizing weights <- rep(1, nrow(rgCAavg)) weights[outliers] <- 0.001 # Affine normalization of channels rgCANa <- normalizeAffine(rgCAavg, weights=weights) # It is always ok to rescale the affine normalized data if its # done on (R,G); not on (A,M)! However, this is only needed for # esthetic purposes. rgCANa <- rgCANa *2^1.4 plotMvsA(rgCANa) title(main="Normalized AC") # Curve-fit (lowess) normalization rgCANlw <- normalizeLowess(rgCAavg, weights=weights) plotMvsA(rgCANlw, col="orange", add=TRUE) # Curve-fit (loess) normalization rgCANl <- normalizeLoess(rgCAavg, weights=weights) plotMvsA(rgCANl, col="red", add=TRUE) # Curve-fit (robust spline) normalization rgCANrs <- normalizeRobustSpline(rgCAavg, weights=weights) plotMvsA(rgCANrs, col="blue", add=TRUE) legend(x=0,y=16, legend=c("affine", "lowess", "loess", "r. spline"), pch=19, col=c("black", "orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n") plotMvsMPairs(cbind(rgCANa, rgCANlw), col="orange", xlab=expression(M[affine])) title(main="Normalized AC") plotMvsMPairs(cbind(rgCANa, rgCANl), col="red", add=TRUE) plotMvsMPairs(cbind(rgCANa, rgCANrs), col="blue", add=TRUE) abline(a=0, b=1, lty=2) legend(x=-6,y=6, legend=c("lowess", "loess", "r. spline"), pch=19, col=c("orange", "red", "blue"), ncol=2, x.intersp=0.3, bty="n") ```