
In this section we give *our* recommendation on how spotted
two-color (or multi-color) microarray data is best calibrated and
normalized.

We do *not* recommend background subtraction in the classical
sense, where background is estimated by various image analysis
methods. This means that we will only consider foreground signals
in the analysis.

We estimate "background" by other means. In what is explained below, only a global background, that is, a global bias, is estimated and removed.

In Bengtsson et al (2004) we give evidence that microarray scanners
can introduce a significant bias in data. This bias, which is
about 15-25 out of 65535, *will* introduce intensity dependency
in the log-ratios, as explained in Bengtsson &
Hössjer (2006).

In Bengtsson et al (2004) we find that this bias is stable across arrays (and over a couple of months), but further research is needed to tell whether this holds over a longer time period.

To calibrate signals for scanner biases, scan the same array at
three or more (K >= 3) different PMT settings (preferably in
decreasing order).
While doing this, *do not adjust the laser power settings*.
Also, do the multiscan *without* washing, cleaning, or otherwise
changing the array between subsequent scans.
Although not necessary, it is preferred that the array
remains in the scanner between subsequent scans. This will simplify
the image analysis, since spot identification needs to be done only
once if the images align perfectly.

After image analysis, read all K scans for the same array into two matrices, one for the red and one for the green channel, where the K columns correspond to scans and the N rows to spots. It is enough to use foreground signals.

In order to multiscan calibrate the data, for each channel
separately call `Xc <- calibrateMultiscan(X)`, where `X` is the
NxK matrix of signals for one channel across all scans. The
calibrated signals are returned in the Nx1 matrix `Xc`.
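As a minimal sketch of the call above, the following R code simulates one channel of a hypothetical array scanned at K = 3 PMT settings and calibrates it. It assumes that `calibrateMultiscan()` is provided by the aroma.light package and only calls it if that package is installed. The gains, the offset of 20 and the noise level are illustrative assumptions, not measurements from a real scanner.

```r
set.seed(1)
N <- 1000                                # number of spots
x <- 2^runif(N, min = 4, max = 14)       # hypothetical true signals
gains <- c(1.0, 0.5, 0.25)               # assumed PMT gains, in decreasing order
offset <- 20                             # assumed scanner offset

# N x K matrix: one column per scan of the same channel
X <- sapply(gains, function(g) offset + g * x + rnorm(N, sd = 2))
colnames(X) <- paste0("scan", seq_along(gains))

# Calibrate only if aroma.light (assumed source of the function) is available
if (requireNamespace("aroma.light", quietly = TRUE)) {
  Xc <- aroma.light::calibrateMultiscan(X)
  str(Xc)                                # calibrated signals for this channel
}
```

The same steps would be repeated for the other channel with its own NxK matrix.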

Multiscan calibration may sometimes be skipped, especially if affine
normalization is applied immediately after, but we do recommend that
every lab check at least once whether their scanner introduces bias.
If the offsets in a scanner are already estimated from earlier
multiscan analyses, or known by other means, they can readily be
subtracted from the signals of each channel. If arrays are still
multiscanned, it is possible to force the calibration method to
fit the model with zero intercept (assuming the scanner offsets
have been subtracted) by adding the argument `center=FALSE`.
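A sketch of this zero-intercept variant, under the assumption that the scanner offset is already known (here a made-up value of 20) and that `calibrateMultiscan()` comes from the aroma.light package. The simulated data are hypothetical and the call is skipped if the package is not installed.

```r
set.seed(6)
N <- 1000
x <- 2^runif(N, min = 4, max = 14)       # hypothetical true signals
knownOffset <- 20                        # assumed, previously estimated offset

# Simulated N x 3 multiscan data for one channel
X <- sapply(c(1.0, 0.5, 0.25), function(g) knownOffset + g * x + rnorm(N, sd = 2))

# Subtract the known offset from every scan before calibrating
X0 <- X - knownOffset

if (requireNamespace("aroma.light", quietly = TRUE)) {
  # center=FALSE fits the model with zero intercept, as described above
  Xc0 <- aroma.light::calibrateMultiscan(X0, center = FALSE)
}
```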

In Bengtsson & Hössjer (2006), we carry out a detailed
study on how biases in each channel introduce so-called
intensity-dependent log-ratios among other systematic artifacts.
Data with (additive) bias in each channel are said to be *affinely*
transformed. Data without such bias are said to be *linearly*
(proportionally) transformed. Ideally, observed signals (data) are a
linear (proportional) function of true gene expression levels.
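The difference between the two kinds of transformation can be illustrated with a small base R sketch. It writes the linear case as y = b * x and the affine case as y = a + b * x, where the symbols `a` and `b` and all numeric values are assumptions made for this illustration. For non-differentially expressed genes, the log-ratio is constant under the linear model but varies with intensity under the affine model.

```r
# Hypothetical true expression levels, identical in both channels
x <- 2^seq(4, 15, by = 0.5)

# M = log-ratio, A = average log-intensity
ma <- function(yR, yG) list(M = log2(yR / yG), A = (log2(yR) + log2(yG)) / 2)

# Linear (proportional) transformation: y = b * x
lin <- ma(yR = 1.2 * x, yG = 0.8 * x)

# Affine transformation: y = a + b * x, with assumed channel biases a
aff <- ma(yR = 25 + 1.2 * x, yG = 15 + 0.8 * x)

diff(range(lin$M))   # essentially zero: log-ratio does not depend on intensity
diff(range(aff$M))   # clearly positive: log-ratio changes with intensity
```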

We do *not* assume proportional observations. The scanner bias
is real evidence that assuming linearity is not correct.
Affine normalization corrects for affine transformation in data.
Without control spots it is not possible to estimate the bias in each
of the channels, but only the relative bias, such that after
normalization the effective bias is the same in all channels.
This is why we call it normalization and not calibration.

In its simplest form, affine normalization is done by
`Xn <- normalizeAffine(X)`, where `X` is an Nx2 matrix in which
the first column holds the foreground signals from the red channel and
the second holds the signals from the green channel. If three- or
four-channel data is used, these are added the same way. The normalized
data is returned as an Nx2 matrix `Xn`.
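The following is a minimal sketch of this simplest form on simulated two-color data. It assumes that `normalizeAffine()` is provided by the aroma.light package and is only called if that package is installed. The channel biases, gains and noise level are invented for the illustration.

```r
set.seed(2)
N <- 2000
x <- 2^runif(N, min = 5, max = 14)       # hypothetical true signals
R <- 30 + 1.5 * x + rnorm(N, sd = 5)     # assumed red-channel bias and gain
G <- 10 + 1.0 * x + rnorm(N, sd = 5)     # assumed green-channel bias and gain

# N x 2 matrix: red channel first, green channel second
X <- cbind(R = R, G = G)

if (requireNamespace("aroma.light", quietly = TRUE)) {
  Xn <- aroma.light::normalizeAffine(X)
  dim(Xn)                                # same shape as X
}
```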

To normalize all arrays and all channels at once, one may put all
data into one big NxK matrix where the K columns hold all channels
from the first array, then all channels from the second array, and so
on. Then `Xn <- normalizeAffine(X)` will return the across-array
and across-channel normalized data in the NxK matrix `Xn`, where
the columns are stored in the same order as in matrix `X`.
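One way to assemble such a matrix in base R is sketched below. The three arrays and their signals are simulated placeholders, and the column naming scheme is only a suggestion to keep track of the array and channel order.

```r
set.seed(3)
N <- 500

# Hypothetical foreground signals for three two-color arrays (N x 2 each)
arrays <- lapply(1:3, function(i) {
  cbind(R = 2^runif(N, 6, 14), G = 2^runif(N, 6, 14))
})
names(arrays) <- paste0("array", 1:3)

# All channels of array 1, then all channels of array 2, and so on
X <- do.call(cbind, arrays)
colnames(X) <- paste(rep(names(arrays), each = 2), c("R", "G"), sep = ".")

colnames(X)
```

The resulting N x 6 matrix `X` has the column layout described above and is what would be passed on for normalization.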

Equal effective bias in all channels is much better than unequal bias. First of all,
any intensity-dependent bias in the log-ratios is removed *for
all non-differentially expressed genes*. There is still an
intensity-dependent bias in the log-ratios for differentially expressed
genes, but this is now symmetric around log-ratio zero.

Affine normalization will (by default and recommended) normalize
*all* arrays together and at once. This will guarantee that
all arrays are "on the same scale". Thus, it is *not* recommended
to apply a classical between-array scale normalization afterward.
Moreover, the average log-ratio will be zero after an affine
normalization.

Note that an affine normalization will only remove curvature in the log-ratios at lower intensities. If a strong intensity-dependent bias at high intensities remains, this is most likely due to saturation effects, such as too high PMT settings or quenching.

Note that for a perfect affine normalization you *should*
expect much higher noise levels in the *log-ratios* at lower
intensities than at higher. The log-ratios should also be
approximately symmetric around zero.
In other words, *a strong fanning effect is a good sign*.
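The fanning effect can be reproduced with a small base R simulation. The sketch assumes bias-free signals with purely additive noise of equal size in both channels, and the noise level and intensity cut-offs are arbitrary choices for the illustration.

```r
set.seed(4)
N <- 5000
x <- 2^runif(N, min = 8, max = 15)       # hypothetical true signals, no differential expression
R <- x + rnorm(N, sd = 30)               # additive noise, assumed equal in both channels
G <- x + rnorm(N, sd = 30)

M <- log2(R / G)                         # log-ratios
A <- (log2(R) + log2(G)) / 2             # average log-intensities

low  <- A < 10                           # arbitrary cut-off for "low" intensities
high <- A > 13                           # arbitrary cut-off for "high" intensities

# Spread of the log-ratios is much larger at low than at high intensities
c(low = sd(M[low]), high = sd(M[high]))
```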

Due to different noise levels in red and green channels, different PMT settings in different channels, plus the fact that the minimum signal is zero, "odd shapes" may be seen in the log-ratio vs log-intensity graphs at lower intensities. Typically, these show themselves as non-symmetric in positive and negative log-ratios. Note that you should not see this at higher intensities.

If there is a strong intensity-dependent effect left after the affine normalization, we recommend, for now, that a subsequent curve-fit or quantile normalization is done. Which one, we do not know.

Why negative signals?
By default, 5% of the normalized signals will have a non-positive
signal in one or both channels. *This is on purpose*, although
the exact number 5% is chosen by experience. The reason for
introducing negative signals is that they are indeed expected.
For instance, when measuring a zero gene expression level, there is
a chance that the observed value is (should be) negative due to
measurement noise. (For this reason it is possible that the scanner
manufacturers have introduced scanner bias on purpose to avoid
negative signals, which then all would be truncated to zero.)
To adjust the ratio (or number) of negative signals allowed, use
for example `normalizeAffine(X, constraint=0.01)` for 1%
negative signals. If set to zero (or `"max"`), only as much
bias is removed such that no negative signals exist afterward.
Note that this is also true if there were negative signals
beforehand.
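A sketch of how the `constraint` argument might be explored is given below. It assumes that `normalizeAffine()` comes from the aroma.light package and skips the calls if the package is not installed. The helper `fracNonPositive()` and the simulated data are hypothetical and only serve the illustration.

```r
set.seed(5)
N <- 2000
x <- 2^runif(N, min = 5, max = 14)       # hypothetical true signals
X <- cbind(R = 30 + 1.5 * x + rnorm(N, sd = 5),
           G = 10 + 1.0 * x + rnorm(N, sd = 5))

# Hypothetical helper: fraction of spots with a non-positive signal
# in at least one channel
fracNonPositive <- function(Y) mean(rowSums(Y <= 0, na.rm = TRUE) > 0)

if (requireNamespace("aroma.light", quietly = TRUE)) {
  Xn5 <- aroma.light::normalizeAffine(X)                     # default constraint
  Xn1 <- aroma.light::normalizeAffine(X, constraint = 0.01)  # allow 1%
  print(c(default = fracNonPositive(Xn5), onePercent = fracNonPositive(Xn1)))
}
```

Comparing the two fractions gives a quick check of how much bias was removed under each setting.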

Why not lowess normalization? Curve-fit normalization methods such as lowess normalization are basically designed based on linearity assumptions and will for this reason not correct for channel biases. Curve-fit normalization methods can by definition only be applied to one pair of channels at a time and therefore require a subsequent between-array scale normalization, which is, by the way, very ad hoc.

Why not quantile normalization? Affine normalization can be thought of as a special case of quantile normalization that is more robust than the latter. See Bengtsson & Hössjer (2006) for details. Quantile normalization is probably better to apply than curve-fit normalization methods, but less robust than affine normalization, especially at extreme (low and high) intensities. For this reason, we recommend using affine normalization first, and if this is not satisfactory, quantile normalization may be applied.

If the channel offsets are zero, already corrected for, or estimated
by other means, it is possible to normalize the data robustly by
fitting the above affine model without intercept, that is, fitting
a truly linear model. This is done by adding the argument
`center=FALSE` when calling `normalizeAffine()`.

Henrik Bengtsson
