preprocessBeadSet: Pre-processing of BeadSetIllumina objects
In beadarrayMSV: Analysis of Illumina BeadArray SNP data including MSV markers


Performs a sequence of pre-processing routines on objects of class "BeadSetIllumina"

setNormOptions(shearInf1 = TRUE, transf = "root",
    method = "medianAF",
    minSize = suggestSh(shearInf1)$minSize,
    prob = suggestSh(shearInf1)$prob,
    nBins = suggestSh(shearInf1)$nBins,
    dist = suggestTr(transf)$dist,
    pNorm = suggestTr(transf)$pNorm,
    nthRoot = suggestTr(transf)$nthRoot,
    offset = suggestTr(transf)$offset,
    scale = suggestNo(method)$scale,
    nSD = 3, breaks = 200)

plotPreprocessing(BSData, normInd,
    normOpts = setNormOptions(shearInf1 = !is.null(normInd)),
    plotArray = 1, ...)

preprocessBeadSet(BSData, normInd,
    normOpts = setNormOptions(shearInf1 = !is.null(normInd)))

`shearInf1`	If `TRUE`, only the signal-containing channel of Infinium I beads is used to define the homozygote asymptotes for the affine transformation (rotation and shearing). This may be more accurate than using all beads, as the variation along the perpendicular axis is small
`transf`	Character string denoting transformation. One of “none”, “log” (base 2), or “root” (defined by `nthRoot`)
`method`	Character string denoting channel normalization method for each array. One of “none”, “quantNorm”, “medianAF”, or “linPeak”. For quantile normalization (“quantNorm”), the limma package is required (Smyth and Speed, 2003). For “medianAF”, the red channel is scaled such that `median(R/(R+G))` is close to one half. If “linPeak” is chosen, each channel is linearly scaled by its `scale`'th quantile
`minSize`	The homozygote asymptotes are found by drawing a straight line through quantile points distributed in bins along each axis. Only bins containing more than `minSize` points are used
`prob`	Numeric probability used in the `quantile` function, defining the points through which the asymptotes are drawn
`nBins`	The number of bins into which to divide the points along each axis before the homozygote asymptotes are drawn
`dist`	Character string defining the distance measure used for polar coordinates transformation of the signal. One of “manhattan”, “euclidean”, or “minkowski”. See `cart2pol`
`pNorm`	See `cart2pol`
`nthRoot`	Numeric used together with `transf="root"`
`offset`	A numeric offset added to each channel before transformation. Values below zero are set to `NA` during log- or root-transformation
`scale`	Used with `method="linPeak"`
`nSD`	The background signal is estimated as `nSD` times the estimated standard measurement error (found from the parameterised noise levels for each channel)
`breaks`	The parameterisation of noise levels is based on a histogram of each channel, where the numeric `breaks` defines the smoothing (number of bins). See `hist`
`BSData`	`"BeadSetIllumina"` object not previously pre-processed
`normInd`	Matrix with logical indexes to sub-bead pool for each bead-type. See `getNormInd`
`normOpts`	List output from `setNormOptions`
`...`	Further arguments to `plotEstimatedNoise`
`plotArray`	Numeric index to a single array to plot
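
The “medianAF” scaling described under `method` can be illustrated in base R. This is only a minimal sketch and not the package's implementation: the helper `scaleRedMedianAF` and the simulated intensities are hypothetical, and strictly positive intensities are assumed. It relies on the observation that `s*R/(s*R+G)` exceeds one half exactly when `s` exceeds `G/R`, so scaling the red channel by `median(G/R)` places the median ratio at one half.

```r
# Hypothetical helper (not part of beadarrayMSV): scale the red channel so
# that median(R / (R + G)) is close to one half. Assumes R > 0 and G > 0.
scaleRedMedianAF <- function(R, G) {
    s <- median(G / R)   # s*R/(s*R + G) > 0.5 exactly when s > G/R
    list(R = s * R, G = G, scale = s)
}

set.seed(1)
G <- rexp(101, rate = 1/1000) + 50          # simulated green intensities
R <- 1.8 * G * exp(rnorm(101, sd = 0.5))    # simulated red channel with a dye bias
scaled <- scaleRedMedianAF(R, G)

median(R / (R + G))                         # median ratio before scaling
medAF <- median(scaled$R / (scaled$R + scaled$G))
medAF                                       # median ratio after scaling
```

With an odd number of beads the median ratio after scaling equals one half up to rounding; with an even number it is only approximately one half, which agrees with the “close to one half” wording above.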

Using setNormOptions, default pre-processing options are suggested, and any changes may be specified. The effects of different options are studied using plotPreprocessing for a number of arbitrary arrays. This produces four plots; i) raw data scatter, ii) scatter including the estimated asymptotes for the affine transformation (red/green) including the quantile points used (blue dots), iii) the noise levels for the red and green channel after transformation, parameterized signal superimposed, based on the non-signal channels of Infinium I beads, and iv) scatter after transformation including new axes (green) and estimated noise levels (red dots).

For the affine transformation, it is important that enough quantile points are included to get reliable asymptotes. If there are few blue dots in plot ii), decrease the minSize option or set shearInf1 to FALSE. If the grey lines in plot iii) are too coarse (too few points) to get a good noise-parameterisation, increase breaks. Note also how the noise levels are affected by different transformations.
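
The effect of `minSize` on the number of quantile points can be sketched in base R. The helper `binQuantilePoints` below is hypothetical and only mimics the idea described for `minSize`, `prob` and `nBins`; the binning and line fitting actually used by the package may differ, and the simulated homozygote cloud is an assumption made for illustration.

```r
# Hypothetical sketch (not the package's code): one quantile point per bin
# along the x-axis, keeping only bins with more than minSize points.
binQuantilePoints <- function(x, y, nBins = 20, minSize = 10, prob = 0.5) {
    bins <- cut(x, breaks = nBins)
    keep <- names(which(table(bins) > minSize))
    inKeep <- bins %in% keep
    g <- droplevels(bins[inKeep])
    data.frame(
        x = as.numeric(tapply(x[inKeep], g, median)),
        y = as.numeric(tapply(y[inKeep], g, quantile, probs = prob))
    )
}

set.seed(2)
x <- rexp(300, rate = 1/2000)         # simulated signal channel of one homozygote cloud
y <- 0.1 * x + rnorm(300, sd = 30)    # simulated cross-talk in the other channel

strict <- binQuantilePoints(x, y, minSize = 10)
loose  <- binQuantilePoints(x, y, minSize = 3)
c(strict = nrow(strict), loose = nrow(loose))  # a lower minSize never keeps fewer bins

fit <- lm(y ~ x, data = loose)        # straight line through the quantile points
coef(fit)
```

Sparse bins at the high-intensity end are the first to be dropped as `minSize` increases, which is why few blue dots in plot ii) suggest lowering it.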

Pay close regard to how the transformation affects the shapes of the clouds in plot iv). Ideally, three well defined clouds protrude from the estimated origin, corresponding to the homozygotes which fall on the estimated axes and the heterozygotes which fall 45 degrees in between. Imagine a rubber band stretched over the ends of the three clouds. If the rubber band is straight (no transformation), the “manhattan” (or 1-norm “minkowski”) distance is the best option for polar coordinates. If the three cloud ends fall on a circle, the “euclidean” (or 2-norm “minkowski”) distance is the best option. If the rubber band forms a shape intermediate between a circle and a square (e.g. 4th-root transformation), the 5-norm “minkowski” distance or similar may be the best choice.
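
The rubber-band reasoning can be checked numerically with a small base R sketch. The helper `minkowskiRadius` is hypothetical and is not the package's `cart2pol`; it only computes the radius under a p-norm, and the three "cloud ends" are made-up coordinates.

```r
# Hypothetical helper (not the package's cart2pol): radius under a p-norm
# Minkowski distance; p = 1 is "manhattan", p = 2 is "euclidean".
minkowskiRadius <- function(x, y, p) (abs(x)^p + abs(y)^p)^(1 / p)

# Cloud ends on a straight rubber band (x + y = 1)
line <- cbind(x = c(1, 0.5, 0), y = c(0, 0.5, 1))
# Cloud ends on a quarter circle of radius 1
arc <- cbind(x = c(1, sqrt(0.5), 0), y = c(0, sqrt(0.5), 1))

rLine1 <- minkowskiRadius(line[, "x"], line[, "y"], p = 1)  # equal radii
rLine2 <- minkowskiRadius(line[, "x"], line[, "y"], p = 2)  # heterozygote radius is shorter
rArc2  <- minkowskiRadius(arc[, "x"], arc[, "y"], p = 2)    # equal radii
rbind(rLine1, rLine2, rArc2)
```

The distance measure that gives the three cloud ends a similar radius is the one that matches the shape of the rubber band.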

The function preprocessBeadSet calls several pre-processing routines in sequence. First, shearRawSignal performs the affine transformations, then getNoiseDistributions estimates the distributions of the noise for each channel. Next, transformChannels transforms the signal, and transformSEs transforms the standard errors of each channel. Finally, normalizeShearedChannels performs channel normalisation for each array.

Output from setNormOptions is a list with pre-processing options

The function plotPreprocessing is used for its side effects

Output from preprocessBeadSet is a "BeadSetIllumina" object with pre-processed assayData entries. A column “noiseIntensity” is added to phenoData; this is the (parameterized) standard error times nSD

If BSData contains a phenoData column “noiseIntensity”, preprocessBeadSet assumes the data are already normalized and an error is produced

Lars Gidskehaug

G. K. Smyth and T. P. Speed. (2003) Normalization of cDNA microarray data. Methods 31:265-273

readBeadSummaryOutput, getNormInd, shearRawSignal, getNoiseDistributions, transformChannels, transformSEs, normalizeShearedChannels, createAlleleSet, BeadSetIllumina

## Not run: 
#Read files into BeadSetIllumina-object
rPath <- system.file("extdata", package="beadarrayMSV")
BSDataRaw <- readBeadSummaryOutput(path=rPath,recursive=TRUE)

#Find indexes to sub-bead pools
beadInfo <- read.table(paste(rPath,'beadData.txt',sep='/'),sep='\t',
    header=TRUE,as.is=TRUE)
rownames(beadInfo) <- make.names(beadInfo$Name)
normInd <- getNormInd(beadInfo,featureNames(BSDataRaw))

#Pre-process
normOpts <- setNormOptions(minSize=10)
plotPreprocessing(BSDataRaw,normInd,normOpts,plotArray=1)
BSData <- preprocessBeadSet(BSDataRaw,normInd,normOpts)
pData(BSData)

## End(Not run)