Model1: Correct for global correlations and biases In Rolexa: Statistical analysis of Solexa sequencing data

Description

Functions to correct for global correlations between color channels or between successive sequencing cycles

Usage

```r
## S4 method for signature 'SolexaIntensity'
DeCorrelateChannels(int,cycles=seq(1,dim(int)[3],by=1),theta=matrix(rep(c(0.8806742,1.3727418,0.883865,1.545728),length(cycles)),ncol=4,byrow=TRUE))
## S4 method for signature 'array'
DeCorrelateChannels(int,cycles=seq(1,dim(int)[3],by=1),theta=matrix(rep(c(0.8806742,1.3727418,0.883865,1.545728),length(cycles)),ncol=4,byrow=TRUE))
DeCorrelateChannels(int,...)
## S4 method for signature 'SolexaIntensity'
OptimizeAngle(int,cycles=seq(1,dim(int)[3],by=1),...)
OptimizeAngle(int,...)
## S4 method for signature 'SolexaIntensity'
DeCorrelateCycles(int,ncycles=dim(int)[3],rate=1.8e-2)
## S4 method for signature 'array'
DeCorrelateCycles(int,ncycles=dim(int)[3],rate=1.8e-2)
DeCorrelateCycles(int,...)
## S4 method for signature 'SolexaIntensity'
OptimizeRate(int,ncycles=dim(int)[3],...)
OptimizeRate(int,...)
## S4 method for signature 'RolexaRun'
TileNormalize(run=Rolexa.env,int,cycles=seq(1,dim(int)[3],by=1))
TileNormalize(run,...)
```

Arguments

| Argument | Description |
|---|---|
| `run` | a `RolexaRun` object defining the run parameters |
| `int` | a `SolexaIntensity` object or an array |
| `cycles, ncycles` | the cycles, or the number of cycles (counted from 1), to apply the correction to |
| `theta` | a matrix with `length(cycles)` rows and 4 columns, giving four angles per cycle that define the coordinate changes |
| `rate` | the rate of nucleotide mis-incorporation at each cycle |
| `...` | additional arguments passed to `optim` |
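
As an illustration of the `theta` layout, the default value shown in the usage can be built by hand. This is a minimal sketch in base R: the variable names `default_angles` and `cycles` are chosen here for illustration, and no Rolexa function is called.

```r
# Four default angles copied from the DeCorrelateChannels signature.
default_angles <- c(0.8806742, 1.3727418, 0.883865, 1.545728)
cycles <- 1:5  # illustrative choice of cycles

# One row per cycle, one column per angle; byrow=TRUE repeats the same
# four angles on every row.
theta <- matrix(rep(default_angles, length(cycles)), ncol = 4, byrow = TRUE)
dim(theta)
```

A matrix of this shape is also what `OptimizeAngle` is documented to return, so its result can be passed directly as `theta`.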

Details

`DeCorrelateChannels` applies two coordinate transforms: one maps axes 1 and 2 to new axes at angles `theta[,1:2]` relative to axis 1, and the other treats axes 3 and 4 similarly with angles `theta[,3:4]`. These angles can be calculated with `OptimizeAngle`, which minimizes the correlations between channels 1 and 2, and between channels 3 and 4, for each cycle.

`DeCorrelateCycles` assumes that at each cycle a fraction `rate` of the sequences fails to incorporate any nucleotide, so that the sequence lengths within each colony follow a binomial distribution. The correction accounts for this by using the intensities measured at previous cycles. `OptimizeRate` calculates a rate that minimizes correlations between consecutive cycles.
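
The sketch below shows one possible reading of these two corrections on synthetic data, in base R. It is not taken from the Rolexa source: the helpers `oblique_coords` and `phasing_matrix` are hypothetical, the angles are assumed to be in radians, and the binomial lag matrix is an interpretation of the description above rather than the package's actual algorithm.

```r
## --- Channel decorrelation: change to an oblique basis -------------------
# Hypothetical helper (not a Rolexa function). `xy` is an n x 2 matrix of
# intensities for one channel pair; `theta` holds the two axis angles in
# radians (an assumption), measured from axis 1.
oblique_coords <- function(xy, theta) {
  B <- rbind(cos(theta), sin(theta))  # columns are the new unit axes
  t(solve(B, t(xy)))                  # coordinates along the new axes
}

set.seed(1)
signal    <- cbind(rexp(500), rexp(500))  # independent synthetic signals
theta     <- c(0.3, 1.2)                  # made-up cross-talk angles
observed  <- signal %*% t(rbind(cos(theta), sin(theta)))  # mixed channels
recovered <- oblique_coords(observed, theta)

cor(observed)[1, 2]   # clearly positive: the channels are mixed
cor(recovered)[1, 2]  # close to zero once the axes are untangled

## --- Cycle decorrelation: binomial lag model ------------------------------
# Hypothetical helper (not a Rolexa function). Entry [k, j] is the
# probability that a strand has incorporated j bases after k cycles when
# each cycle fails independently with probability `rate`.
phasing_matrix <- function(ncycles, rate) {
  k <- seq_len(ncycles)
  outer(k, k, function(k, j) dbinom(j, size = k, prob = 1 - rate))
}

M <- phasing_matrix(5, rate = 1.8e-2)     # default rate from the usage
true_int  <- diag(4)[c(1, 3, 2, 4, 1), ]  # 5 cycles x 4 channels, one base each
lagged    <- M %*% true_int               # what a lagging colony would emit
corrected <- solve(M, lagged)             # undo the mixing of earlier cycles
```

Because the lag matrix is lower triangular, the signal at a given cycle depends only on that cycle and the ones before it, which matches the statement that the correction uses intensities from previous cycles.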

`TileNormalize` estimates the local trend by `loess` fitting of the model `int ~ x+y` and subtracts it from the intensity matrix.
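
The idea of removing a spatial trend with `loess` can be sketched on synthetic data using only base R. The data frame `tile`, its columns, and the simulated trend are assumptions made for illustration; the sketch does not reproduce how `TileNormalize` handles a `SolexaIntensity` object internally.

```r
set.seed(42)
n <- 400

# Synthetic colony positions on a tile (hypothetical data).
tile <- data.frame(x = runif(n), y = runif(n))

# Intensity = baseline + smooth spatial bias + noise.
tile$int <- 100 + 50 * tile$x + 30 * tile$y + rnorm(n, sd = 5)

# Fit the local trend as a function of position, then subtract it.
fit <- loess(int ~ x + y, data = tile)
tile$corrected <- tile$int - fitted(fit)

cor(tile$int, tile$x)        # strong dependence on position before
cor(tile$corrected, tile$x)  # much weaker dependence after
```

Note that subtracting the fitted surface in this way also removes the overall mean level, so the corrected values are centred near zero.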

Value

`TileNormalize`, `DeCorrelateChannels` and `DeCorrelateCycles` return an object of the same type as `int`, corrected for spurious correlations. `OptimizeAngle` returns a matrix with `length(cycles)` rows and 4 columns, and `OptimizeRate` returns a single positive real number.

Author(s)

Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef

References

Probabilistic base calling of Solexa sequencing data, BMC Bioinformatics 2008, 9:431

Examples

```r
path = SolexaPath(system.file("extdata", package="ShortRead"))
rolenv = SetModel(idsep="_")
int = readIntensities(path,pattern="s_1_0001",withVariability=FALSE)
# Decorrelate the channels for cycles 1 to 5, using optimized angles
int1 = DeCorrelateChannels(int=int,cycles=1:5,theta=OptimizeAngle(int=int,cycles=1:5))
# Correct for correlations between successive cycles, using an optimized rate
int2 = DeCorrelateCycles(int=int1,ncycles=5,rate=OptimizeRate(int=int1))
# Subtract the local loess trend for cycle 1
int3 = TileNormalize(run=rolenv,int=int,cycles=1)
seq = CombineReads(run=rolenv,path=path,pattern="s_1_0001_seq*")
PlotCycles(run=rolenv,int=int3,seq=seq,cycles=1:4)
```