Normalize methods


Description

Normalize sequence read count data.

Usage

normalizeCounts(x, fun=mean, offset=10L, basal=1e-4, lambda=c(0.1, 0.1),
fit=FALSE, multicore=TRUE, optimizer="all", ...)

Arguments

x

Object of class TssData with raw data to normalize.

fun

Function used to average over replicates (default: mean).

offset

Integer defining the number of bases with the basal rate that are added to the ends of each segment.

basal

Numeric specifying the basal rate.

lambda

Numeric vector of length two specifying the regularization parameter for each side of the segment.

fit

Logical whether the fitting should be performed in addition to the estimation based on the Poisson ratios obtained from all reads.

multicore

Logical whether to use the parallel package to speed up the fitting. This only has an effect if the package is available and loaded. For details, see the ‘Details’ section.

optimizer

Character string choosing the optimizer for the fit (default: “all”). Possible choices are “optim” for the optim function from the stats package, “bobyqa” for the bobyqa function from the minqa package, or “all” for taking the best fit out of both.

...

Additional arguments passed to the parallel package, if it is used. For details, see the ‘Details’ section.
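The “all” choice for optimizer can be pictured as running both optimizers on the same objective and keeping the fit with the lower objective value. The sketch below is an assumption about that selection logic, not the package's own code: the helper `bestFit`, the use of the "L-BFGS-B" method in optim, the box bounds, and the toy quadratic objective are all hypothetical. The bobyqa branch only runs if the minqa package is installed.

```r
## Hypothetical helper (not part of TSSi): fit with stats::optim() and, if
## available, minqa::bobyqa(), then keep the fit with the lower objective value.
bestFit <- function(par, fn, lower, upper) {
    fits <- list()
    ## stats::optim with box constraints (method choice is an assumption)
    o <- stats::optim(par, fn, method = "L-BFGS-B", lower = lower, upper = upper)
    fits$optim <- list(par = o$par, value = o$value)
    ## minqa::bobyqa is only tried when the package is installed
    if (requireNamespace("minqa", quietly = TRUE)) {
        b <- minqa::bobyqa(par, fn, lower = lower, upper = upper)
        fits$bobyqa <- list(par = b$par, value = b$fval)
    }
    values <- vapply(fits, function(f) f$value, numeric(1))
    best <- names(which.min(values))
    c(fits[[best]], optimizer = best)
}

## Toy objective with its minimum at (1, 2)
fn <- function(p) sum((p - c(1, 2))^2)
fit <- bestFit(c(0.5, 0.5), fn, lower = rep(-10, 2), upper = rep(10, 2))
fit$par
fit$optimizer
```

Comparing the final objective values, rather than trusting either optimizer alone, guards against one of them stopping early at a poorer solution.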

Details

The normalization reduces the noise by shrinking the counts towards zero. This step is intended to eliminate false positive counts as well as to make further analyses more robust by reducing the impact of large counts. Such a shrinkage or regularization procedure is a well-established strategy in statistics to make predictions conservative, i.e. to reduce the number of false positive predictions.

An objective function is minimized to estimate the transcription levels in a regularized manner. The likelihood is given by the product of the probabilities of the observed counts, which are assumed to follow a Poisson distribution by default; equivalently, the log-likelihood is the sum of their log-probabilities.
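The relation between the likelihood (a product of Poisson probabilities) and the log-likelihood (a sum of log-probabilities) can be checked directly in R. The counts and the candidate levels `mu` below are made-up illustration values, and `negLogLik` is a hypothetical helper showing the unpenalized part of such an objective, not the package's implementation.

```r
## Toy read counts at consecutive genomic positions (illustration only)
counts <- c(0, 3, 5, 2, 0)
## Candidate transcription levels (Poisson rates), one per position
mu <- c(0.1, 2.5, 4.8, 2.2, 0.1)

## Likelihood: product of the Poisson probabilities of the observed counts
lik <- prod(dpois(counts, lambda = mu))
## Log-likelihood: sum of the log-probabilities (numerically more stable)
loglik <- sum(dpois(counts, lambda = mu, log = TRUE))
all.equal(log(lik), loglik)

## Hypothetical unpenalized objective: the negative Poisson log-likelihood
negLogLik <- function(mu, counts) -sum(dpois(counts, lambda = mu, log = TRUE))
negLogLik(mu, counts)
```

Working on the log scale avoids underflow when many small probabilities are multiplied over long segments.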

For ‘lambda[1] > 0’, nonzero estimates are penalized to obtain conservative estimates of the transcription levels with as few nonzero components, i.e. genomic positions, as possible. The larger ‘lambda[1]’, the more conservative the identification procedure.
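For intuition about how a penalty like lambda[1] makes the estimates conservative, consider a single position with count y and a penalty proportional to the estimated level, i.e. minimizing mu - y log(mu) + lambda1 * mu over mu > 0. Setting the derivative 1 - y/mu + lambda1 to zero gives mu = y / (1 + lambda1), so a larger penalty shrinks the estimate further toward zero. This one-position objective is a simplified assumption for illustration, not the exact objective used by normalizeCounts; `penNll` and `shrinkOne` are hypothetical helpers, and the closed form is checked numerically with optimize().

```r
## Penalized negative Poisson log-likelihood for one position
## (the constant log(y!) term is dropped); an assumed, simplified form
penNll <- function(mu, y, lambda1) mu - y * log(mu) + lambda1 * mu

## Closed-form minimizer: 1 - y/mu + lambda1 = 0  =>  mu = y / (1 + lambda1)
shrinkOne <- function(y, lambda1) y / (1 + lambda1)

y <- 20
lambdas <- c(0, 0.1, 0.5, 1)
closed <- shrinkOne(y, lambdas)
optimised <- vapply(lambdas, function(l)
    optimize(penNll, interval = c(1e-8, 100), y = y, lambda1 = l)$minimum,
    numeric(1))
round(cbind(lambda1 = lambdas, closed = closed, optimised = optimised), 3)
```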

To shrink isolated counts more strongly than counts in regions of strong transcriptional activity, the information from consecutive genomic positions is taken into account by evaluating differences between adjacent count estimates.
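The effect of penalizing differences between adjacent estimates can be sketched with a toy objective: the negative Poisson log-likelihood, plus lambda1 times the sum of the levels, plus lambda2 times the sum of squared differences between neighbouring levels. The squared-difference form, the values of lambda1 and lambda2, and the log-scale parameterization (which keeps the levels positive) are assumptions for illustration; the package's actual penalty may differ. In this sketch an isolated count of 10 is shrunk much more strongly than a count of 10 inside a run of similar counts.

```r
## Toy counts: an isolated count at position 3, a run of counts at 6-10
y <- c(0, 0, 10, 0, 0, 10, 10, 10, 10, 10)

## Assumed penalized objective on the log scale, mu = exp(theta) > 0
objective <- function(theta, y, lambda1, lambda2) {
    mu <- exp(theta)
    nll <- sum(mu - y * theta)               # Poisson NLL without log(y!) terms
    nll + lambda1 * sum(mu) + lambda2 * sum(diff(mu)^2)
}

fit <- optim(log(pmax(y, 0.5)), objective, y = y, lambda1 = 0.1, lambda2 = 0.5,
             method = "BFGS", control = list(maxit = 1000))
muHat <- exp(fit$par)
round(muHat, 2)
## Isolated count (position 3) vs. a count inside the run (position 8)
c(isolated = muHat[3], inRun = muHat[8])
```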

To distribute the computation over multiple processor cores, the mclapply function of the parallel package can be used. For this, the parallel package has to be loaded manually before starting the computation; additional arguments are passed via the ... argument, e.g. normalizeCounts(x, mc.cores=2). The multicore argument can further be used to temporarily disable the parallel estimation by setting it to FALSE.
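The mechanism described above, forwarding extra arguments through ... to parallel::mclapply() and falling back to sequential lapply() when multicore is FALSE or the package is not loaded, can be sketched as follows. `fitSegments` and `fitOne` are hypothetical stand-ins for the per-segment fitting; only mclapply() and its mc.cores argument come from the parallel package. mclapply() relies on forking, so on Windows mc.cores must be 1.

```r
library(parallel)

## Hypothetical per-segment fit: here simply the mean count of the segment
fitOne <- function(segment) mean(segment)

## Hypothetical wrapper mirroring the 'multicore' and '...' arguments
fitSegments <- function(segments, multicore = TRUE, ...) {
    if (multicore && "package:parallel" %in% search()) {
        ## extra arguments such as mc.cores are passed on to mclapply()
        mclapply(segments, fitOne, ...)
    } else {
        ## '...' is not forwarded, so mc.cores does not reach fitOne()
        lapply(segments, fitOne)
    }
}

segments <- list(c(0, 3, 5), c(10, 12), c(1, 0, 0, 2))
cores <- if (.Platform$OS.type == "windows") 1L else 2L
parRes <- fitSegments(segments, mc.cores = cores)
seqRes <- fitSegments(segments, multicore = FALSE, mc.cores = cores)
identical(parRes, seqRes)
```

Dropping ... in the sequential branch keeps the same call valid whether or not parallel execution is enabled.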

Value

An object of class TssNorm.

Methods

Normalize read data:

normalizeCounts:

signature(x="TssData")

normalizeCounts(x, ...)

Author(s)

Maintainer: Julian Gehring <julian.gehring@fdm.uni-freiburg.de>

See Also

Classes: TssData, TssNorm, TssResult

Methods: segmentizeCounts, normalizeCounts, identifyStartSites, get-methods, plot-methods, asRangedData-methods

Functions: subtract-functions

Data set: physcoCounts

Package: TSSi-package

Examples

## preceding steps
example(segmentizeCounts)

## normalize data, w/o and w/ fitting
yRatio <- normalizeCounts(x)
yFit <- normalizeCounts(x, fit=TRUE)

yFit

## Not run: 
## parallel computation
library(parallel)
yFit <- normalizeCounts(x, fit=TRUE, mc.cores=2)

## End(Not run)