normR: Enrichment, Difference and Regime Calling in ChIP-seq data.
In imbbLab/normr: Normalization and difference calling in ChIP-seq data


Correct background estimation is crucial for calling enrichment and differences in ChIP-seq data. normR provides robust normalization and difference calling in ChIP-seq and similar data. In brief, a binomial mixture model with a given number of components is fit to the read counts of a treatment and a control experiment. For computational performance, the model is fit in log space via Expectation Maximization (EM) implemented in C++. The fit is considered converged once the change in model log-likelihood between iterations falls below a threshold. The converged model yields a robust background estimate. Because this estimate accounts for the effect of enrichment in certain regions, it represents an appropriate null hypothesis, and it is used to identify regions that are significantly enriched or depleted with respect to the control. Moreover, a standardized enrichment is calculated for each bin based on the fitted background component. For convenience, read count vectors can be obtained directly from BAM files when a compliant chromosome annotation is given. Please refer to the documentation of the individual functions for enrichment calling (enrichR), difference calling (diffR) and enrichment regime calling (regimeR).
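
The fitting idea above can be illustrated with a minimal base-R sketch; this is not normR's C++ implementation. It assumes that, for each bin, the treatment count s is modeled as Binomial(s + r, theta_k) under component k, where r is the control count, and it computes the EM updates from log densities. The names fit_binom_mixture and logsumexp, the initial values and the threshold eps are hypothetical choices, and treating the component with the smallest theta as background is an assumption of this sketch.

```r
# Minimal sketch (hypothetical names, not normR's C++ code): EM for a
# K-component binomial mixture, computed in log space.
logsumexp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))          # numerically stable log(sum(exp(x)))
}

fit_binom_mixture <- function(s, r, K = 2, eps = 1e-6, max_iter = 500) {
  n <- s + r                        # total reads per bin (treatment + control)
  keep <- n > 0                     # bins without reads carry no information
  s <- s[keep]
  n <- n[keep]
  theta <- seq(0.2, 0.8, length.out = K)  # assumed initial treatment shares
  log_pi <- rep(log(1 / K), K)            # uniform initial mixture weights
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: log joint density of every bin under every component (N x K)
    lj <- matrix(vapply(seq_len(K), function(k)
      log_pi[k] + dbinom(s, n, theta[k], log = TRUE), numeric(length(s))),
      nrow = length(s))
    lmarg <- apply(lj, 1, logsumexp)  # per-bin log marginal likelihood
    ll <- sum(lmarg)                  # model log-likelihood at current params
    post <- exp(lj - lmarg)           # posterior responsibilities (rows sum to 1)
    # M-step: closed-form updates of mixture weights and treatment shares
    log_pi <- log(colMeans(post))
    theta <- colSums(post * s) / colSums(post * n)
    # stop once the log-likelihood gain drops below the threshold
    if (ll - ll_old < eps) break
    ll_old <- ll
  }
  list(theta = theta, pi = exp(log_pi), loglik = ll, iterations = iter)
}

# usage on simulated counts: about 10% of bins are enriched in the treatment
set.seed(42)
N <- 5000
r <- rpois(N, 20)
enriched <- runif(N) < 0.1
s <- rpois(N, ifelse(enriched, 80, 20))
fit <- fit_binom_mixture(s, r, K = 2)
fit$theta  # should lie near 0.5 (background) and 0.8 (enriched)
```

The simulation uses independent Poisson counts because, conditional on their sum, the treatment count is exactly binomial, so the sketch's model matches the data-generating process.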

The available functions are:

enrichR: Enrichment calling between treatment (e.g. ChIP-seq) and control (e.g. Input).

diffR: Difference calling between treatment (e.g. ChIP-seq condition 1) and control (e.g. ChIP-seq condition 2).

regimeR: Enrichment regime calling between treatment (e.g. ChIP-seq) and control (e.g. Input) with a given number of model components. For example, 3 regimes recover background, broad and peak enrichment.

Parallelization is achieved in C++ via OpenMP (http://openmp.org); the number of threads is set with the procs argument.

Author: Johannes Helmuth <helmuth@molgen.mpg.de>

See also: NormRFit-class for functions to access and export the normR fit, and NormRCountConfig-class for configuring the read-counting procedure (bin size, mapping quality, ...).

library(normr)
library(GenomicRanges)

### enrichR(): Calling Enrichment over Input
#locate the example BAM files shipped with normr
input <- system.file("extdata", "K562_Input.bam", package="normr")
chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr")
#1-kb bins to count in (the example files contain reads only in this region)
gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000))
#configure the counting strategy (see NormRCountConfig-class)
countConfiguration <- countConfigSingleEnd(binsize = 1000,
                                           mapq = 30, shift = 100)
#invoke enrichR to call enrichment
enrich <- enrichR(treatment = chipK4, control = input,
                  genome = gr,  countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
#inspect the fit
enrich
summary(enrich)
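
How a fitted background turns into enrichment calls can be sketched in base R. This is a hypothetical helper, not part of normR: it assumes a background treatment share theta_bg (in practice taken from the background component of a fit; the value 0.5 below is assumed), computes a one-sided binomial p-value per bin and applies Benjamini-Hochberg as a stand-in correction. normR's exact test and multiple-testing procedure may differ.

```r
# Sketch (hypothetical helper, not part of normR): test each bin against
# an assumed background treatment share theta_bg.
call_enriched <- function(s, r, theta_bg, fdr = 0.01) {
  n <- s + r
  # one-sided p-value P(S >= s) with S ~ Binomial(n, theta_bg)
  p <- pbinom(s - 1, n, theta_bg, lower.tail = FALSE)
  # Benjamini-Hochberg adjustment, used here as a stand-in
  q <- p.adjust(p, method = "BH")
  data.frame(s = s, r = r, p = p, q = q, enriched = q <= fdr)
}

res <- call_enriched(s = c(20, 22, 80, 75, 18),
                     r = c(20, 19, 20, 22, 21), theta_bg = 0.5)
res$enriched  # expected: FALSE FALSE TRUE TRUE FALSE
```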

## Not run:
#write significant regions (FDR 0.01) to a BED file
#exportR(enrich, filename = "enrich.bed", fdr = 0.01)
#write normalized enrichment to a bigWig file
#exportR(enrich, filename = "enrich.bw")
## End(Not run)
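
The standardized enrichment is derived from the fitted background component. As a hypothetical illustration only (normR's exact formula may differ), one simple background-corrected score is the log-odds of treatment versus control minus the background log-odds log(theta_bg / (1 - theta_bg)). The function name background_log_odds and the pseudocount are assumptions of this sketch.

```r
# Hypothetical background-corrected log-odds score (not normR's formula):
# values near 0 look like background, values > 0 are treatment-enriched.
background_log_odds <- function(s, r, theta_bg, pseudo = 1) {
  # pseudo is an assumed pseudocount that avoids log(0) for empty bins
  log((s + pseudo) / (r + pseudo)) - log(theta_bg / (1 - theta_bg))
}

background_log_odds(s = c(20, 80, 0), r = c(20, 20, 0), theta_bg = 0.5)
```

Subtracting the background log-odds, rather than assuming equal sequencing depth, is what makes the score comparable across experiments with different treatment/control depth ratios.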

### diffR(): Calling differences between two conditions
chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr")
diff <- diffR(treatment = chipK36, control = chipK4,
              genome = gr,  countConfig = countConfiguration,
              iterations = 10, procs = 1, verbose = TRUE)
summary(diff)
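
diffR reports regions that are significantly enriched or depleted in the treatment relative to the control. normR models this with its mixture fit; the sketch below swaps in a plain two-sided binomial test against an assumed background share theta_bg, only to illustrate the three possible outcomes. The helper classify_difference and the Benjamini-Hochberg correction are hypothetical choices, not normR's API.

```r
# Sketch (hypothetical helper): two-sided test of each bin's treatment
# share against an assumed background share theta_bg.
classify_difference <- function(s, r, theta_bg, alpha = 0.01) {
  n <- s + r
  p_up <- pbinom(s - 1, n, theta_bg, lower.tail = FALSE)  # P(S >= s)
  p_down <- pbinom(s, n, theta_bg)                        # P(S <= s)
  p <- pmin(1, 2 * pmin(p_up, p_down))  # doubled smaller tail
  q <- p.adjust(p, method = "BH")       # stand-in multiple-testing correction
  call <- ifelse(q > alpha, "unchanged",
                 ifelse(p_up < p_down, "treatment", "control"))
  data.frame(s = s, r = r, q = q, call = call, stringsAsFactors = FALSE)
}

res <- classify_difference(s = c(30, 90, 10), r = c(30, 10, 90), theta_bg = 0.5)
res$call  # expected: "unchanged" "treatment" "control"
```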

### regimeR(): Identification of broad and peak enrichment
regime <- regimeR(treatment = chipK36, control = input, models = 3,
                  genome = gr,  countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
summary(regime)
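
With models = 3, each bin can be assigned to the regime whose component has the highest posterior probability. The sketch below does this for assumed component parameters (theta for treatment shares, weights for mixture weights); ordering the components by theta and labeling them background, broad and peak is an assumption that mirrors the description of regimeR, and the helper assign_regime is hypothetical.

```r
# Sketch (hypothetical helper, not normR code): assign each bin to the
# component with the highest posterior probability.
assign_regime <- function(s, r, theta, weights,
                          labels = c("background", "broad", "peak")) {
  stopifnot(length(theta) == length(weights), length(labels) == length(theta))
  o <- order(theta)          # assumption: regimes ordered by treatment share
  theta <- theta[o]
  weights <- weights[o]
  n <- s + r
  # log joint density log(w_k) + log Binomial(s | n, theta_k), N x K
  lj <- matrix(vapply(seq_along(theta), function(k)
    log(weights[k]) + dbinom(s, n, theta[k], log = TRUE), numeric(length(s))),
    nrow = length(s))
  # the posterior argmax equals the argmax of the log joint density
  labels[max.col(lj, ties.method = "first")]
}

assign_regime(s = c(20, 65, 90), r = c(20, 35, 10),
              theta = c(0.5, 0.65, 0.9), weights = c(0.8, 0.15, 0.05))
# expected: "background" "broad" "peak"
```

Working with the log joint density avoids computing the normalizing constant, since it is shared by all components of a bin and does not change the argmax.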