A correct background estimation is crucial for calling enrichment and
differences in ChIP-seq data.
normR provides robust
normalization and difference calling for ChIP-seq and similar data. In brief, a
binomial mixture model with a given number of components is fit to read
count data for a treatment and control experiment. Therein, computational
performance is improved by fitting a log-space model via Expectation
Maximization in C++. The fit is considered converged once the change in
model log-likelihood falls below a threshold. After the model fit has
converged, a robust background estimate is obtained. This estimate accounts
for the effect of enrichment in certain regions and, therefore, represents
an appropriate null hypothesis. This robust background is used to identify
significantly enriched or depleted regions with respect to control.
Moreover, a standardized enrichment for each bin is calculated based on the
fitted background component. For convenience, read count vectors can be
obtained directly from bam files when a compliant chromosome annotation is
given. Please refer to the documentation of the individual functions for
enrichment calling (enrichR), difference calling (diffR) and enrichment
regime calling (regimeR).
Available functions are:
enrichR: Enrichment calling between treatment (e.g. ChIP-seq) and control (e.g. Input).
diffR: Difference calling between treatment (e.g. ChIP-seq condition 1) and control (e.g. ChIP-seq condition 2).
regimeR: Enrichment regime calling between treatment (e.g. ChIP-seq) and control (e.g. Input) with a given number of model components. For example, 3 regimes recover background, broad and peak enrichment.
Computational performance is improved by fitting a log-space model in C++. Parallelization is achieved in C++ via OpenMP (http://openmp.org).
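To make the model fit described above more concrete, the following is a minimal base-R sketch of Expectation Maximization for a binomial mixture, with the E-step carried out in log space and convergence declared on the change in log-likelihood. It is an illustration under stated assumptions, not normR's C++ implementation: the helper names (row_log_sum_exp, fit_binom_mixture), the starting values, the threshold eps and the simulated counts are all hypothetical, and the component with the smallest success probability is simply taken to be the background.

```r
# Illustrative sketch only; NOT the normR implementation.
# All function names, defaults and simulated data are assumptions.

# Row-wise log-sum-exp for a numerically stable mixture likelihood
row_log_sum_exp <- function(m) {
  mx <- apply(m, 1, max)
  mx + log(rowSums(exp(m - mx)))
}

# EM for a K-component binomial mixture on treatment counts s out of
# totals n = s + r, where r are the control counts per bin
fit_binom_mixture <- function(s, r, theta = c(0.3, 0.7),
                              eps = 1e-6, max_iter = 500) {
  n <- s + r
  k <- length(theta)
  prior <- rep(1 / k, k)
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step in log space: log(prior_j) + log Binom(s | n, theta_j)
    log_joint <- sapply(seq_len(k), function(j) {
      log(prior[j]) + dbinom(s, size = n, prob = theta[j], log = TRUE)
    })
    log_marginal <- row_log_sum_exp(log_joint)
    post <- exp(log_joint - log_marginal)
    # M-step: update mixture proportions and success probabilities
    prior <- colMeans(post)
    theta <- colSums(post * s) / colSums(post * n)
    # Stop once the log-likelihood changes by less than eps
    loglik <- sum(log_marginal)
    if (abs(loglik - loglik_old) < eps) break
    loglik_old <- loglik
  }
  list(prior = prior, theta = theta, posterior = post,
       loglik = loglik, iterations = iter)
}

# Simulated counts: about 10% of bins are enriched in the treatment
set.seed(1)
n_bins <- 2000
totals <- rpois(n_bins, 20)
enriched <- rbinom(n_bins, 1, 0.1) == 1
s <- rbinom(n_bins, totals, ifelse(enriched, 0.8, 0.4))
r <- totals - s

fit <- fit_binom_mixture(s, r)
# Assumption: the component with the smallest theta is the background
background <- which.min(fit$theta)
```

The posterior column for the background component, fit$posterior[, background], gives each bin's probability of belonging to the background under this toy model.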
Johannes Helmuth
require(GenomicRanges)

### enrichR(): Calling Enrichment over Input
#load some example bamfiles
input <- system.file("extdata", "K562_Input.bam", package="normr")
chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr")
#region to count in (example files contain information only in this region)
gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000))
#configure your counting strategy (see BamCountConfig-class)
countConfiguration <- countConfigSingleEnd(binsize = 1000,
                                           mapq = 30, shift = 100)
#invoke enrichR to call enrichment
enrich <- enrichR(treatment = chipK4, control = input,
                  genome = gr, countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
#inspect the fit
enrich
summary(enrich)

## Not run:
#write significant regions to bed
#exportR(enrich, filename = "enrich.bed", fdr = 0.01)
#write normalized enrichment to bigWig
#exportR(enrich, filename = "enrich.bw")

## End(Not run)

### diffR(): Calling differences between two conditions
chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr")
diff <- diffR(treatment = chipK36, control = chipK4,
              genome = gr, countConfig = countConfiguration,
              iterations = 10, procs = 1, verbose = TRUE)
summary(diff)

### regimeR(): Identification of broad and peak enrichment
regime <- regimeR(treatment = chipK36, control = input, models = 3,
                  genome = gr, countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
summary(regime)