normR-regimeR: Regime Enrichment Calling for ChIP-seq data in normR with...

Description Usage Arguments Details Value Author(s) See Also Examples

Description

Regime enrichment calling between treatment (ChIP-seq) and control (Input) in normR is done by fitting background and multiple enrichment regimes simultaenously. Therefore, a mixture of models binomials is fit to the data with Expectation Maximization (EM). After convergence of the EM, the fitted background component is used to calculate significance for treatment and control count pair. Based on this statistic, user can extract significantly enriched regions with a desired significance level. Regime assignments are done by Maximum A Posteriori. Regions can be further analyzed within R or exported (see NormRFit-class). Furthermore, regimeR calculates a standardized enrichment given the fitted background component. For example, 3 regimes discriminate background, broad and peak enrichment. See also Details.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
regimeR(treatment, control, genome, models, ...)

## S4 method for signature 'integer,integer,GenomicRanges,numeric'
regimeR(treatment, control,
  genome, models = 3, procs = 1L, verbose = TRUE, eps = 1e-05,
  iterations = 10, minP = 0.05)

## S4 method for signature 'character,character,GenomicRanges,numeric'
regimeR(treatment,
  control, genome, models = 3, countConfig = countConfigSingleEnd(),
  procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10,
  minP = 0.05)

## S4 method for signature 'character,character,data.frame,numeric'
regimeR(treatment, control,
  genome, models = 3, countConfig = countConfigSingleEnd(), procs = 1L,
  verbose = TRUE, eps = 1e-05, iterations = 10, minP = 0.05)

## S4 method for signature 'character,character,character,numeric'
regimeR(treatment, control,
  genome = "", models = 3, countConfig = countConfigSingleEnd(),
  procs = 1L, verbose = TRUE, eps = 1e-05, iterations = 10,
  minP = 0.05)

Arguments

treatment

An integer vector of treatment counts or a character pointing to the treatment bam file. In the latter case an "treatment.bai" index file should exist in the same folder.

control

An integer vector of control counts or a character pointing to the control bam file. In the latter case an "control.bai" index file should exist in the same folder.

genome

Either NULL (default), a character specifying a USCS genome identifier, a data.frame consisting of two columns or a GenomicRanges specifying the genomic regions (see Details).

models

An integer specifying the number of mixture components to fit [regimeR only]. Default is 3.

...

Optional arguments for the respective implementations of regimeR.

procs

An integer giving the number of parallel threads to use.

verbose

A logical indicating whether verbose output is desired.

eps

A numeric specifying the T Filter threshold and the threshold for EM convergence, i.e. the minimal difference in log-likelihood in two consecutive steps.

iterations

An integer specifying how many times the EM is initialized with random model parameters.

minP

An integer controlling the threshold for the T method when filtering low power regions, i.e. regions with low counts.

countConfig

A NormRCountConfig object specifying bam counting parameters for read count retrieval. See Details.

Details

Supplied count vectors for treatment and control should be of same length and of type integer.

For convenience, read count vectors can be obtained directly from bam files. In this case, please specify a bam file for treatment and control each and a genome. Bam files should be indexed using samtools (i.e. samtools index file file.bai). Furthermore, bam files should contain a valid header with given chromosome names. If genome == NULL(default), chromosome names will be read from treatment bamheader. Please be aware that bamheader might contain irregular contigs and chrM which influence the fit. Also be sure that treatment and control contain the same chromosomes. Otherwise an error will be thrown. If genome is a character, fetchExtendedChromInfoFromUCSC is used to resolve this to a valid UCSC genome identifier (see https://genome.ucsc.edu/cgi-bin/hgGateway for available genomes). In this case, only assembled molecules will be considered (no circular). Please check if your bam files obey this annotation. If genome is a data.frame, it represents the chromosome specification. The first column will be used as chromosome ID and the second column will be used as the chromosome lengths. If genome is a GenomicRanges, it should contain the equally sized genomic loci to count in, e.g. promoters. The binsize in the supplied NormRCountConfig is ignore in this case.

bamCountConfig is an instance of class NormRCountConfig specifying settings for read counting on bam files. You can specify the binsize, minimum mapping quality, shifting of read ends etc.. Please refer to NormRFit-class for details.

Value

A NormRFit container holding results of the fit with type regimeR.

Author(s)

Johannes Helmuth helmuth@molgen.mpg.de

See Also

NormRFit-class for functions on accessing and exporting the regimeR fit. NormRCountConfig-class for configuration of the read counting procedure (binsize, mapping quality,...).

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
require(GenomicRanges)

### enrichR(): Calling Enrichment over Input
#load some example bamfiles
input <- system.file("extdata", "K562_Input.bam", package="normr")
chipK4 <- system.file("extdata", "K562_H3K4me3.bam", package="normr")
#region to count in (example files contain information only in this region)
gr <- GRanges("chr1", IRanges(seq(22500001, 25000000, 1000), width = 1000))
#configure your counting strategy (see BamCountConfig-class)
countConfiguration <- countConfigSingleEnd(binsize = 1000,
                                           mapq = 30, shift = 100)
#invoke enrichR to call enrichment
enrich <- enrichR(treatment = chipK4, control = input,
                  genome = gr,  countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
#inspect the fit
enrich
summary(enrich)

## Not run:
#write significant regions to bed
#exportR(enrich, filename = "enrich.bed", fdr = 0.01)
#write normalized enrichment to bigWig
#exportR(enrich, filename = "enrich.bw")
## End(**Not run**)

### diffR(): Calling differences between two conditions
chipK36 <- system.file("extdata", "K562_H3K36me3.bam", package="normr")
diff <- diffR(treatment = chipK36, control = chipK4,
              genome = gr,  countConfig = countConfiguration,
              iterations = 10, procs = 1, verbose = TRUE)
summary(diff)

### regimeR(): Identification of broad and peak enrichment
regime <- regimeR(treatment = chipK36, control = input, models = 3,
                  genome = gr,  countConfig = countConfiguration,
                  iterations = 10, procs = 1, verbose = TRUE)
summary(regime)

imbbLab/normr documentation built on Jan. 10, 2021, 12:06 a.m.