exomePeakCalling-methods: Method exomePeakCalling

Description Usage Arguments Details Value See Also Examples

Description

exomePeakCalling call peaks of RNA modification from a MeRIP-seq data set.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
exomePeakCalling(
  merip_bams = NULL,
  txdb = NULL,
  bsgenome = NULL,
  genome = NA,
  mod_annot = NULL,
  glm_type = c("DESeq2", "NB", "Poisson"),
  background_method = c("Gaussian_mixture", "m6Aseq_prior", "manual", "all"),
  manual_background = NULL,
  correct_GC_bg = TRUE,
  qtnorm = FALSE,
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  step_length = binding_length,
  peak_width = fragment_length/2,
  pc_count_cutoff = 5,
  bg_count_cutoff = 50,
  p_cutoff = 1e-05,
  p_adj_cutoff = NULL,
  log2FC_cutoff = 0,
  consistent_peak = FALSE,
  consistent_log2FC_cutoff = 0,
  consistent_fdr_cutoff = 0.05,
  alpha = 0.05,
  p0 = 0.8,
  parallel = FALSE,
  bp_param = NULL
)

## S4 method for signature 'MeripBamFileList'
exomePeakCalling(
  merip_bams = NULL,
  txdb = NULL,
  bsgenome = NULL,
  genome = NA,
  mod_annot = NULL,
  glm_type = c("DESeq2", "NB", "Poisson"),
  background_method = c("all", "Gaussian_mixture", "m6Aseq_prior", "manual"),
  manual_background = NULL,
  correct_GC_bg = TRUE,
  qtnorm = FALSE,
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  step_length = binding_length,
  peak_width = fragment_length/2,
  pc_count_cutoff = 5,
  bg_count_cutoff = 50,
  p_cutoff = 1e-05,
  p_adj_cutoff = NULL,
  log2FC_cutoff = 1,
  consistent_peak = FALSE,
  consistent_log2FC_cutoff = 1,
  consistent_fdr_cutoff = 0.05,
  alpha = 0.05,
  p0 = 0.8,
  parallel = FALSE,
  bp_param = NULL
)

Arguments

merip_bams

a MeripBamFileList object returned by scanMeripBAM.

txdb

a TxDb object for the transcript annotation.

If the TxDb object is not available, it could be a character string of the UCSC genome name which is acceptable by makeTxDbFromUCSC. For example: "hg19".

bsgenome

a BSgenome object for the genome sequence information.

If the BSgenome object is not available, it could be a character string of the UCSC genome name which is acceptable by getBSgenome. For example: "hg19".

genome

a character string of the UCSC genome name which is acceptable by getBSgenome or/and makeTxDbFromUCSC. For example: "hg19".

By default, the argument = NA, it should be provided when the BSgenome or/and the TxDb object are not available.

mod_annot

a GRanges or GRangesList object for user provided single based RNA modification annotation, the widths of the ranged object should be all equal to 1.

If user provides the single based RNA modification annotation, exomePeak2 will perform reads count and (differential) modification quantification on the provided annotation.

The single base annotation will be flanked by length = floor(fragment_length - binding_length/2) to account for the fragment length of the sequencing library.

glm_type

a character speciefies the type of Generalized Linear Model (GLM) fitted for the purpose of statistical inference during peak calling, which can be one of the c("DESeq2", "NB", "Poisson").

DESeq2

Fit the GLM defined in function DESeq, which is the NB GLM with regulated estimation of the overdispersion parameters.

NB

Fit the Negative Binomial (NB) GLM.

Poisson

Fit the Poisson GLM.

By default, the DESeq2 GLMs are fitted on the data set with > 1 biological replicates for both the IP and input samples, the Poisson GLM will be fitted otherwise.

background_method

a character specifies the method for the background finding, i.e. to identify the windows without modification signal. It could be one of the "Gaussian_mixture", "m6Aseq_prior", "manual", and "all"; default = "all".

In order to accurately account for the technical variations, it is often neccessary to estimate the GC content linear effects on windows without modification signals (background).

The following methods are supported in ExomePeak2 to differentiate the no modification background windows from the modification containig windows.

Gaussian_mixture

The background is identified by Multivariate Gaussian Mixture Model (MGMM) with 2 mixing components on the vectors containing methylation level estimates and GC content, the background regions are predicted by the Bayes Classifier on the learned GMM.

m6Aseq_prior

The background is identified by the prior knowledge of m6A topology, the windows that are not overlapped with long exons (exon length >= 400bp) and 5'UTR are treated as the background windows.

This type of background should not be used if the MeRIP-seq data is not using anti-m6A antibody.

manual

The background regions are defined by user manually at the argument manual_background.

all

Use all windows as the background. This is equivalent to not differentiating background and signal. It can lead to biases during the sequencing depth and the GC content correction factors estimation.

manual_background

a GRanges object for the user provided unmodified background; default = NULL.

correct_GC_bg

a logical value of whether to estimate the GC content linear effect on background regions; default = TRUE.

If = TRUE, it could lead to a more accurate estimation of GC content bias for the RNA modifications that are highly biologically related to GC content.

qtnorm

a logical of whether to perform subset quantile normalization after the GC content linear effect correction; default = FASLE.

If qtnorm = TRUE, subset quantile normalization will be applied within the IP and input samples seperately to account for the inherent differences between the marginal distributions of IP and input samples.

gff_dir

optional, a character which specifies the directory toward a gene annotation GFF/GTF file, it is applied when the TxDb object is not available, default = NULL.

fragment_length

a positive integer number for the expected fragment length in nucleotides; default = 100.

binding_length

a positive integer number for the expected binding length of the anti-modification antibody in IP samples; default = 25.

step_length

a positive integer number for the shift distances of the sliding window; default = binding_length.

peak_width

a numeric value for the minimum width of the merged peaks; default = fragment_length .

pc_count_cutoff

a numeric value for the cutoff on average window's reads count in peak calling; default = 5.

bg_count_cutoff

a numeric value for the cutoff on average window's reads count in background identification; default = 50.

p_cutoff

a numeric value for the cutoff on p values in peak calling; default = 1e-05.

p_adj_cutoff

a numeric value for the cutoff on Benjamini Hochberg adjusted p values in peak calling; default = NULL.

log2FC_cutoff

a numeric value for the cutoff on log2 IP over input fold changes in peak calling; default = 1.

consistent_peak

a logical of whether the positive peaks returned should be consistent among all the replicates; default = FALSE.

consistent_log2FC_cutoff

a numeric for the modification log2 fold changes cutoff in the peak consisency calculation; default = 1.

consistent_fdr_cutoff

a numeric for the BH adjusted C-test p values cutoff in the peak consistency calculation; default = 0.05. Check ctest.

alpha

a numeric for the binomial quantile used in the consistent peak filter; default = 0.05.

p0

a numeric for the binomial proportion parameter used in the consistent peak filter; default = 0.8.

For a peak to be consistently methylated, the minimum number of significant enriched replicate pairs is defined as the 1 - alpha quantile of a binomial distribution with p = p0 and N = number of possible pairs between replicates.

The consistency defined in this way is equivalent to the rejection of an exact binomial test with null hypothesis of p < p0 and N = replicates number of IP * replicates number of input.

parallel

a logical value of whether to use parallel computation, typlically it requires more than 16GB of RAM if parallel = TRUE; default = FALSE.

bp_param

optional, a BiocParallelParam object that stores the configuration parameters for the parallel execution.

Details

exomePeakCalling perform peak calling from the MeRIP-seq BAM files on exon regions defined by the user provided transcript annotations. If the BSgenome object is provided, the peak calling will be conducted with the GC content bias correction.

Under the default setting, for each window, exomePeak2 will fit a GLM of Negative Binomial (NB) with regulated estimation of the overdispersion parameters developed in DESeq. Wald tests with H0 of IP/input Log2 Fold Change (LFC) <= 0 are performed on each of the sliding windows. The significantly modified peaks are selected using the cutoff of p value < 0.0001.

Value

a SummarizedExomePeak object.

See Also

exomePeak2, glmM, glmDM, normalizeGC, exportResults

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
### Define File Directories

GENE_ANNO_GTF = system.file("extdata", "example.gtf", package="exomePeak2")

f1 = system.file("extdata", "IP1.bam", package="exomePeak2")
f2 = system.file("extdata", "IP2.bam", package="exomePeak2")
f3 = system.file("extdata", "IP3.bam", package="exomePeak2")
f4 = system.file("extdata", "IP4.bam", package="exomePeak2")
IP_BAM = c(f1,f2,f3,f4)
f1 = system.file("extdata", "Input1.bam", package="exomePeak2")
f2 = system.file("extdata", "Input2.bam", package="exomePeak2")
f3 = system.file("extdata", "Input3.bam", package="exomePeak2")
INPUT_BAM = c(f1,f2,f3)

f1 = system.file("extdata", "treated_IP1.bam", package="exomePeak2")
TREATED_IP_BAM = c(f1)
f1 = system.file("extdata", "treated_Input1.bam", package="exomePeak2")
TREATED_INPUT_BAM = c(f1)

### Peak Calling

MeRIP_Seq_Alignment <- scanMeripBAM(
                         bam_ip = IP_BAM,
                         bam_input = INPUT_BAM,
                         paired_end = FALSE
                        )

sep <- exomePeakCalling(
            merip_bams = MeRIP_Seq_Alignment,
            gff_dir = GENE_ANNO_GTF,
            genome = "hg19"
            )

sep <- normalizeGC(sep)

sep <- glmM(sep)

exportResults(sep)

### Differential Modification Analysis (Comparison of Two Conditions)

MeRIP_Seq_Alignment <- scanMeripBAM(
                         bam_ip = IP_BAM,
                         bam_input = INPUT_BAM,
                         bam_treated_ip = TREATED_IP_BAM,
                         bam_treated_input = TREATED_INPUT_BAM,
                         paired_end = FALSE
                        )

sep <- exomePeakCalling(
            merip_bams = MeRIP_Seq_Alignment,
            gff_dir = GENE_ANNO_GTF,
            genome = "hg19"
            )

sep <- normalizeGC(sep)

sep <- glmDM(sep)

exportResults(sep)

exomePeak2 documentation built on Nov. 8, 2020, 5:27 p.m.