GLM_inference: Statistical Inference with DESeq2 on IP over input fold...


View source: R/GLM_inference.R

Description

GLM_inference conducts inference on the log2 fold changes of IP over input using the GLM defined in DESeq2.

Usage

GLM_inference(
  SE_bins,
  glm_type = c("Poisson", "NB", "DESeq2"),
  p_cutoff = 1e-05,
  p_adj_cutoff = NULL,
  count_cutoff = 5,
  log2FC_mod = 1,
  min_mod_number = NA,
  correct_GC_bg = FALSE,
  qtnorm = TRUE,
  consistent_peak = FALSE,
  consistent_log2FC_cutoff = 1,
  consistent_fdr_cutoff = 0.05,
  alpha = 0.05,
  p0 = 0.8
)

Arguments

SE_bins

a SummarizedExperiment of read counts. Its colData should contain a column named design_IP, a character vector with the values "IP" and "input". This column indexes the design of the MeRIP-Seq experiment.
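As an illustration of the expected input, the sketch below builds a small SummarizedExperiment with a design_IP column. The count values, bin names, and sample names are invented for the example, and it assumes the Bioconductor package SummarizedExperiment is installed; a real SE_bins would come from counting reads over bins.

```r
library(SummarizedExperiment)

# Invented read counts: 2 bins (rows) x 4 samples (columns).
counts <- matrix(
  c(10L, 12L, 4L,  5L,
    30L, 28L, 9L, 11L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("bin1", "bin2"), c("IP1", "IP2", "input1", "input2"))
)

# Sample annotation with the design_IP column described above.
sample_info <- S4Vectors::DataFrame(
  design_IP = c("IP", "IP", "input", "input"),
  row.names = colnames(counts)
)

SE_bins <- SummarizedExperiment(
  assays  = list(counts = counts),
  colData = sample_info
)

# Logical index of the IP samples, derived from design_IP.
ip_cols <- colData(SE_bins)$design_IP == "IP"
```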

glm_type

a character string, one of "Poisson", "NB", or "DESeq2", specifying the type of generalized linear model used in peak calling; default = "Poisson". The DESeq2 method is only recommended for high-power experiments with more than 3 biological replicates for both IP and input.

p_cutoff

a numeric for the p value cutoff used in DESeq2 inference; default = 1e-05.

p_adj_cutoff

a numeric for the adjusted p value cutoff used in DESeq2 inference; if provided, the value of p_cutoff is ignored. Default = NULL.

count_cutoff

an integer cutoff on the mean read count of a row; inference is only performed on windows whose mean read count is greater than the cutoff. Default = 5.
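The row filter described here can be sketched in base R. This is a minimal illustration on an invented count matrix (the object `counts` and its values are assumptions, not package data), and the package's internal filtering may differ in detail.

```r
# Invented count matrix: 4 bins (rows) x 4 samples (columns).
counts <- matrix(
  c( 0,  1,  2,  1,
    12,  9, 30, 25,
     6,  7,  5,  8,
     3,  4,  6,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("bin", 1:4), c("IP1", "IP2", "input1", "input2"))
)

count_cutoff <- 5

# Keep only bins whose mean read count across samples exceeds the cutoff.
keep <- rowMeans(counts) > count_cutoff
tested_bins <- rownames(counts)[keep]
```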

log2FC_mod

a non-negative numeric for the log2 fold change cutoff used in DESeq2 inference for modification-containing peaks (IP > input).

min_mod_number

a non-negative numeric for the minimum number of reported modification-containing bins. If fewer bins than this number pass the p value and effect size filters, additional bins are reported in order of p value until this number is reached. The default (NA) is calculated as floor( sum(rowSums( assay(SE_bins) ) > 0)*0.001 ).
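The default formula can be traced on a toy example. In this sketch a plain matrix stands in for assay(SE_bins); the matrix and its dimensions are invented for illustration.

```r
# Invented count matrix standing in for assay(SE_bins):
# 3000 bins x 4 samples, of which the first 2345 bins have reads.
counts <- matrix(0L, nrow = 3000, ncol = 4)
counts[1:2345, ] <- 3L

# Number of bins with at least one read in any sample.
n_nonzero_bins <- sum(rowSums(counts) > 0)

# Default minimum number of reported bins: 0.1% of non-empty bins, rounded down.
min_mod_number <- floor(n_nonzero_bins * 0.001)
```

With 2345 non-empty bins the default works out to 2, so at least two bins would be reported even if no bin passed the cutoffs.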

correct_GC_bg

a logical of whether to estimate the GC content linear effect on background regions; default = FALSE.

Setting correct_GC_bg = TRUE may give a more accurate estimate of the technical effect of GC content for RNA modifications that are strongly biologically related to GC content.

qtnorm

a logical of whether to perform subset quantile normalization after the GC content linear effect correction; default = TRUE.

Subset quantile normalization is applied within the IP and input samples separately to account for the inherent differences between the marginal distributions of IP and input samples.

consistent_peak

a logical of whether the positive peaks returned should be consistent among replicates; default = FALSE.

consistent_log2FC_cutoff

a numeric for the modification log2 fold change cutoff in the peak consistency calculation; default = 1.

consistent_fdr_cutoff

a numeric for the BH-adjusted C-test p value cutoff in the peak consistency calculation; default = 0.05. See ctest.

alpha

a numeric for the binomial quantile used in the consistent peak filter; default = 0.05.

p0

a numeric for the binomial proportion parameter used in the consistent peak filter; default = 0.8.

For a peak to be consistently methylated, the minimum number of significantly enriched replicate pairs is defined as the 1 - alpha quantile of a binomial distribution with p = p0 and N = number of possible pairs between replicates.

Consistency defined in this way is equivalent to rejecting an exact binomial test with the null hypothesis p < p0 and N = (number of IP replicates) * (number of input replicates).
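The threshold on replicate pairs can be computed with the base R binomial quantile function. The sketch below assumes 3 IP and 3 input replicates and an invented vector of per-pair significance calls; whether the package compares with >= or > at the boundary is an assumption here, not something stated in this page.

```r
alpha <- 0.05   # default from the usage above
p0    <- 0.8    # default from the usage above

# Assumed design: 3 IP replicates and 3 input replicates.
n_ip    <- 3
n_input <- 3
n_pairs <- n_ip * n_input   # N = all IP-input replicate pairs

# 1 - alpha quantile of Binomial(N = n_pairs, p = p0).
min_sig_pairs <- qbinom(1 - alpha, size = n_pairs, prob = p0)

# Invented significance calls for the 9 replicate pairs of one peak.
pair_significant <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)

# Assumed decision rule: the peak is consistent if enough pairs are significant.
is_consistent <- sum(pair_significant) >= min_sig_pairs
```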

Value

a list of indices for the significantly modified peaks (IP > input) and the control peaks (peaks other than modification-containing peaks).


exomePeak2 documentation built on Nov. 8, 2020, 5:27 p.m.