normalizeGC-methods: Method normalizeGC
In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq


`normalizeGC` estimates feature-specific size factors to reduce technical variation when quantifying modification peak statistics.

normalizeGC(
  sep,
  bsgenome = "hg19",
  txdb = "hg19",
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  feature = c("Background", "Modification", "All"),
  qtnorm = FALSE,
  effective_GC = FALSE
)

## S4 method for signature 'SummarizedExomePeak'
normalizeGC(
  sep,
  bsgenome = NULL,
  txdb = NULL,
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  feature = c("Background", "Modification", "All"),
  qtnorm = FALSE,
  effective_GC = FALSE
)

`sep`	a `SummarizedExomePeak` object returned by `exomePeak2` or `exomePeakCalling`.
`bsgenome`	a `BSgenome` object for the genome reference. If a `BSgenome` object is not available, it can be a `character` string giving a UCSC genome name accepted by `getBSgenome`, for example `"hg19"`.
`txdb`	a `TxDb` object for the transcript annotation. If a `TxDb` object is not available, it can be a `character` string giving a UCSC genome name accepted by `makeTxDbFromUCSC`, for example `"hg19"`.
`gff_dir`	optional, a `character` specifying the path to a gene annotation GFF/GTF file; it is used when a `TxDb` object is not available; default `= NULL`.
`fragment_length`	a positive integer number for the expected fragment length in nucleotides; default `= 100`.
`binding_length`	a positive integer number for the expected binding length of the anti-modification antibody in IP samples; default `= 25`.
`feature`	a `character` specifying the regions used to estimate the GC content linear effect; one of `c("Background", "All", "Modification")`; default is `"Background"`. `Background`: the GC content linear effects are estimated on the background regions. By default, the background is defined as the exon regions that do not overlap peaks / modification sites flanked by the fragment length. Alternative background-finding methods can be selected with the `background` argument of `exomePeakCalling`. `Modification`: the GC content linear effects are estimated on the regions of modification peaks/sites. `All`: the GC content linear effects are estimated on all regions, i.e. both the modification regions and the background control regions.
`qtnorm`	a `logical` indicating whether to perform subset quantile normalization after the GC content linear effect correction; default `= FALSE`. If `qtnorm = TRUE`, subset quantile normalization is applied within the IP and input samples separately to account for the inherent differences between the marginal distributions of IP and input samples.
`effective_GC`	a `logical` of whether to calculate the effective GC content weighted by the fragment alignment probabilities; default `= FALSE`.
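
The idea behind applying quantile normalization "within the IP and input samples separately" (the `qtnorm` argument) can be illustrated with a small base R sketch. This is an assumption-laden illustration, not the package's code: it uses plain quantile normalization rather than the subset variant named above, the helper `quantile_normalize` is hypothetical, and the data are simulated.

```r
# Plain quantile normalization of the columns of a numeric matrix.
# Hypothetical helper for illustration; NOT an exomePeak2 function and
# NOT the subset quantile normalization the package refers to.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))   # shared reference distribution
  out    <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(2)
# Simulated abundances: IP and input samples have different marginal distributions.
x <- cbind(
  IP1    = rgamma(200, shape = 2, rate = 0.02),
  IP2    = rgamma(200, shape = 2, rate = 0.01),
  input1 = rgamma(200, shape = 5, rate = 0.02),
  input2 = rgamma(200, shape = 5, rate = 0.04)
)
is_ip <- c(TRUE, TRUE, FALSE, FALSE)

# Normalize each group on its own, so IP samples are never forced
# onto the input distribution (or vice versa).
normalized <- x
normalized[, is_ip]  <- quantile_normalize(x[, is_ip])
normalized[, !is_ip] <- quantile_normalize(x[, !is_ip])
```

After this step the two IP columns share one distribution and the two input columns share another, while the IP and input distributions remain distinct.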

PCR amplification bias related to GC content is a major source of technical variation in RNA-seq. GC content biases are usually correlated within the same laboratory environment, which results in batch effects between different studies.

GC content normalization can improve peak accuracy for most published m6A-seq data, and it is particularly recommended when comparing methylation-level quantifications between different laboratory conditions.
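
To make the notion of a "GC content linear effect" and feature-specific size factors concrete, here is a minimal base R sketch on simulated data. It is only an illustration under stated assumptions: the per-sample `lm` fit of log counts on GC content, the variable names, and the simulated slopes are all hypothetical and are not taken from the exomePeak2 implementation.

```r
# Illustrative only: simulated data, not the exomePeak2 implementation.
set.seed(1)

n_features <- 500
gc <- runif(n_features, min = 0.3, max = 0.7)   # GC fraction of each feature

# Simulate counts for two samples with sample-specific GC slopes (log scale).
gc_slope <- c(sample_A = 2, sample_B = -1)
counts <- sapply(gc_slope, function(b) {
  rpois(n_features, lambda = exp(log(100) + b * (gc - 0.5)))
})

# Per sample: fit log count ~ GC, then convert the fitted GC trend
# into one size factor per feature (centred on the log scale).
gc_size_factors <- apply(counts, 2, function(y) {
  fitted_log <- fitted(lm(log(y + 1) ~ gc))
  exp(fitted_log - mean(fitted_log))
})

# Dividing by the size factors removes the linear GC trend.
normalized <- counts / gc_size_factors

# Correlation between GC content and log counts, before and after.
cor_before <- apply(log(counts + 1), 2, cor, gc)
cor_after  <- apply(log(normalized + 1), 2, cor, gc)
```

In this simulation the two samples carry GC slopes of opposite sign, which mimics a batch effect between laboratories; comparing `cor_before` with `cor_after` shows how much of the GC dependence the size factors absorb.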

A `SummarizedExomePeak` object with the updated slot `GCsizeFactors`.

estimateSeqDepth

### Load the example SummarizedExomePeak object
f1 <- system.file("extdata", "sep_ex_mod.rds", package = "exomePeak2")

sep <- readRDS(f1)

### Normalize the GC content biases
sep <- normalizeGC(sep)