normalizeGC-methods: Method normalizeGC


Description

normalizeGC estimates feature-specific size factors to reduce technical variation when quantifying modification peak statistics.

Usage

normalizeGC(
  sep,
  bsgenome = "hg19",
  txdb = "hg19",
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  feature = c("Background", "Modification", "All"),
  qtnorm = FALSE,
  effective_GC = FALSE
)

## S4 method for signature 'SummarizedExomePeak'
normalizeGC(
  sep,
  bsgenome = NULL,
  txdb = NULL,
  gff_dir = NULL,
  fragment_length = 100,
  binding_length = 25,
  feature = c("Background", "Modification", "All"),
  qtnorm = FALSE,
  effective_GC = FALSE
)

Arguments

sep

a SummarizedExomePeak object returned by exomePeak2 or exomePeakCalling.

bsgenome

a BSgenome object for the genome reference. If a BSgenome object is not available, this can be a character string giving a UCSC genome name accepted by getBSgenome, for example "hg19".

txdb

a TxDb object for the transcript annotation. If a TxDb object is not available, this can be a character string giving a UCSC genome name accepted by makeTxDbFromUCSC, for example "hg19".

gff_dir

optional, a character string giving the path to a gene annotation GFF/GTF file; it is used when a TxDb object is not available; default = NULL.

fragment_length

a positive integer giving the expected fragment length in nucleotides; default = 100.

binding_length

a positive integer giving the expected binding length of the anti-modification antibody in IP samples; default = 25.

feature

a character string specifying the region used to estimate the GC content linear effect; one of c("Background","All","Modification"); default is "Background".

Background

The GC content linear effects will be estimated on the background regions. By default, the background is defined as the exon regions that do not overlap with peaks / modification sites flanked by the fragment length. Alternative background-finding methods can be selected with the background argument of exomePeakCalling.

Modification

The GC content linear effects will be estimated on the regions of modification peaks/sites.

All

The GC content linear effects will be estimated on all regions, i.e. both the region of modification and the background control regions.
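The default for feature is written as a character vector of the three choices, yet the documented default is "Background". In R this pattern is commonly resolved with match.arg(), which picks the first element when the argument is omitted. The sketch below illustrates that idiom only; resolve_feature is a hypothetical helper, and whether normalizeGC uses match.arg() internally is an assumption.

```r
# Hypothetical helper, NOT part of exomePeak2: it only mimics how a
# character-vector default is usually resolved in R.
resolve_feature <- function(feature = c("Background", "Modification", "All")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the allowed set.
  match.arg(feature)
}

default_choice  <- resolve_feature()        # argument omitted
explicit_choice <- resolve_feature("All")   # argument supplied

# An unsupported value raises an error; here it is caught and mapped to NA
invalid_choice <- tryCatch(
  resolve_feature("Exons"),
  error = function(e) NA_character_
)
```

Under this idiom, omitting feature and passing feature = "Background" are equivalent.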

qtnorm

a logical indicating whether to perform subset quantile normalization after the GC content linear effect correction; default = FALSE.

If qtnorm = TRUE, subset quantile normalization will be applied within the IP and input samples separately to account for the inherent differences between the marginal distributions of IP and input samples.
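To illustrate why the two groups are normalized separately, here is a minimal base R sketch on simulated data. It uses plain quantile normalization rather than the subset variant, and it is not the package's implementation; quantile_normalize, the sample names, and the simulated values are all assumptions made for illustration.

```r
# Plain quantile normalization of the columns of a numeric matrix.
# Ties are broken by order of appearance (ties.method = "first").
quantile_normalize <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")
  sorted <- apply(mat, 2, sort)
  reference <- rowMeans(sorted)  # mean of each order statistic
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(mat)
  normalized
}

set.seed(1)
# Simulated log-scale abundances: 200 features, 2 IP and 2 input samples
log_counts <- cbind(
  IP1    = rnorm(200, mean = 5.0, sd = 2.0),
  IP2    = rnorm(200, mean = 6.0, sd = 2.5),
  input1 = rnorm(200, mean = 4.0, sd = 1.0),
  input2 = rnorm(200, mean = 4.5, sd = 1.2)
)
is_ip <- c(TRUE, TRUE, FALSE, FALSE)

# Normalize each group on its own, so IP and input keep distinct distributions
normalized <- log_counts
normalized[, is_ip]  <- quantile_normalize(log_counts[, is_ip])
normalized[, !is_ip] <- quantile_normalize(log_counts[, !is_ip])
```

After this step the samples within each group share one distribution, while the IP and input groups still differ from each other, which is the property the qtnorm description refers to.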

effective_GC

a logical indicating whether to calculate the effective GC content weighted by the fragment alignment probabilities; default = FALSE.

Details

PCR amplification bias related to GC content is a major source of technical variation in RNA-seq. GC content biases are usually correlated within the same laboratory environment, which results in batch effects between different studies.

GC content normalization can improve peak accuracy for most published m6A-seq data, and it is particularly recommended when comparing quantifications of methylation levels between different laboratory conditions.
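The idea of a sample-specific GC content linear effect can be sketched in base R on simulated data. This is a conceptual illustration under stated assumptions, not the algorithm used by normalizeGC: the simulated slopes, the use of lm(), and the variable names are all hypothetical.

```r
set.seed(42)
n_features <- 500
gc_content <- runif(n_features, min = 0.3, max = 0.7)
true_log_abundance <- rnorm(n_features, mean = 4, sd = 0.5)

# Each simulated sample gets its own GC slope, mimicking library-specific bias
gc_slopes <- c(sample1 = 3, sample2 = -2)
log_counts <- sapply(gc_slopes, function(slope) {
  true_log_abundance + slope * (gc_content - 0.5) +
    rnorm(n_features, sd = 0.1)
})

# Fit a per-sample linear GC effect and convert it into a
# feature-by-sample matrix of log-scale offsets
gc_centered <- gc_content - mean(gc_content)
log_offsets <- apply(log_counts, 2, function(y) {
  fit <- lm(y ~ gc_centered)
  coef(fit)[["gc_centered"]] * gc_centered
})

# Remove the fitted GC trend; exp(log_offsets) plays the role of
# feature-specific size factors in this toy setting
corrected <- log_counts - log_offsets
size_factors <- exp(log_offsets)

# Correlation of each sample with GC content before and after correction
cor_before <- cor(log_counts, gc_content)[, 1]
cor_after  <- cor(corrected, gc_content)[, 1]
```

In this toy example the two samples start with opposite GC trends, which would look like a batch effect in a cross-study comparison; after subtracting the fitted offsets, neither sample remains correlated with GC content.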

Value

a SummarizedExomePeak object with the updated slot GCsizeFactors.

See Also

estimateSeqDepth

Examples

### Load the example SummarizedExomePeak object
f1 = system.file("extdata", "sep_ex_mod.rds", package="exomePeak2")

sep <- readRDS(f1)

### Normalize the GC content biases
sep <- normalizeGC(sep)

exomePeak2 documentation built on Nov. 8, 2020, 5:27 p.m.