normalizeCoverage: correct coverage bias due to GC content

Description Usage Arguments Value Note Author(s) References See Also Examples

View source: R/PreProcessingShow.R

Description

function to normalize coverage by GC content and quantile normalization

Usage

1
2
normalizeCoverage(object, ..., control = NULL, writeToFile = TRUE,
  destination = NULL, plot = TRUE)

Arguments

object

A GRanges object returned from prepareCoverageGC function.

...

additional GRanges object to pass. Note1: If there is only one Granges object given, then coverage will be corrected by GC content. If there are more than one GRanges object from multiple samples are given, the function will first quantile normalize coverage across samples, then correct coverage by GC content in each sample. Note2: If more than one GRanges object provided, make sure they are different samples sequenced by the same protocol, which means the targeted region is the same Note3: If your input samples contain female and male, we suggest you separate them to get a more accurate normalization.

control

A GRanges object returned from prepareCoverageGC function. Default value: NULL. If you have a control normal sample, then put it here

writeToFile

Boolean Default: TRUE. If TRUE, normalized coverage table for each sample provided will be written to destination specified, the file will be named as "sample_normed_depth.txt". If set to FALSE, a GRangesList object will be returned

destination

A character, specify the path to the location where the normalized coverage table will be written. Default: NULL, the file will be written to current working directory

plot

Boolean Default: TRUE. If TRUE, the coverage vs. GC content plot before and after normalization will be plotted And the average coverage for each chromosome before and after normalization will be plotted

Value

If writeToFile is set to TRUE, normalized coverage will be written to the destination. Otherwise, a GRangesList object containing each of input sample will be returned.

Note

The normalize function works better when you have multiple samples sequenced using the same protocol, namely have the same targeted regions. And if you have female sample and male sample, the best way is to normalize them separately.

Author(s)

Yu Kong

References

C. Alkan, J. Kidd, T. Marques-Bonet et al (2009). Personalized copy number and segmental duplication maps using next-generation sequencing. Nature Genetics, 41(10):1061-7.

See Also

prepareCoverageGC

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
##------------------------------------
##if you deal with single sample
##------------------------------------
## 1. prepare coverage and gc
## specify the path to the location of bed file
target = system.file("extdata","target.bed",package="MADSEQ")

## specify the path to the bam file
aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ")

## prepare coverage data for the aneuploidy sample
aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19")

## normalize the coverage
##---- if not write to file ----
aneuploidy_norm = normalizeCoverage(aneuploidy_cov_gc,writeToFile=FALSE)
## check the GRangesList and subset your sample
aneuploidy_norm
names(aneuploidy_norm)
aneuploidy_norm["aneuploidy_cov_gc"]

##---- if write to file ----
normalizeCoverage(aneuploidy_cov_gc,writeToFile=TRUE,destination=".")

##-----------------------------------------------------------
##if you deal with multiple samples without normal control
##-----------------------------------------------------------
## specify the path to the location of bed file
target = system.file("extdata","target.bed",package="MADSEQ")

## specify the path to the bam file
aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ")
normal_bam = system.file("extdata","normal.bam",package="MADSEQ")

## prepare coverage data for the samples
aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19")
normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19")

## normalize the coverage
normed=normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc,writeToFile=FALSE)
names(normed)
normed["aneuploidy_cov_gc"]
normed["normal_cov_gc"]
## or
normalizeCoverage(aneuploidy_cov_gc,normal_cov_gc,
                  writeToFile=TRUE,destination=".")

##-----------------------------------------------------------
##if you deal with multiple samples with a normal control
##-----------------------------------------------------------
## specify the path to the location of bed file
target = system.file("extdata","target.bed",package="MADSEQ")

## specify the path to the bam file
aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ")
normal_bam = system.file("extdata","normal.bam",package="MADSEQ")

## prepare coverage data for the samples
aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19")
normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19")

## normalize the coverage
normed = normalizeCoverage(aneuploidy_cov_gc,
                           control=normal_cov_gc,writeToFile=FALSE)
## or
normalizeCoverage(aneuploidy_cov_gc,control=normal_cov_gc,
                  writeToFile=TRUE,destination=".")

MADSEQ documentation built on Nov. 8, 2020, 6:51 p.m.