correctReadcount: Readcount correction for GC and mappability bias

In shahcompbio/HMMcopy: Copy number prediction with correction for GC and mappability bias for HTS data


View source: R/correction.R

Corrects readcounts for GC and mappability bias using the binning/loess method optimized for speed.

correctReadcount(x, mappability = 0.9, samplesize = 50000, verbose = TRUE)

`x`	`RangedData` object returned by `wigsToRangedData`
`mappability`	Mappability threshold in [0, 1]; bins with mappability below it are ignored when creating the correction curve.
`samplesize`	The number of points sampled during LOESS fitting. Decreasing it reduces runtime and memory usage, at the expense of robustness to data randomness.
`verbose`	Set to FALSE if messages are not desired.
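
As a hedged illustration of the arguments, the call below lowers `samplesize` to trade some robustness for speed and silences messages. It assumes the HMMcopy package is installed and reuses the `tumour` dataset from the Examples; the value 10000 is an arbitrary choice for illustration, not a recommendation.

```r
# Illustrative argument values; 10000 is an arbitrary, assumed sample size.
args <- list(mappability = 0.9, samplesize = 10000, verbose = FALSE)

# Only run the correction when HMMcopy is actually available.
if (requireNamespace("HMMcopy", quietly = TRUE)) {
  library(HMMcopy)
  data(tumour)  # loads tumour_reads
  tumour_copy <- correctReadcount(tumour_reads,
                                  mappability = args$mappability,
                                  samplesize  = args$samplesize,
                                  verbose     = args$verbose)
}
```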

Input read counts are contained in the `RangedData` object (a class from the IRanges package), which specifies the number of reads within bins (sometimes called windows) of a defined genomic size. GC content should be computed using exactly the same boundaries for each bin.

Ensure that the GC content and mappability files have been mapped to the same genome build (e.g. hg18) as the tumour and normal read libraries.
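
The requirement that read counts and GC content share bin boundaries can be checked before correction. The sketch below is only an illustration on plain data frames: the column names `chr`, `start`, `end` and the helper `same_bins` are hypothetical and not part of HMMcopy.

```r
# Hypothetical per-bin tables; values are made up for illustration.
reads <- data.frame(chr = "1", start = c(1, 1001, 2001),
                    end = c(1000, 2000, 3000), reads = c(120, 95, 101))
gc <- data.frame(chr = "1", start = c(1, 1001, 2001),
                 end = c(1000, 2000, 3000), gc = c(0.41, 0.47, 0.39))

# TRUE only when both tables list the same bins in the same order.
same_bins <- function(a, b) {
  nrow(a) == nrow(b) &&
    all(a$chr == b$chr) &&
    all(a$start == b$start) &&
    all(a$end == b$end)
}

same_bins(reads, gc)
```

A check like this fails fast when, for example, the GC track was generated with a different window size or offset than the read counts.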

Here is a summary of the correction procedure.

Filter out bins with 0 reads and 0 GC content
Filter out bins with reads within the top and bottom 1% quantile (assumed to be outliers)
Filter out bins with GC content within the top and bottom 1% quantile
Filter out bins with a mappability score of less than 0.9 ('mappability' argument).
Randomly sample up to 50000 ('samplesize' argument) of the remaining high-quality bins to keep the runtime reasonable.
The first loess (on the reads-by-GC curve) is performed with a small span (smoothing window), typically yielding a highly sensitive curve (it follows the low density tails of the distribution, but gets jagged in the high density center).
A second loess (on the first loess results) with a larger span is performed, recapitulating the curve in the low density tails and smoothing out the jagged regions in the high density center.
'cor.gc' is obtained by correcting each bin for GC content. The number of observed reads is divided by the number of reads predicted by the loess curve given an observed GC proportion.
Filter out just the top 1% quantile of the cor.gc bins, then _randomly_ sample up to 50000 ('samplesize' argument) bins.
A separate lowess curve is computed for cor.gc as a function of mappability.
'cor.map' is obtained by correcting each bin for mappability. The cor.gc value is divided by the cor.gc value predicted by the mappability lowess curve generated in the previous step.
'copy' is obtained by setting all cor.map values <= 0 to NA, then applying log2.
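
The procedure above can be sketched in base R on simulated bins. This is a rough illustration under stated assumptions, not the package's implementation: the simulated data, the function name `sketch_correct`, the span values 0.03 and 0.3, and the 500-point grid are all assumptions chosen for the example, and a plain data frame stands in for the `RangedData` object.

```r
set.seed(1)

# Simulated bins (hypothetical data): reads depend on GC and mappability.
n <- 20000
bins <- data.frame(gc = runif(n, 0.30, 0.60), map = runif(n, 0.50, 1.00))
bins$reads <- rpois(n, 200 * exp(-((bins$gc - 0.45) / 0.12)^2) * bins$map)

sketch_correct <- function(x, mappability = 0.9, samplesize = 50000) {
  # Valid bins have reads and GC content; ideal bins also pass the
  # 1%/99% outlier filters and the mappability threshold.
  x$valid <- x$reads > 0 & x$gc > 0
  rq <- quantile(x$reads[x$valid], c(0.01, 0.99))
  gq <- quantile(x$gc[x$valid], c(0.01, 0.99))
  x$ideal <- x$valid & x$map >= mappability &
    x$reads > rq[1] & x$reads < rq[2] &
    x$gc > gq[1] & x$gc < gq[2]

  # Randomly sample up to 'samplesize' ideal bins.
  set <- which(x$ideal)
  sel <- set[sample.int(length(set), min(length(set), samplesize))]
  fit <- data.frame(gc = x$gc[sel], reads = x$reads[sel])

  # First loess: small span (0.03 is an assumed value).
  rough <- loess(reads ~ gc, data = fit, span = 0.03)

  # Second loess on the first fit's predictions: larger span (assumed 0.3).
  grid <- data.frame(gc = seq(min(fit$gc), max(fit$gc), length.out = 500))
  grid$reads <- predict(rough, grid)
  grid <- grid[!is.na(grid$reads), ]
  final <- loess(reads ~ gc, data = grid, span = 0.3)

  # cor.gc: observed reads divided by reads predicted from GC.
  # Bins whose GC lies outside the fitted range get NA.
  x$cor.gc <- x$reads / predict(final, data.frame(gc = x$gc))

  # Drop the top 1% of cor.gc, sample again, fit lowess of cor.gc by mappability.
  cq <- quantile(x$cor.gc[x$valid], 0.99, na.rm = TRUE)
  set <- which(x$cor.gc < cq)
  sel <- set[sample.int(length(set), min(length(set), samplesize))]
  map_fit <- approxfun(lowess(x$map[sel], x$cor.gc[sel]))

  # cor.map: cor.gc divided by the value predicted from mappability.
  x$cor.map <- x$cor.gc / map_fit(x$map)

  # copy: non-positive values become NA, then log2.
  x$copy <- x$cor.map
  x$copy[which(x$copy <= 0)] <- NA
  x$copy <- log2(x$copy)
  x
}

corrected <- sketch_correct(bins)
summary(corrected$copy)
```

On this simulated input the `copy` values should centre near 0, since every bin was generated with the same underlying copy number and only the GC and mappability effects differ.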

The original RangedData object appended with 5 new columns:

valid: Valid bins, which have valid read, gc, and mappability values
ideal: Ideal bins of high mappability and no outliers
cor.gc: GC-corrected readcounts
cor.map: Mappability corrected GC-corrected readcounts
copy: cor.map values after log base 2
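
To make the relationship between `cor.map` and `copy` concrete, here is a minimal sketch on a made-up vector (the values are hypothetical): non-positive entries become NA and the rest are log base 2 transformed.

```r
# Hypothetical cor.map values, including a zero and a negative entry.
cor.map <- c(1.0, 0.5, 2.0, 0, -0.1)

copy <- cor.map
copy[copy <= 0] <- NA   # log2 is undefined for non-positive values
copy <- log2(copy)
copy                    # 0 -1 1 NA NA
```

A `copy` of 0 therefore corresponds to a corrected read count equal to the expected value, -1 to half of it, and 1 to double.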

Author: Daniel Lai

Yuval Benjamini and Terence P Speed. Summarizing and correcting the gc content bias in high-throughput sequencing. Nucleic Acids Res, 40(10):e72, May 2012.

See wigsToRangedData to easily generate the proper input.

data(tumour) # Load tumour_reads
tumour_copy <- correctReadcount(tumour_reads)

shahcompbio/HMMcopy documentation built on Dec. 6, 2019, 12:47 a.m.
