preprocessIntervals: Preprocess intervals
In lima1/PureCN: Copy number calling and SNV classification using targeted short read sequencing

Description

Optimizes intervals for copy number calling by tiling long intervals and by including off-target regions. Uses scanFa from the Rsamtools package to retrieve the GC content of intervals from a reference FASTA file. If mappability or replication timing scores are provided, intervals are annotated with them.
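As a rough illustration of what tiling a long interval means, the base R sketch below splits one interval into near-equal bins of approximately a requested width. The helper `tile_interval` is hypothetical and not part of PureCN; the package's actual splitting logic may differ.

```r
# Hypothetical helper (not a PureCN function): split a 1-based, inclusive
# interval into contiguous bins of roughly `average.width` bases each.
tile_interval <- function(start, end, average.width = 400) {
    width <- end - start + 1
    # Number of bins closest to the requested average width, at least one
    n <- max(1, round(width / average.width))
    # n + 1 break points; each bin runs from one break to just before the next
    breaks <- round(seq(start, end + 1, length.out = n + 1))
    data.frame(start = breaks[-(n + 1)], end = breaks[-1] - 1)
}

# A 1200 bp interval is split into three 400 bp bins
tiles <- tile_interval(1001, 2200)
print(tiles)
```

Intervals shorter than the requested width come back unchanged as a single bin.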

Usage

preprocessIntervals(
  interval.file,
  reference.file,
  output.file = NULL,
  off.target = FALSE,
  average.target.width = 400,
  min.target.width = 100,
  min.off.target.width = 20000,
  average.off.target.width = 2e+05,
  off.target.padding = -500,
  mappability = NULL,
  min.mappability = c(0.6, 0.1, 0.7),
  reptiming = NULL,
  average.reptiming.width = 1e+05,
  exclude = NULL,
  off.target.seqlevels = c("targeted", "all"),
  small.targets = c("resize", "drop")
)

Arguments

`interval.file`	File specifying the intervals. Intervals are expected in the first column in the format CHR:START-END. Instead of a file, a `GRanges` object can be provided, which allows the use of BED files, for example. Note that GATK interval files are 1-based (the first position of the genome is 1), whereas other formats such as BED are often 0-based. The `import` function from the rtracklayer package automatically converts to 1-based `GRanges`.
`reference.file`	Reference FASTA file.
`output.file`	Optionally, write GC content file.
`off.target`	Include off-target regions.
`average.target.width`	Split large targets to approximately this size.
`min.target.width`	Make sure that target regions are of at least this specified width. See `small.targets`.
`min.off.target.width`	Only include off-target regions of at least this width.
`average.off.target.width`	Split off-target regions to approximately this width.
`off.target.padding`	Pad off-target regions.
`mappability`	Annotate intervals with mappability score. Assumed on a scale from 0 to 1, with score being 1/(number alignments). Expected as `GRanges` object with first meta column being the score. Regions outside these ranges are ignored, assuming that `mappability` covers the whole accessible genome.
`min.mappability`	`double(3)` specifying the minimum mappability score for on-target, off-target, and chrY regions in that order. The chrY regions are only used for sex determination in ‘PureCN’ and are therefore treated differently. Requires `mappability`.
`reptiming`	Annotate intervals with replication timing score. Expected as `GRanges` object with first meta column being the score.
`average.reptiming.width`	Tile `reptiming` into bins of specified width.
`exclude`	Any target that overlaps with this `GRanges` object will be excluded.
`off.target.seqlevels`	Controls how to deal with chromosomes/contigs found in the `reference.file` but not in the `interval.file`.
`small.targets`	Strategy to deal with targets smaller than `min.target.width`.
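
The coordinate conventions mentioned for `interval.file` can be illustrated with a small base R sketch. It assumes the standard BED convention (0-based start, exclusive end) and builds 1-based CHR:START-END strings. The helper `bed_to_interval_string` and the example coordinates are made up for illustration and are not part of PureCN. In practice, importing the BED file as a `GRanges` object handles this conversion.

```r
# Made-up BED-style records: 0-based start, exclusive end
bed <- data.frame(
    chrom = c("chr1", "chr1"),
    chromStart = c(999L, 4999L),
    chromEnd = c(1200L, 5400L)
)

# Hypothetical helper (not a PureCN function): convert to 1-based,
# inclusive CHR:START-END strings. Only the start shifts by one,
# because the exclusive 0-based end equals the inclusive 1-based end.
bed_to_interval_string <- function(bed) {
    sprintf("%s:%d-%d", bed$chrom,
        as.integer(bed$chromStart) + 1L, as.integer(bed$chromEnd))
}

interval.strings <- bed_to_interval_string(bed)
print(interval.strings)
```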

Value

Returns GC content by interval as a `GRanges` object.
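
For intuition about the returned GC content, the following base R sketch computes the fraction of G and C bases in a sequence string. The helper `gc_fraction` and the example sequences are hypothetical, and skipping non-ACGT bases is an assumption of this sketch rather than documented PureCN behavior.

```r
# Hypothetical helper (not a PureCN function): fraction of G/C bases
# among the unambiguous bases (A, C, G, T) of a sequence string.
gc_fraction <- function(sequence) {
    bases <- strsplit(toupper(sequence), "")[[1]]
    called <- bases[bases %in% c("A", "C", "G", "T")]
    # No unambiguous bases: the GC content is undefined
    if (length(called) == 0) return(NA_real_)
    mean(called %in% c("G", "C"))
}

gc_fraction("GGCCAATT")  # 4 of 8 bases are G or C, so 0.5
gc_fraction("NNNN")      # NA, no unambiguous bases
```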

Author(s)

Markus Riester

References

Talevich et al. (2016). CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol.

Examples


library(PureCN)
library(rtracklayer)  # provides import() for reading BED files

# Example files shipped with the package
reference.file <- system.file("extdata", "ex2_reference.fa",
    package = "PureCN", mustWork = TRUE)
interval.file <- system.file("extdata", "ex2_intervals.txt",
    package = "PureCN", mustWork = TRUE)
bed.file <- system.file("extdata", "ex2_intervals.bed",
    package = "PureCN", mustWork = TRUE)

# Intervals given as a file in CHR:START-END format
preprocessIntervals(interval.file, reference.file,
    output.file = "gc_file.txt")

# Intervals given as a GRanges object imported from a BED file
intervals <- import(bed.file)
preprocessIntervals(intervals, reference.file,
    output.file = "gc_file.txt")

lima1/PureCN documentation built on June 15, 2025, 5:30 a.m.
