View source: R/preprocessIntervals.R
preprocessIntervals | R Documentation |
Optimize intervals for copy number calling by tiling long intervals and by
including off-target regions. Uses scanFa from the Rsamtools package to
retrieve the GC content of intervals in a reference FASTA file. If mappability
or replication timing data are provided, intervals are also annotated with
those scores.
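The tiling of long intervals can be illustrated with a small base R sketch. It is not PureCN's implementation; the helper name tile_interval and the even-split rule are assumptions made here to show what splitting a target "to approximately this size" can look like.

```r
# Illustrative only: split one interval into near-equal tiles whose width is
# close to a requested average. tile_interval() is a hypothetical helper,
# not a PureCN function.
tile_interval <- function(start, end, average.width = 400) {
    width <- end - start + 1
    # number of tiles needed so that no tile exceeds the requested average
    n <- max(1, ceiling(width / average.width))
    # n + 1 break points spread evenly over the interval
    breaks <- round(seq(start - 1, end, length.out = n + 1))
    data.frame(start = breaks[-(n + 1)] + 1, end = breaks[-1])
}

# A 1000 bp target is split into three tiles of 333 to 334 bp
tiles <- tile_interval(1001, 2000, average.width = 400)
tiles$width <- tiles$end - tiles$start + 1
print(tiles)
```

Intervals at or below the requested width are returned unchanged as a single tile.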
preprocessIntervals(
interval.file,
reference.file,
output.file = NULL,
off.target = FALSE,
average.target.width = 400,
min.target.width = 100,
min.off.target.width = 20000,
average.off.target.width = 2e+05,
off.target.padding = -500,
mappability = NULL,
min.mappability = c(0.6, 0.1, 0.7),
reptiming = NULL,
average.reptiming.width = 1e+05,
exclude = NULL,
off.target.seqlevels = c("targeted", "all"),
small.targets = c("resize", "drop")
)
interval.file |
File specifying the intervals. Each interval is expected in the first
column, in the format CHR:START-END. Instead of a file, a |
reference.file |
Reference FASTA file. |
output.file |
Optionally, write GC content file. |
off.target |
Include off-target regions. |
average.target.width |
Split large targets to approximately this size. |
min.target.width |
Make sure that target regions are of at least
this specified width. See |
min.off.target.width |
Only include off-target regions of that size |
average.off.target.width |
Split off-target regions to that |
off.target.padding |
Pad off-target regions. |
mappability |
Annotate intervals with mappability score. Scores are assumed to be on a
scale from 0 to 1, with the score being 1/(number of alignments). Expected as |
min.mappability |
|
reptiming |
Annotate intervals with replication timing score. Expected as
|
average.reptiming.width |
Tile |
exclude |
Any target that overlaps with this |
off.target.seqlevels |
Controls how to deal with chromosomes/contigs
found in the |
small.targets |
Strategy to deal with targets smaller than
|
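How the off-target arguments interact can be sketched in base R. The helper off_target_regions below is hypothetical and simplified: it assumes a single contig, non-overlapping targets, and that padding is applied to every gap including the contig ends. It is meant only to show how a negative off.target.padding and min.off.target.width might combine, not how PureCN computes off-target regions.

```r
# Hypothetical helper, for illustration only; not a PureCN function.
# targets: data.frame with start/end columns on one contig of known length.
off_target_regions <- function(targets, contig.length,
                               padding = -500, min.width = 20000) {
    targets <- targets[order(targets$start), ]
    # gaps between consecutive targets, plus the two flanks of the contig
    gap.start <- c(1, targets$end + 1)
    gap.end <- c(targets$start - 1, contig.length)
    # negative padding shrinks each gap on both sides, away from the targets
    gaps <- data.frame(start = gap.start - padding, end = gap.end + padding)
    gaps$width <- gaps$end - gaps$start + 1
    # keep only gaps that are still wide enough after padding
    gaps[gaps$width >= min.width, , drop = FALSE]
}

targets <- data.frame(start = c(30000, 45000), end = c(30400, 45400))
off <- off_target_regions(targets, contig.length = 100000)
print(off)
```

In this toy input the gap between the two targets shrinks to 13599 bp after padding and is dropped, while the two flanking regions are kept.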
Returns GC content by interval as a GRanges object.
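As a rough illustration of what a per-interval GC content value means, the base R sketch below computes the GC fraction of a sequence string. The function gc_content is hypothetical, and excluding ambiguous bases such as N from the denominator is a choice made here; PureCN's own calculation on sequences retrieved with scanFa may differ.

```r
# Illustrative only: fraction of G and C among the unambiguous bases of a
# sequence. gc_content() is a hypothetical helper, not part of PureCN.
gc_content <- function(sequence) {
    bases <- strsplit(toupper(sequence), "")[[1]]
    # ignore N and other ambiguity codes (an assumption of this sketch)
    bases <- bases[bases %in% c("A", "C", "G", "T")]
    if (length(bases) == 0) return(NA_real_)
    mean(bases %in% c("G", "C"))
}

gc_content("ACGTGGCCNN")  # 6 of the 8 unambiguous bases are G or C: 0.75
```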
Markus Riester
Talevich et al. (2016). CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol.
library(PureCN)

# Example files shipped with PureCN
reference.file <- system.file("extdata", "ex2_reference.fa",
    package = "PureCN", mustWork = TRUE)
interval.file <- system.file("extdata", "ex2_intervals.txt",
    package = "PureCN", mustWork = TRUE)
bed.file <- system.file("extdata", "ex2_intervals.bed",
    package = "PureCN", mustWork = TRUE)

# Intervals read from a file
preprocessIntervals(interval.file, reference.file,
    output.file = "gc_file.txt")

# Intervals imported from a BED file (import() comes from rtracklayer)
intervals <- rtracklayer::import(bed.file)
preprocessIntervals(intervals, reference.file,
    output.file = "gc_file.txt")