findArtifactChimericReads: Find artifact chimeric reads in BAM file of FFPE sample



Description

Artifact chimeric reads are enriched in NGS data of FFPE samples and can lead to a large number of false positive SV calls. This function finds these artifact chimeric reads.

Usage

findArtifactChimericReads(file, maxReadsOfSameBreak=2, minMapBase=1,
threads=1, FFPEReadsFile=sub("\\.bam(\\.gz)?", ".FFPEReads.txt", file),
dupChimFile=sub("\\.bam(\\.gz)?", ".dupChim.txt", file))
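The defaults for FFPEReadsFile and dupChimFile are computed from file with base R's sub(). The short sketch below evaluates the same expressions on two made-up BAM paths (the paths are assumptions for illustration, not files shipped with the package) to show where the output files would land if the arguments are left unset.

```r
# Hypothetical input paths, used only to show how the default
# output paths in the usage above are derived with base R's sub().
bamFiles <- c("sample1.bam", "/data/run42/tumor.bam")

# Same expressions as the FFPEReadsFile and dupChimFile defaults.
ffpeReadsDefaults <- sub("\\.bam(\\.gz)?", ".FFPEReads.txt", bamFiles)
dupChimDefaults <- sub("\\.bam(\\.gz)?", ".dupChim.txt", bamFiles)

print(ffpeReadsDefaults)
print(dupChimDefaults)
```

Because the pattern is not anchored to the end of the string and sub() replaces only the first match, a path whose directory name contains ".bam" would be rewritten in the wrong place. In that situation it is probably safer to pass FFPEReadsFile and dupChimFile explicitly.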

Arguments

file

Path to the BAM file.

maxReadsOfSameBreak

The maximum allowed number of artifact chimeric reads sharing a false positive breakpoint. If the number of reads sharing the same breakpoint exceeds this number, these reads are not recognized as artifact chimeric reads. Reads marked as PCR or optical duplicates are excluded from the calculation. For paired-end sequencing, both reads of a pair from an artifact chimeric fragment may contain the artifact breakpoint; therefore, the default is set to 2.

minMapBase

The minimum required length (bp) of a short complementary mapping for an artifact chimeric read. Artifact chimeric reads are derived from the combination of two single-stranded DNA fragments linked by short reverse complementary regions (SRCR). Reads with an SRCR shorter than this length are not recognized as artifact chimeric reads. Note: sequencing errors and mutations might affect whether an SRCR is detected and how long it appears. Suggested range: 0-3. When it is set to 0 or any value below 1, this step is skipped.

threads

Number of threads to use. Multi-threading can speed up the process.

FFPEReadsFile

Path of the output txt file with artifact chimeric read names.

dupChimFile

Path of the output txt file with read names of PCR or optical duplicates of all chimeric reads.

Details

Next-generation sequencing (NGS) reads from formalin-fixed paraffin-embedded (FFPE) samples contain numerous artifact chimeric reads, which can lead to a large number of false positive structural variation (SV) calls. This function finds the read names of these artifact chimeric reads. To filter these reads out of the BAM file, filterBamByReadNames can be applied.
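The rule described for maxReadsOfSameBreak can be pictured with a toy example in base R. This is only a sketch of the counting step as the argument description states it, not the package's implementation: the data frame, its column names, and the breakpoint labels are all hypothetical, and the SRCR check controlled by minMapBase is left out.

```r
# Toy illustration only; this is not the package's implementation.
# The data frame and its column names are hypothetical.
chimericReads <- data.frame(
  readName = c("r1", "r2", "r3", "r4", "r5", "r6", "r7"),
  breakpoint = c("chr1:100", "chr1:100", "chr2:500", "chr2:500",
                 "chr2:500", "chr2:500", "chr3:900"),
  isDuplicate = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
maxReadsOfSameBreak <- 2

# Duplicates are excluded before counting reads per breakpoint.
nonDup <- chimericReads[!chimericReads$isDuplicate, ]
readsPerBreak <- table(nonDup$breakpoint)

# Breakpoints shared by at most maxReadsOfSameBreak reads stay candidates;
# breakpoints with more supporting reads are treated as likely real.
candidateBreaks <- names(readsPerBreak)[readsPerBreak <= maxReadsOfSameBreak]
candidateReads <- nonDup$readName[nonDup$breakpoint %in% candidateBreaks]
print(candidateReads)
```

In this toy data the breakpoint supported by three non-duplicate reads is dropped from the candidates, while the breakpoints supported by one or two reads are kept.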

Value

A character vector of artifact chimeric read names.

Author(s)

Lanying Wei <lanying.wei@uni-muenster.de>

See Also

FilterFFPE, filterBamByReadNames, FFPEReadFilter

Examples

library(FilterFFPE)

# Example BAM file shipped with the package
file <- system.file("extdata", "example.bam", package = "FilterFFPE")

# Write both output files to a temporary folder
outFolder <- tempdir()
FFPEReadsFile <- file.path(outFolder, "example.FFPEReads.txt")
dupChimFile <- file.path(outFolder, "example.dupChim.txt")

# Find artifact chimeric reads using two threads
artifactReads <- findArtifactChimericReads(file = file, threads = 2,
                                           FFPEReadsFile = FFPEReadsFile,
                                           dupChimFile = dupChimFile)
head(artifactReads)

LanyingWei/FilterFFPE documentation built on Nov. 13, 2020, 3:58 a.m.