Read_DBS_VCF_and_BAMs_to_verify_DBSs: Determine whether sequencing reads in fact support...

View source: R/Read_DBS_VCF_and_BAMs_to_verify_DBSs.R

Description

Determine whether sequencing reads in fact support (candidate) DBSs present in a VCF file.

Usage

Read_DBS_VCF_and_BAMs_to_verify_DBSs(
  input.vcf,
  Nbam.name,
  Tbam.name,
  N.slice.dir = tempfile(),
  T.slice.dir = tempfile(),
  unlink.slice.dir = TRUE,
  exclude.SBSs = TRUE,
  verbose = 0,
  outfile = NULL,
  filter.status = "PASS"
)

Arguments

input.vcf

If a character string, then the path to a VCF file; otherwise a single VCF "file" as a data.frame or similar object.

Nbam.name

The name of the BAM file for the normal sample corresponding to input.vcf.

Tbam.name

The name of the BAM file for the tumor sample corresponding to input.vcf.

N.slice.dir

Directory for the slices of the normal BAM. Created if necessary.

T.slice.dir

Directory for the slices of the tumor BAM. Created if necessary. Must be different from N.slice.dir.

unlink.slice.dir

If TRUE, delete (unlink) N.slice.dir and T.slice.dir before returning.
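
To illustrate how the three slice-directory arguments interact, here is a minimal base-R sketch. The helper with_slice_dirs() and its `work` callback are hypothetical and not part of DBSverify. The sketch assumes only what the arguments above state: the directories are created if missing, must differ, and are removed before returning when unlink.slice.dir is TRUE.

```r
# Hypothetical helper (not DBSverify code): prepare the normal and tumor
# slice directories, run `work`, and optionally remove the directories.
with_slice_dirs <- function(N.slice.dir, T.slice.dir, unlink.slice.dir, work) {
  # The normal and tumor slices must not share a directory.
  if (normalizePath(N.slice.dir, mustWork = FALSE) ==
      normalizePath(T.slice.dir, mustWork = FALSE)) {
    stop("T.slice.dir must be different from N.slice.dir")
  }
  # Create each directory if it does not already exist.
  for (d in c(N.slice.dir, T.slice.dir)) {
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  # Schedule cleanup; this runs however the function exits.
  if (unlink.slice.dir) {
    on.exit(unlink(c(N.slice.dir, T.slice.dir), recursive = TRUE), add = TRUE)
  }
  work(N.slice.dir, T.slice.dir)
}
```

Registering the cleanup with on.exit() means the directories are removed even if `work` signals an error, which mirrors the default tempfile() locations being disposable.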

exclude.SBSs

If TRUE, silently filter out (exclude) SBSs in the input VCF. This makes sense if the VCF is from a caller (like Mutect or the Hartwig Medical Foundation caller) that calls both SBSs and DBSs.
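
A minimal sketch of what excluding SBSs could look like, assuming the VCF has already been read into a data.frame with character REF and ALT columns. The helper keep_DBS_rows() is hypothetical; it simply treats a row as a DBS when both REF and ALT are exactly two bases, so SBSs (one base each) and indels (unequal lengths) are dropped.

```r
# Hypothetical helper (not DBSverify code): keep only rows whose REF and
# ALT are both exactly two bases, i.e. doublet base substitutions.
keep_DBS_rows <- function(vcf) {
  is.dbs <- nchar(vcf$REF) == 2 & nchar(vcf$ALT) == 2
  vcf[is.dbs, , drop = FALSE]
}

vcf <- data.frame(
  CHROM = c("1", "1", "2"),
  POS   = c(100L, 200L, 300L),
  REF   = c("A", "CT", "G"),   # an SBS, a DBS, and an insertion
  ALT   = c("G", "TA", "GA"),
  stringsAsFactors = FALSE
)
keep_DBS_rows(vcf)  # keeps only the CT>TA row at POS 200
```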

verbose

If > 0, print a message when starting, and then report the number of slices generated every verbose slices.
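
The reporting pattern described for verbose can be sketched as follows. The function slice_all() and the message wording are hypothetical; slice generation itself is elided, and only the "announce the start, then report every verbose slices" behavior is shown.

```r
# Hypothetical sketch of the `verbose` reporting pattern (not DBSverify code).
slice_all <- function(n.slices, verbose = 0) {
  if (verbose > 0) message("Starting to generate ", n.slices, " slices")
  for (i in seq_len(n.slices)) {
    # ... generate slice i here (omitted in this sketch) ...
    if (verbose > 0 && i %% verbose == 0) {
      message(i, " slices generated")
    }
  }
  invisible(n.slices)
}

slice_all(10, verbose = 4)  # start message, then reports at 4 and 8 slices
```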

outfile

If not NULL, then write the "evaluated" VCF to outfile; otherwise write it to paste0(input.vcf, "_evaluated.vcf"). Must be non-NULL if input.vcf is not a file path.
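
The rule for choosing the output path can be written as a small function. The name evaluated_vcf_name() is hypothetical; it only encodes the rule stated above, including the requirement that outfile be supplied when input.vcf is an in-memory object.

```r
# Hypothetical helper (not DBSverify code): choose the output path
# following the rule described for `outfile`.
evaluated_vcf_name <- function(input.vcf, outfile = NULL) {
  if (!is.null(outfile)) return(outfile)
  if (!is.character(input.vcf)) {
    stop("outfile must be non-NULL when input.vcf is not a file path")
  }
  paste0(input.vcf, "_evaluated.vcf")
}

evaluated_vcf_name("tumor1.vcf")                    # "tumor1.vcf_evaluated.vcf"
evaluated_vcf_name(data.frame(), outfile = "x.vcf") # "x.vcf"
```

Note that the suffix is appended to the full input path, so the original ".vcf" extension is kept in the middle of the new name.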

filter.status

If not NULL, keep only rows in which the FILTER column of the VCF is equal to filter.status.
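
A minimal sketch of the filter.status behavior, assuming the VCF is a data.frame with a character FILTER column. The helper filter_rows() is hypothetical; rows with a missing FILTER value are dropped here, which is an assumption of the sketch rather than documented behavior.

```r
# Hypothetical helper (not DBSverify code): keep rows whose FILTER equals
# filter.status; with filter.status = NULL every row is kept.
filter_rows <- function(vcf, filter.status = "PASS") {
  if (is.null(filter.status)) return(vcf)
  keep <- !is.na(vcf$FILTER) & vcf$FILTER == filter.status
  vcf[keep, , drop = FALSE]
}

vcf <- data.frame(POS = c(100L, 200L, 300L),
                  FILTER = c("PASS", "LowQual", "PASS"),
                  stringsAsFactors = FALSE)
filter_rows(vcf)        # rows at POS 100 and 300
filter_rows(vcf, NULL)  # all three rows
```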

Details

Creates a new VCF file. Any SBSs or indels in the input are silently ignored, and no attempt is made to merge adjacent SBSs. The new VCF has no data rows if there were no DBSs to analyze; otherwise it contains the following additional columns:

  1. NreadSupport: With regard to the two positions of the DBS in the normal BAM, a string of 4 numbers separated by ":", indicating respectively:

    • the number of reads that are reference sequence at both positions of the DBS,

    • the number of reads that have the alternative allele only at the first position of the DBS,

    • the number of reads that have the alternative allele only at the second position of the DBS, and

    • the number of reads that have the alternative alleles at both positions of the DBS.

  2. TreadSupport: Information analogous to that in NreadSupport, for the tumor BAM.

  3. num_bad_mapped_reads: The total number of tumor reads with MAPQ < 30 or with a mate on a different chromosome. If there are many badly mapped reads in the slice, the slice may represent a segmental duplication.

  4. num_bad_mapped_DBS_reads: The number of tumor reads that carry the putative DBS but have MAPQ < 30 or a mate on a different chromosome. If many badly mapped reads support the DBS, the DBS might result from mismapped reads in a segmental duplication.

  5. DBSconclusion: A string stating that the DBS is believable ("True DBS") or, if it is not believable, describing why not.
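
The four-number read-support string can be illustrated with a small function. The name read_support_string() and its inputs (the base each read shows at the first and second DBS positions) are hypothetical; the package classifies reads in ReadSamfile, whose details are not reproduced here. This sketch assumes that "alternative only at the first position" means alternative at the first position and reference at the second, and it ignores reads showing any other base.

```r
# Hypothetical sketch (not DBSverify code): summarize reads covering both
# DBS positions as "ref/ref:alt-1st-only:alt-2nd-only:alt/alt".
read_support_string <- function(base1, base2, ref, alt) {
  r1 <- substr(ref, 1, 1); r2 <- substr(ref, 2, 2)
  a1 <- substr(alt, 1, 1); a2 <- substr(alt, 2, 2)
  counts <- c(
    sum(base1 == r1 & base2 == r2),  # reference at both positions
    sum(base1 == a1 & base2 == r2),  # alternative only at 1st position
    sum(base1 == r1 & base2 == a2),  # alternative only at 2nd position
    sum(base1 == a1 & base2 == a2)   # alternative at both positions
  )
  paste(counts, collapse = ":")
}

# Five reads over a CT>TA doublet: two reference reads, one read with the
# alternative only at the first position, and two reads with the full DBS.
read_support_string(
  base1 = c("C", "C", "T", "T", "T"),
  base2 = c("T", "T", "T", "A", "A"),
  ref = "CT", alt = "TA"
)  # "2:1:0:2"
```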

The decision in DBSconclusion is based on multiple criteria. It is difficult to capture all the possible characteristics of likely miscalled DBSs, especially if they stem from mismapped reads that nevertheless have high MAPQ (mapping quality). I therefore also suggest relying on any available upstream filtering of SBSs that get merged into DBSs, as well as upstream filtering of DBSs. The code that implements these criteria is in DBS_conclusion_1_row, which depends on the classification of individual reads in ReadSamfile. The filtering of reads in the input SAM files is described in the documentation for ReadSamfile.
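
The "badly mapped" criterion used for num_bad_mapped_reads (MAPQ < 30 or a mate on a different chromosome) can be sketched over a table of reads. The helper count_bad_mapped() and the column names mapq, rname and rnext are hypothetical, chosen after the SAM fields MAPQ, RNAME and RNEXT; in SAM, RNEXT "=" means the mate is on the same reference sequence and "*" means mate information is unavailable.

```r
# Hypothetical sketch (not DBSverify code): count reads with low MAPQ or
# with a mate mapped to a different chromosome.
count_bad_mapped <- function(reads, min.mapq = 30) {
  # "=" and "*" are the SAM placeholders for "same reference" and
  # "unavailable"; any other RNEXT that differs from RNAME is elsewhere.
  mate.elsewhere <- !(reads$rnext %in% c("=", "*")) & reads$rnext != reads$rname
  sum(reads$mapq < min.mapq | mate.elsewhere)
}

reads <- data.frame(
  mapq  = c(60, 12, 60, 60),
  rname = c("1", "1", "1", "1"),
  rnext = c("=", "=", "7", "1"),
  stringsAsFactors = FALSE
)
count_bad_mapped(reads)  # 2: one low-MAPQ read, one mate on chromosome 7
```

Applying the same helper to only the reads that carry the DBS would give a count in the spirit of num_bad_mapped_DBS_reads.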

Once the reads to analyze are selected, additional criteria include

Value

Invisibly, a list with the elements

  1. The name of the DBS-only VCF file created.

  2. The in-memory representation of the DBS VCF as a data.table.

  3. The name of the directory with the normal SAM slices, if unlink.slice.dir is FALSE.

  4. The name of the directory with the tumor SAM slices, if unlink.slice.dir is FALSE.


steverozen/DBSverify documentation built on Dec. 23, 2021, 5:34 a.m.