View source: R/Read_DBS_VCF_and_BAMs_to_verify_DBSs.R
Determine whether sequencing reads in fact support (candidate) DBSs present in a VCF file.
input.vcf |
If a character string, then the path to a VCF file; otherwise a single VCF "file" as a data.frame or similar object. |
Nbam.name |
The name of the BAM file for the normal sample corresponding to |
Tbam.name |
The name of the BAM file for the tumor sample corresponding to |
N.slice.dir |
Directory for the slices of the normal BAM. Created if necessary. |
T.slice.dir |
Directory for the slices of the tumor BAM. Created if necessary. Must be different than |
unlink.slice.dir |
If |
exclude.SBSs |
If |
verbose |
If > 0 print a message when starting the number of slices
generated every |
outfile |
If not |
filter.status |
If not |
Creates a new VCF file. This VCF file has no data rows if there were no DBSs to analyze. Otherwise, this VCF contains some additional columns. Any SBSs or indels in the input are silently ignored, and no attempt is made to merge adjacent SBSs.
NreadSupport
With regard to the two positions of the DBS in the normal BAM, a string of 4 numbers separated by ":", indicating respectively:
the number of reads that are reference sequence at both positions of the DBS,
the number of reads that have the alternative allele only at the first position of the DBS,
the number of reads that have the alternative allele only at the second position of the DBS, and
the number of reads that have the alternative alleles at both positions of the DBS.
TreadSupport
Information analogous to that in NreadSupport, for the tumor BAM.
num_bad_mapped_reads
The total number of tumor reads with MAPQ < 30 or with a mate on a different chromosome. If there are many badly mapped reads in the slice, the slice may represent a segmental duplication.
num_bad_mapped_DBS_reads
The number of tumor reads with the putative DBS but with MAPQ < 30 or a mate on a different chromosome. If many badly mapped reads support the DBS, the DBS might result from mismapped reads in a segmental duplication.
DBSconclusion
A string that describes whether the DBS is believable ("True DBS") or, if the DBS is not believable, why not.
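The NreadSupport and TreadSupport strings described above pack four read counts into one field. A minimal sketch of unpacking such a string in R follows; the helper name `parse_read_support` and the element names are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of the package): split a read-support
# string such as "20:1:0:7" into its four integer counts.
parse_read_support <- function(x) {
  parts <- as.integer(strsplit(x, ":", fixed = TRUE)[[1]])
  # The documentation describes exactly 4 numbers separated by ":".
  stopifnot(length(parts) == 4, !anyNA(parts))
  # Element names are illustrative labels for the four documented counts.
  names(parts) <- c("ref_both", "alt_pos1_only", "alt_pos2_only", "alt_both")
  parts
}

support <- parse_read_support("20:1:0:7")
support[["alt_both"]]  # reads with the alternative alleles at both positions
sum(support)           # total reads counted in the string
```

Naming the elements makes later checks (for example, how many reads support the DBS at both positions) easier to read than positional indexing.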
The decision in DBSconclusion is based on multiple criteria. I also suggest relying on any available upstream filtering of SBSs that get merged into DBSs, as well as upstream filtering of DBSs. It is difficult to capture all the possible characteristics of likely miscalled DBSs, especially if they stem from mismapped reads that nevertheless have high MAPQ (mapping quality). The code that implements these criteria is in DBS_conclusion_1_row, which depends on the classification of individual reads in ReadSamfile. The filtering of reads in the input SAM files is described in the documentation for ReadSamfile.
Once the reads to analyze are selected, the additional criteria are:
There must be >= 5 normal reads at the site of the putative tumor DBS.
At each separate position of the DBS, < 10% of the normal reads may carry the DBS's variant at that position.
Fewer than 2 normal reads support the DBS.
At least 2 tumor reads support the DBS.
There are more well-mapped tumor reads than badly mapped tumor reads at the site of the DBS.
There are more well-mapped tumor reads than badly mapped tumor reads that contain the DBS. (ReadSamfile keeps track of the well-mapped and badly mapped reads.)
If 1 normal read supports the DBS, then there must be a statistically greater proportion of tumor reads supporting the DBS (by Fisher's test).
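The criteria above can be sketched as a single R function. This is an illustration only and not the package's DBS_conclusion_1_row: the function name, argument names, failure strings, and the significance threshold `alpha = 0.05` are all assumptions, as is the reading that the counts in the tumor read-support vector are well-mapped reads.

```r
# Sketch only; NOT the package's DBS_conclusion_1_row.
# n, t: length-4 count vectors in NreadSupport / TreadSupport order
#   (ref at both positions, alt at position 1 only,
#    alt at position 2 only, alt at both positions).
# Assumptions: counts in t are well-mapped reads; the failure strings
# and alpha = 0.05 are invented for illustration.
sketch_dbs_conclusion <- function(n, t, num_bad_mapped_reads,
                                  num_bad_mapped_DBS_reads, alpha = 0.05) {
  n_total <- sum(n)
  if (n_total < 5) return("Fewer than 5 normal reads")

  # Fraction of normal reads carrying the variant at each separate position.
  frac_pos1 <- (n[[2]] + n[[4]]) / n_total
  frac_pos2 <- (n[[3]] + n[[4]]) / n_total
  if (frac_pos1 >= 0.1 || frac_pos2 >= 0.1) {
    return("Variant in >= 10% of normal reads at one position")
  }

  if (n[[4]] >= 2) return("2 or more normal reads support the DBS")
  if (t[[4]] < 2) return("Fewer than 2 tumor reads support the DBS")
  if (sum(t) <= num_bad_mapped_reads) {
    return("Too many badly mapped tumor reads at the site")
  }
  if (t[[4]] <= num_bad_mapped_DBS_reads) {
    return("Too many badly mapped tumor reads with the DBS")
  }

  if (n[[4]] == 1) {
    # One-sided Fisher's exact test: is the proportion of DBS-supporting
    # reads greater in the tumor (column 1) than in the normal (column 2)?
    tab <- matrix(c(t[[4]], sum(t) - t[[4]],
                    n[[4]], n_total - n[[4]]), nrow = 2)
    p <- fisher.test(tab, alternative = "greater")$p.value
    if (p >= alpha) return("Tumor support not significantly greater than normal")
  }
  "True DBS"
}

sketch_dbs_conclusion(c(20, 0, 0, 0), c(15, 0, 0, 8), 1, 0)
```

Returning at the first failed criterion mirrors the documented behaviour of reporting a reason when the DBS is not believable; the order of the checks here is a design choice of the sketch, not something the documentation specifies.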
Invisibly, a list with the elements:
The name of the DBS-only VCF file created.
The in-memory representation of the DBS VCF as a data.table.
The name of the directory with the normal SAM slices, if unlink.slice.dir is FALSE.
The name of the directory with the tumor SAM slices, if unlink.slice.dir is FALSE.