extract_vcf: Retrieve descriptors from vcf file
In mmaitenat/ideafix: ideafix, DEAmination FIXing


`extract_vcf` obtains, from `vcf_filename`, a set of descriptors for the C:G > T:A variants listed in `mutation_ids`. These descriptors are used to classify the variants as deaminations or non-deaminations. Each one is either extracted directly from `vcf_filename` or calculated from data retrieved from it.

extract_vcf(vcf_filename, mutation_ids, samplename)

`vcf_filename`	character string giving the path to the input vcf, i.e. the vcf file containing the variants in `mutation_ids`. This file must have been generated with Mutect2, in either tumor-only or tumor/normal mode, with the strand bias annotation enabled.
`mutation_ids`	character vector containing the ids of the loci whose descriptors are to be retrieved. The id format is CHR:POS; ids can be obtained by calling the function `get_mut_id` on `vcf_filename`.
`samplename`	character string naming the sample in `vcf_filename`. This must match the name given to the sample when running Mutect2. It can be obtained by calling `get_samplename` on `vcf_filename`.
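
As an illustration of the CHR:POS id format and of what a C:G > T:A variant is, here is a minimal base R sketch. It is not ideafix's implementation: `toy_vcf`, `make_ids` and `is_ct_ga` are hypothetical names, and the records are made up rather than read from a real vcf.

```r
# Toy stand-in for the fixed columns of a vcf; values are made up for illustration.
toy_vcf <- data.frame(
  CHROM = c("chr1", "chr1", "chr2", "chrX"),
  POS   = c(12345L, 67890L, 555L, 1000L),
  REF   = c("C", "A", "G", "C"),
  ALT   = c("T", "G", "A", "G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build ids in the CHR:POS format expected by mutation_ids.
make_ids <- function(chrom, pos) paste(chrom, pos, sep = ":")

# Hypothetical helper: TRUE for C>T changes and their opposite-strand form G>A.
is_ct_ga <- function(ref, alt) {
  (ref == "C" & alt == "T") | (ref == "G" & alt == "A")
}

# Keep only the C:G > T:A records and build their ids.
keep <- is_ct_ga(toy_vcf$REF, toy_vcf$ALT)
mutation_ids <- make_ids(toy_vcf$CHROM[keep], toy_vcf$POS[keep])
print(mutation_ids)
```

In practice the ids and the sample name would come from calling `get_mut_id` and `get_samplename` on `vcf_filename`, as described above, rather than being built by hand; the resulting values are then passed to `extract_vcf(vcf_filename, mutation_ids, samplename)`.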

The returned tibble holds, for each C:G > T:A variant, the value of each of the following descriptors, one descriptor per column: VAF, number of alternate bases, normalized number of alternate bases, number of reference bases, normalized number of reference bases, reference allele, alternate allele, base quality, base quality fraction, fragment length, median position from read end, normalized median position from read end, mapping quality, FDeamC, SOB, SB-GUO and SB-GATK.
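
To make one of these descriptors concrete, the sketch below computes a variant allele frequency from reference and alternate base counts using the conventional definition, alternate / (reference + alternate). This is an assumption for illustration only: the page does not say how ideafix derives its VAF column, and it may instead take the value from a Mutect2 annotation. The `counts` table and the `vaf` helper are hypothetical.

```r
# Hypothetical per-variant base counts; not taken from any real sample.
counts <- data.frame(
  id        = c("chr1:12345", "chr2:555"),
  ref_count = c(90L, 45L),
  alt_count = c(10L, 5L),
  stringsAsFactors = FALSE
)

# Conventional VAF: fraction of bases supporting the alternate allele.
# Returns NA where there is no coverage, to avoid reporting 0/0.
vaf <- function(ref_count, alt_count) {
  depth <- ref_count + alt_count
  ifelse(depth > 0, alt_count / depth, NA_real_)
}

counts$VAF <- vaf(counts$ref_count, counts$alt_count)
print(counts)
```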

A tibble containing a set of descriptors for the C:G > T:A variants in `mutation_ids`, extracted or calculated from `vcf_filename`.

mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.