extract_vcf: Retrieve descriptors from vcf file


View source: R/get_descriptors.R

Description

extract_vcf obtains, from vcf_filename, a set of descriptors for the C:G > T:A variants listed in mutation_ids. These descriptors are relevant to classifying each variant as a deamination or a non-deamination. Some are extracted directly from vcf_filename, and others are calculated from data retrieved from it.

Usage

extract_vcf(vcf_filename, mutation_ids, samplename)

Arguments

vcf_filename

character string giving the path to the input vcf, i.e. the vcf file containing the variants in mutation_ids. This file must have been generated with Mutect2, in either tumor-only or tumor/normal mode, with strand bias annotation enabled.

mutation_ids

character vector containing the ids of the loci whose descriptors are to be extracted. The id format is CHR:POS, and ids can be obtained by calling get_mut_id on vcf_filename.

samplename

character string naming the sample in vcf_filename. This must match the name given to the sample when running Mutect2. It can be obtained by calling get_samplename on vcf_filename.
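
To make the CHR:POS id format concrete, here is a minimal base-R sketch that builds such ids from a toy variant table. The data frame and its column names are hypothetical stand-ins for the fixed VCF columns, and the filtering step only illustrates what a C:G > T:A variant looks like (C>T or G>A). It is not the implementation of get_mut_id.

```r
# Toy variant table standing in for the fixed columns of a VCF (hypothetical data).
variants <- data.frame(
  CHROM = c("chr1", "chr1", "chr2", "chrX"),
  POS   = c(12345L, 67890L, 1500L, 999L),
  REF   = c("C", "A", "G", "C"),
  ALT   = c("T", "G", "A", "G"),
  stringsAsFactors = FALSE
)

# A C:G > T:A change is reported as C>T on one strand and G>A on the other.
is_ct <- (variants$REF == "C" & variants$ALT == "T") |
         (variants$REF == "G" & variants$ALT == "A")

# Build ids in the CHR:POS format described for mutation_ids.
mutation_ids <- paste(variants$CHROM[is_ct], variants$POS[is_ct], sep = ":")
print(mutation_ids)
```

In practice the ids would come from get_mut_id, as described above.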

Details

The returned tibble holds, for each C:G > T:A variant, the values of the following descriptors, one per column: VAF, number of alternate bases, normalized number of alternate bases, number of reference bases, normalized number of reference bases, reference allele, alternate allele, base quality, base quality fraction, fragment length, median position from read end, normalized median position from read end, mapping quality, FDeamC, SOB, SB-GUO and SB-GATK.
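
As an illustration of one of these descriptors, the sketch below computes a variant allele frequency (VAF) from reference and alternate base counts, using the common definition alt / (ref + alt). The helper name vaf and the counts are hypothetical. The package may obtain VAF differently, for instance from a field already present in the Mutect2 output, so this is only a sketch of the general idea.

```r
# Hypothetical helper: VAF as the fraction of alternate bases among all counted bases.
vaf <- function(n_ref, n_alt) {
  depth <- n_ref + n_alt
  # Return NA where there is no coverage, to avoid dividing by zero.
  ifelse(depth > 0, n_alt / depth, NA_real_)
}

# Made-up counts for three loci; the last one has no coverage.
n_ref <- c(90, 45, 0)
n_alt <- c(10, 5, 0)
vaf_values <- vaf(n_ref, n_alt)
print(vaf_values)
```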

Value

Tibble containing a set of descriptors for the C:G > T:A variants in mutation_ids, extracted or calculated from vcf_filename.
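
A possible end-to-end call is sketched below. It follows the argument descriptions above: the sample name and the mutation ids are obtained from the same file and then passed to extract_vcf. The file path is a placeholder, and the sketch assumes that get_samplename and get_mut_id are exported by the ideafix package and take the vcf path as their only argument, which is not confirmed here. The guard lets the snippet run harmlessly when the package or the file is missing.

```r
# Placeholder path; replace with a VCF produced by Mutect2.
vcf_filename <- "path/to/sample.vcf"

if (requireNamespace("ideafix", quietly = TRUE) && file.exists(vcf_filename)) {
  # Assumed calls: both helpers are described as being called on vcf_filename.
  samplename   <- ideafix::get_samplename(vcf_filename)
  mutation_ids <- ideafix::get_mut_id(vcf_filename)

  # One set of descriptors per C:G > T:A variant, returned as a tibble.
  descriptors <- ideafix::extract_vcf(vcf_filename, mutation_ids, samplename)
  print(descriptors)
} else {
  message("ideafix or the input VCF is not available; skipping the example.")
}
```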


mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.