get_descriptors: Get variant descriptors

View source: R/get_descriptors.R

Description

get_descriptors obtains the descriptors of the C:G > T:A variants in vcf_filename that are relevant for classifying them as deaminations or non-deaminations. Each descriptor is either extracted directly from vcf_filename and fasta_filename or calculated from data retrieved from them.

Usage

get_descriptors(vcf_filename, fasta_filename)

Arguments

vcf_filename

character string naming the path to the input VCF, i.e. the VCF file containing the variants to classify. This file must have been generated with Mutect2, in either tumor-only or tumor/normal mode with strand bias annotation enabled.

fasta_filename

character string naming the path to the reference genome FASTA file the sequencing data was aligned to.

Details

The returned tibble contains, for each C:G > T:A variant, the values of the following descriptors, one per column: VAF, number of alternate bases, normalized number of alternate bases, number of reference bases, normalized number of reference bases, reference allele, alternate allele, base quality, base quality fraction, fragment length, median position from read end, mapping quality, FDeamC, SOB, SB-GUO, SB-GATK, normalized median position from read end, base two positions before, base one position before, base two positions after, base one position after, dinucleotide before and dinucleotide after.

For further detail on any descriptor, see its individual help file.

Value

Tibble containing the descriptors of the C:G > T:A variants in vcf_filename needed for their classification.
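
Examples

The call described above might look like the following minimal sketch. The two file paths are hypothetical placeholders, and the sketch assumes that the package is installed under the name ideafix and exports get_descriptors; it skips the call when either assumption does not hold.

```r
# Hypothetical input paths: replace with your own Mutect2 VCF and the
# reference FASTA the sequencing data was aligned to.
vcf_filename <- "path/to/sample_mutect2.vcf"
fasta_filename <- "path/to/reference.fa"

# Run only when the ideafix package (assumed name) and both files are available.
inputs_ready <- requireNamespace("ideafix", quietly = TRUE) &&
  file.exists(vcf_filename) &&
  file.exists(fasta_filename)

if (inputs_ready) {
  descriptors <- ideafix::get_descriptors(
    vcf_filename = vcf_filename,
    fasta_filename = fasta_filename
  )
  # Inspect the size of the returned tibble and the descriptor columns.
  print(dim(descriptors))
  print(colnames(descriptors))
} else {
  message("ideafix or the input files are not available; skipping the example.")
}
```

Printing the column names is a quick way to check which descriptors were returned before passing the tibble on to classification.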


mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.