classify_variants_XGBoost: Classify variants as deaminations or non-deaminations with...


View source: R/classify_variants.R

Description

classify_variants_XGBoost classifies a set of C:G > T:A variants into deaminations or non-deaminations based on a series of relevant variant descriptors, using an XGBoost-based model.

Usage

classify_variants_XGBoost(variant_descriptors)

Arguments

variant_descriptors

Tibble containing the variants to be classified, together with their values for the descriptors obtained by get_descriptors.

Details

classify_variants_XGBoost runs only when variant_descriptors contains values for every descriptor included in the XGBoost model, namely VAF, number of alternate bases, normalized number of alternate bases, number of reference bases, normalized number of reference bases, reference allele, alternate allele, base quality, base quality fraction, fragment length, median position from read end, mapping quality, FDeamC, SOB, SB-GUO, SB-GATK, normalized median position from read end, base two positions before, base one position before, base two positions after, base one position after, dinucleotide before and dinucleotide after. If any of these is missing from the variant_descriptors object, the function throws an error and variant classification stops. Note that get_descriptors retrieves all of these descriptor values automatically from a Mutect2 VCF file.
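The completeness requirement above can be checked before calling the function. The following is a minimal sketch in base R, not the package's own validation code. The column names in required_descriptors and the helper missing_descriptors are hypothetical: this page lists the descriptors by description only, so the actual column names produced by get_descriptors may differ.

```r
# Hypothetical column names: this page does not state the real names
# produced by get_descriptors, so replace these with the actual ones.
required_descriptors <- c("VAF", "ref_allele", "alt_allele",
                          "base_qual", "frag_length")

# Hypothetical helper: return the required columns that are absent
# from a descriptor table, in the order given by `required`.
missing_descriptors <- function(variant_descriptors,
                                required = required_descriptors) {
  setdiff(required, colnames(variant_descriptors))
}

# Toy table (invented values) that lacks two of the required columns.
toy_descriptors <- data.frame(
  VAF = c(0.12, 0.48),
  ref_allele = c("C", "G"),
  alt_allele = c("T", "A"),
  stringsAsFactors = FALSE
)

missing_cols <- missing_descriptors(toy_descriptors)
if (length(missing_cols) > 0) {
  message("Missing descriptors: ", paste(missing_cols, collapse = ", "))
}
```

Running a check like this first reports all absent columns at once, which can be easier to act on than the error raised during classification.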

Value

Tibble with six columns: CHROM, POS, REF, ALT, DEAM_SCORE and DEAMINATION. CHROM and POS identify the variant position; REF and ALT give the reference and alternate alleles. DEAM_SCORE is the deamination score yielded by the selected classification algorithm (RF or XGBoost). Note that these scores should not be interpreted as ordinary probabilities. DEAMINATION contains the label ideafix has assigned to the variant based on an optimized classification threshold.
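As an illustration of how the returned table might be used downstream, the sketch below builds a toy stand-in with the six documented columns and filters it by label. All values are invented, a plain data.frame is used in place of a tibble to avoid extra dependencies, and the label strings "deamination" and "non-deamination" are assumptions, since this page does not state how DEAMINATION is encoded.

```r
# Toy stand-in for the returned table. Column names come from the Value
# section; the values and the DEAMINATION label strings are assumptions.
predictions <- data.frame(
  CHROM = c("chr1", "chr1", "chr2"),
  POS = c(10177L, 20345L, 5012L),
  REF = c("C", "G", "C"),
  ALT = c("T", "A", "T"),
  DEAM_SCORE = c(0.91, 0.08, 0.67),
  DEAMINATION = c("deamination", "non-deamination", "deamination"),
  stringsAsFactors = FALSE
)

# Keep only the variants that were not labelled as deaminations.
kept <- predictions[predictions$DEAMINATION == "non-deamination", ]

# Count variants per assigned label.
label_counts <- table(predictions$DEAMINATION)
```

Filtering on DEAMINATION rather than on DEAM_SCORE keeps the package's optimized threshold in effect, which matters because the scores are not ordinary probabilities.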


mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.