classify_variants: Classify variants as deaminations or non-deaminations

Description

classify_variants classifies a set of C:G > T:A variants into deaminations or non-deaminations based on a series of relevant variant descriptors.

Usage

classify_variants(variant_descriptors, algorithm = "RF")

Arguments

variant_descriptors

tibble containing the variants to be classified together with their values for a series of descriptors obtained by get_descriptors.

algorithm

character string naming the algorithm to use to classify the variants. Can be "RF" or "XGBoost". Defaults to "RF".

Details

classify_variants takes as input variant_descriptors, a tibble created by calling get_descriptors. For a collection of C:G > T:A variants, this tibble contains the values of a series of descriptors that have been shown to be relevant for classifying this type of variant as a deamination or a non-deamination. See the documentation of get_descriptors for more details on these descriptors.
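As an illustration of the call itself, the following sketch runs both supported algorithms on the same input. It assumes that the ideafix package is installed and that an object named descriptors (a name chosen here for illustration) already holds the tibble returned by get_descriptors; neither is created by this snippet, so the calls are guarded and skipped otherwise.

```r
# `descriptors` is assumed to be the tibble returned earlier by get_descriptors();
# it is not created here, so the calls below only run when it exists.
can_classify <- requireNamespace("ideafix", quietly = TRUE) && exists("descriptors")

if (can_classify) {
  # Default algorithm ("RF")
  predictions_rf <- ideafix::classify_variants(descriptors)

  # Explicitly request XGBoost
  predictions_xgb <- ideafix::classify_variants(descriptors, algorithm = "XGBoost")
}
```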

classify_variants also takes, through the algorithm argument, the name of the algorithm used to classify the variants. Valid values are "RF" and "XGBoost"; the value is case-insensitive. If an invalid value is provided, an error message is displayed and execution stops.
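The case-insensitive check described above could look roughly like the following. This is a minimal sketch and not the ideafix source: normalise_algorithm is a hypothetical helper name, and the wording of the error message is an assumption.

```r
# Hypothetical helper, NOT part of ideafix: validates an algorithm name
# ignoring case and returns it in its canonical spelling.
normalise_algorithm <- function(algorithm) {
  valid <- c("RF", "XGBoost")

  if (!is.character(algorithm) || length(algorithm) != 1) {
    stop("algorithm must be a single character string")
  }

  # Compare upper-cased versions so that "rf", "Rf", "xgboost", ... all match
  idx <- match(toupper(algorithm), toupper(valid))
  if (is.na(idx)) {
    stop("Invalid algorithm '", algorithm, "'. Valid values are: ",
         paste(valid, collapse = ", "))
  }

  valid[idx]
}

canonical <- normalise_algorithm("xgboost")
```

With this sketch, "rf" and "XGBOOST" are both accepted, while a value such as "SVM" stops with an error.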

Value

Tibble with six columns: CHROM, POS, REF, ALT, DEAM_SCORE and DEAMINATION. CHROM and POS identify the variant position; REF and ALT give the reference and alternate alleles. DEAM_SCORE is the deamination score yielded by the selected classification algorithm (RF or XGBoost). Note that these scores should not be interpreted as ordinary probabilities. DEAMINATION contains the label ideafix has assigned to the variant based on an optimized classification threshold.
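To picture how a score and a threshold combine into a label, the sketch below builds a mock table with the same six column names. Everything in it is illustrative: the variants and scores are made up, a base data.frame stands in for the tibble, the 0.5 cutoff is arbitrary and is not the optimized threshold used by ideafix, and the label strings and the direction of the comparison are assumptions.

```r
# Mock values for illustration only; not real ideafix output.
scores <- data.frame(
  CHROM = c("chr1", "chr1", "chr2"),
  POS = c(12345L, 67890L, 13579L),
  REF = c("C", "G", "C"),
  ALT = c("T", "A", "T"),
  DEAM_SCORE = c(0.91, 0.12, 0.55),
  stringsAsFactors = FALSE
)

# Arbitrary cutoff chosen for this sketch (NOT ideafix's optimized threshold)
threshold <- 0.5

# Assumed rule: scores at or above the cutoff are labelled as deaminations
scores$DEAMINATION <- ifelse(scores$DEAM_SCORE >= threshold,
                             "deamination", "non-deamination")

# Keep only the variants labelled as non-deaminations
retained <- scores[scores$DEAMINATION == "non-deamination", ]
```

Because the scores are not ordinary probabilities, a cutoff such as 0.5 has no special meaning; the labels in the DEAMINATION column returned by classify_variants are the ones to rely on.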


mmaitenat/ideafix documentation built on Sept. 18, 2021, 7:55 a.m.