attribution_of_indels: Attribution of variants into one of the 83 INDEL categories

View source: R/indel_functions.R

Description

Each variant is categorized into one of the 83 INDEL categories. The classification follows Alexandrov et al., 2018 (https://www.synapse.org/#!Synapse:syn11726616). The 83 features are classified as follows:

  1. Deletion of 1 bp C/(G) or T/(A) in a repetitive context. The context is classified as 1, 2, 3, 4, 5, or 6 or more repeats of the same nucleotide.

  2. Insertion of 1 bp C/(G) or T/(A) in a repetitive context. The context is classified as 0, 1, 2, 3, 4, or 5 or more repeats of the same nucleotide.

  3. Deletions of 2 bps, 3 bps, 4 bps, or 5 bps or more in a repetitive context. Each deletion is classified in a context of 1, 2, 3, 4, 5, or 6 or more repeats of the same motif.

  4. Insertions of 2 bps, 3 bps, 4 bps, or 5 bps or more in a repetitive context. Each insertion is classified in a context of 0, 1, 2, 3, 4, or 5 or more repeats of the same motif.

  5. Microhomology deletions of 2 bps, 3 bps, 4 bps, or 5 bps or more in a partly repetitive context. The partly repetitive context is defined by a partial match to the deleted motif of 1 bp, 2 bps, 3 bps, 4 bps, or 5 bps or more (at most the motif length minus 1 bp), which is located before and after the break-point junction of the deletion.
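
As a quick consistency check, the five groups above can be tallied to confirm that they add up to 83. This is a minimal sketch in base R, not YAPSA code; all variable names are hypothetical, and the per-length split of the microhomology classes (1, 2, 3 and 5 classes for deletions of 2, 3, 4 and 5 or more bps) is an assumption taken from the published ID83 scheme rather than something stated on this page.

```r
# Hypothetical tally of the category groups (not part of YAPSA).
n_del_1bp <- 2 * 6  # C or T deleted x repeat contexts 1, 2, 3, 4, 5, >=6
n_ins_1bp <- 2 * 6  # C or T inserted x repeat contexts 0, 1, 2, 3, 4, >=5
n_del_rep <- 4 * 6  # lengths 2, 3, 4, >=5 x repeat contexts 1, 2, 3, 4, 5, >=6
n_ins_rep <- 4 * 6  # lengths 2, 3, 4, >=5 x repeat contexts 0, 1, 2, 3, 4, >=5

# Assumption: a deletion of 2, 3 or 4 bps allows homology lengths up to
# (length - 1); a deletion of >=5 bps allows homology lengths 1, 2, 3, 4, >=5.
n_mh <- sum(c(1, 2, 3, 5))

total <- sum(n_del_1bp, n_ins_1bp, n_del_rep, n_ins_rep, n_mh)
total  # 83
```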

Usage

attribution_of_indels(in_dat_return = in_dat_return)

Arguments

in_dat_return

Data frame constructed from a vcf-like file of a whole cohort or a single sample. The first columns are those of a standard vcf file, followed by an arbitrary number of custom or defined columns. One of these can carry a PID (patient or sample identifier) and subgroup information. Furthermore, the columns containing the sequence context, the absolute length of the INDEL, and the INDEL type of the variant can be annotated to the vcf-like data frame with attribute_sequence_contex_indel. These columns are required to enable the construction of a mutational catalog.

Value

Data frame with the same dimensions as the input data frame plus an additional column with the INDEL classification number corresponding to Alexandrov et al. 2018.
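
To illustrate how such a classification column might feed into a mutational catalog, the following base R sketch counts variants per category and sample on a toy data frame. It does not call YAPSA; the column names PID and indel_class are hypothetical stand-ins, and it assumes the classification number is an integer from 1 to 83.

```r
# Toy stand-in for a classified data frame; column names are hypothetical.
toy <- data.frame(
  PID = c("s1", "s1", "s2", "s2", "s2"),
  indel_class = c(1L, 1L, 13L, 83L, 1L),
  stringsAsFactors = FALSE
)

# Fix the factor levels to 1..83 so that categories without any variant
# are kept as zero counts instead of being dropped.
catalog <- table(factor(toy$indel_class, levels = 1:83), toy$PID)

dim(catalog)      # one row per category, one column per sample
colSums(catalog)  # number of classified variants per sample
```

Fixing the levels up front keeps the catalog the same shape for every cohort, which makes catalogs from different data sets directly comparable.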

Examples

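data(GenomeOfNl_raw)
# Annotate sequence context, INDEL length and INDEL type (first six variants only)
GenomeOfNl_context <- attribute_sequence_contex_indel(in_dat = head(GenomeOfNl_raw))
# Assign each variant to one of the 83 INDEL categories
GenomeOfNl_classified <- attribution_of_indels(GenomeOfNl_context)
GenomeOfNl_classified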

slw287r/yapsa documentation built on June 7, 2020, 12:46 a.m.