FindMaxRepeatDel: Return the number of repeat units in which a deletion is...

View source: R/ID_functions.R

FindMaxRepeatDel R Documentation

Return the number of repeat units in which a deletion is embedded

Description

Return the number of repeat units in which a deletion is embedded

Usage

FindMaxRepeatDel(context, rep.unit.seq, pos)

Arguments

context

A string that embeds rep.unit.seq at position pos

rep.unit.seq

The repeat unit sequence: the substring of context from pos to pos + nchar(rep.unit.seq) - 1.

pos

The 1-based start position of rep.unit.seq in context.

Details

This function is primarily for internal use, but we export it to document the underlying logic.

For example, FindMaxRepeatDel("xyaczt", "ac", 3) returns 0.

If substr(context, pos, pos + nchar(rep.unit.seq) - 1) != rep.unit.seq, the function stops with an error.

If this function returns 0, then it is necessary to look for microhomology using the function FindDelMH.
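The counting logic described above can be sketched in base R. This is a minimal illustration, not the ICAMS source: the name count_flanking_repeats is hypothetical, and the approach (walking left and then right from pos in steps of one repeat unit) is an assumption that happens to reproduce the return values documented on this page.

```r
# Hypothetical re-implementation for illustration only; not the ICAMS code.
count_flanking_repeats <- function(context, rep.unit.seq, pos) {
  n <- nchar(rep.unit.seq)

  # Mirror the documented precondition: the unit must sit at pos.
  stopifnot(substr(context, pos, pos + n - 1) == rep.unit.seq)

  count <- 0

  # Walk left, one repeat unit at a time, while copies keep matching.
  p <- pos - n
  while (p >= 1 && substr(context, p, p + n - 1) == rep.unit.seq) {
    count <- count + 1
    p <- p - n
  }

  # Walk right in the same way, staying inside the context string.
  p <- pos + n
  while (p + n - 1 <= nchar(context) &&
         substr(context, p, p + n - 1) == rep.unit.seq) {
    count <- count + 1
    p <- p + n
  }

  # The unit at pos itself is not counted.
  count
}

count_flanking_repeats("xyaczt", "ac", 3)    # 0
count_flanking_repeats("xyACACzt", "AC", 3)  # 1
```

A result of 0 from this sketch corresponds to the case where the real function would hand off to the microhomology search.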

Warning

This function depends on the variant caller having "aligned" the deletion within the context of the repeat.

For example, a deletion of CAG in the repeat

CTCAGCAGCAGGT

can have 3 "aligned" representations as follows:

CT---CAGCAGGT
CTCAG---CAGGT
CTCAGCAG---GT

In these cases this function will return 2. (Please note that the return value does not include the rep.unit.seq in the count.)

However, the same deletion can also have an "unaligned" representation, such as

CTCAGC---AGGT

(a deletion of AGC).

In this case this function will return 1 (a deletion of AGC in a 2-element repeat of AGC).
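To see why the representation matters, the following base R sketch checks that all four representations describe the same deletion. It assumes the full undeleted sequence implied by the alignments, CTCAGCAGCAGGT; the helper delete_at and the variable names are hypothetical and are not part of ICAMS.

```r
# Undeleted sequence implied by the alignments shown above (an assumption).
context <- "CTCAGCAGCAGGT"

# Hypothetical helper: remove `len` characters starting at 1-based `pos`.
delete_at <- function(context, pos, len) {
  paste0(substr(context, 1, pos - 1),
         substr(context, pos + len, nchar(context)))
}

# Start positions of the three "aligned" calls and the "unaligned" call.
starts <- c(3, 6, 9, 7)

# The unit a variant caller would report at each start position.
# substring() recycles its arguments, so this yields one unit per start.
units <- substring(context, starts, starts + 2)

# The sequence left after each deletion.
results <- vapply(starts, function(p) delete_at(context, p, 3), character(1))

print(units)
print(unique(results))
```

The reported unit differs between the aligned and unaligned calls, while the post-deletion sequence is the same in every case, which is why the count depends on how the caller positioned the deletion.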

Value

The number of repeat units in which rep.unit.seq is embedded, not including the input rep.unit.seq in the count.

ID classification

See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.

See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA; see FindMaxRepeatDel), and, if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).

See the code for unexported function CanonicalizeID and the functions it calls for handling of insertions.

Examples

FindMaxRepeatDel("xyACACzt", "AC", 3) # 1
FindMaxRepeatDel("xyACACzt", "CA", 4) # 0


ICAMS documentation built on June 22, 2024, 6:47 p.m.