FindDelMH | R Documentation |
Return the length of microhomology at a deletion
FindDelMH(context, deleted.seq, pos, trace = 0, warn.cryptic = TRUE)
context |
The deleted sequence plus ample surrounding sequence on each side (at least as long as deleted.seq). |
deleted.seq |
The deleted sequence in context. |
pos |
The position of deleted.seq in context. |
trace |
If > 0, then generate various messages showing how the computation is carried out. |
warn.cryptic |
If TRUE, generate a warning when the deletion is a cryptic repeat (see the example). |
This function is primarily for internal use, but we export it to document the underlying logic.
Example:
GGCTAGTT aligned to GGCTAGAACTAGTT with a deletion represented as:
GGCTAGAACTAGTT
GG------CTAGTT GGCTAGTT GG[CTAGAA]CTAGTT
                           ----   ----
Presumed repair mechanism leading to this:
  ....
GGCTAGAACTAGTT
CCGATCTTGATCAA
=>
  ....
GGCTAG      TT
CC      GATCAA
        ....
=>
GGCTAGTT
CCGATCAA
Variant-caller software can represent the same deletion in several different, but completely equivalent, ways.
GGC------TAGTT GGCTAGTT GGC[TAGAAC]TAGTT
                          * ---  * ---
GGCT------AGTT GGCTAGTT GGCT[AGAACT]AGTT
                          ** --  ** --
GGCTA------GTT GGCTAGTT GGCTA[GAACTA]GTT
                          *** -  *** -
GGCTAG------TT GGCTAGTT GGCTAG[AACTAG]TT
                          ****   ****
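The equivalence of these representations can be checked directly: two placements of a deletion are equivalent when removing either one leaves the same sequence. The sketch below enumerates such placements by brute force. EquivalentDeletions is a hypothetical helper written for this illustration, not an ICAMS function, and it assumes pos is the 1-based start of deleted.seq in context.

```r
# Hypothetical helper (not part of ICAMS): list every placement of a
# deletion of the same length that leaves the same post-deletion sequence.
# Assumes `pos` is the 1-based start of `deleted.seq` in `context`.
EquivalentDeletions <- function(context, deleted.seq, pos) {
  n <- nchar(deleted.seq)
  stopifnot(substr(context, pos, pos + n - 1) == deleted.seq)

  # Sequence that remains after removing characters p .. p + n - 1
  remove.at <- function(p) {
    paste0(substr(context, 1, p - 1),
           substr(context, p + n, nchar(context)))
  }

  target <- remove.at(pos)
  starts <- seq_len(nchar(context) - n + 1)
  same <- starts[vapply(starts,
                        function(p) remove.at(p) == target,
                        logical(1))]

  data.frame(pos = same,
             deleted = substring(context, same, same + n - 1),
             stringsAsFactors = FALSE)
}

# The five representations of the deletion in GGCTAGAACTAGTT
EquivalentDeletions("GGCTAGAACTAGTT", "CTAGAA", 3)
```

For this context the helper reports start positions 3 through 7, with deleted sequences CTAGAA, TAGAAC, AGAACT, GAACTA and AACTAG, matching the representations shown above.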
This function finds:
1. The maximum match of undeleted sequence to the left of the deletion that is identical to the right end of the deleted sequence, and
2. The maximum match of undeleted sequence to the right of the deletion that is identical to the left end of the deleted sequence.
The microhomology sequence is the concatenation of items (1) and (2).
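The two matches can be sketched as a pair of character-by-character scans outward from the deletion. SketchDelMH below is a hypothetical re-implementation for illustration only; it is not the ICAMS source, it assumes pos is the 1-based start of deleted.seq in context, and it does not attempt the cryptic-repeat handling described in the Warning section.

```r
# Hypothetical sketch (not the ICAMS source) of the two matches above.
# Assumes `pos` is the 1-based start of `deleted.seq` in `context`.
SketchDelMH <- function(context, deleted.seq, pos) {
  n <- nchar(deleted.seq)
  stopifnot(substr(context, pos, pos + n - 1) == deleted.seq)
  ctx <- strsplit(context, "")[[1]]
  del <- strsplit(deleted.seq, "")[[1]]

  # (1) Scan leftwards: left flank versus the right end of the deletion
  left <- 0
  while (left < n && pos - 1 - left >= 1 &&
         ctx[pos - 1 - left] == del[n - left]) {
    left <- left + 1
  }

  # (2) Scan rightwards: right flank versus the left end of the deletion
  right <- 0
  while (right < n && pos + n + right <= length(ctx) &&
         ctx[pos + n + right] == del[1 + right]) {
    right <- right + 1
  }

  # Microhomology sequence: item (1) followed by item (2)
  mh.seq <- paste0(substr(deleted.seq, n - left + 1, n),
                   substr(deleted.seq, 1, right))
  list(left = left, right = right, mh.seq = mh.seq,
       mh.length = left + right)
}

# Two equivalent representations give the same microhomology
SketchDelMH("GGCTAGAACTAGTT", "CTAGAA", 3)$mh.seq
SketchDelMH("GGCTAGAACTAGTT", "AGAACT", 5)$mh.seq
```

Both calls yield CTAG: the first finds it entirely to the right of the deletion, the second as CT on the left plus AG on the right.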
Warning
A deletion in a repeat can also be represented in several different ways. A deletion in a repeat is abstractly equivalent to a deletion with microhomology that spans the entire deleted sequence. For example:
GACTAGCTAGTT
GACTA----GTT GACTAGTT GACTA[GCTA]GTT
                        *** -*** -
is really a repeat
GACTAG----TT GACTAGTT GACTAG[CTAG]TT
                        **** ----
GACT----AGTT GACTAGTT GACT[AGCT]AGTT
                        ** --** --
This function only flags these "cryptic repeats" with a -1 return; it does not figure out the repeat extent.
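One way to see why such a deletion is "really a repeat" is to test whether any equivalent placement of the deletion is immediately followed by an identical copy of the deleted unit. The sketch below does that by brute force. IsCrypticRepeat is a hypothetical helper, not an ICAMS function, and this adjacency test is an assumption about what the flag means rather than a description of how FindDelMH decides to return -1.

```r
# Hypothetical helper (not the ICAMS implementation): TRUE if some
# equivalent placement of the deletion is directly followed by an
# identical copy of the deleted unit, i.e. the deletion removes one
# unit of a tandem repeat.
IsCrypticRepeat <- function(context, deleted.seq, pos) {
  n <- nchar(deleted.seq)
  stopifnot(substr(context, pos, pos + n - 1) == deleted.seq)

  # Sequence that remains after removing characters p .. p + n - 1
  remove.at <- function(p) {
    paste0(substr(context, 1, p - 1),
           substr(context, p + n, nchar(context)))
  }

  target <- remove.at(pos)
  starts <- seq_len(nchar(context) - n + 1)
  equivalent <- starts[vapply(starts,
                              function(p) remove.at(p) == target,
                              logical(1))]

  # Compare each candidate unit with the n characters that follow it
  any(substring(context, equivalent, equivalent + n - 1) ==
        substring(context, equivalent + n, equivalent + 2 * n - 1))
}

IsCrypticRepeat("GACTAGCTAGTT", "GCTA", 6)      # CTAG occurs twice in tandem
IsCrypticRepeat("GGCTAGAACTAGTT", "CTAGAA", 3)  # microhomology only
```

The first call is the example from this section and the second is the microhomology example from earlier in the page, so the sketch separates the two cases.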
The length of the maximum microhomology of deleted.seq in context, or -1 if the deletion is a cryptic repeat.
See https://github.com/steverozen/ICAMS/blob/master/data-raw/PCAWG7_indel_classification_2021_09_03.xlsx for additional information on ID (small insertion and deletion) mutation classification.
See the documentation for Canonicalize1Del, which first handles deletions in homopolymers, then handles deletions in simple repeats with longer repeat units (e.g. CACACACA, see FindMaxRepeatDel), and, if the deletion is not in a simple repeat, looks for microhomology (see FindDelMH).
See the code for the unexported function CanonicalizeID and the functions it calls for handling of insertions.
# GAGAGG[CTAGAA]CTAGTT
#        ----   ----
FindDelMH("GGAGAGGCTAGAACTAGTTAAAAA", "CTAGAA", 8, trace = 0) # 4
# A cryptic repeat
#
# TAAATTATTTATTAATTTATTG
# TAAATTA----TTAATTTATTG = TAAATTATTAATTTATTG
#
# equivalent to
#
# TAAATTATTTATTAATTTATTG
# TAAAT----TATTAATTTATTG = TAAATTATTAATTTATTG
#
# and
#
# TAAATTATTTATTAATTTATTG
# TAAA----TTATTAATTTATTG = TAAATTATTAATTTATTG
FindDelMH("TAAATTATTTATTAATTTATTG", "TTTA", 8, warn.cryptic = FALSE) # -1