Description
Match the true and observed homopolymer lengths based on pairwise alignments of read sequences to a reference.
Usage
homopolymerMatcher(alignments)
Arguments
alignments: A PairwiseAlignmentsSingleSubject object from a global alignment, usually produced by pairwiseAlignment.
Details
This function identifies all "true" homopolymers in the reference sequence, i.e., runs of the same base. For each true homopolymer, it identifies the corresponding read subsequence from the pairwise alignment. The observed length of the true homopolymer is defined as the length of the longest contiguous run of the same base (ignoring deletion characters) in that read subsequence.
This is most easily illustrated with the examples below, in which only the true homopolymer region and the corresponding read subsequence are shown in uppercase.
# Observed length of 2.
acgtAA--tgca # Read
acgtAAAAtgca # Reference
# Observed length of 3, despite the deletion character.
acgtAA-Atgca # Read
acgtAAAAtgca # Reference
# Observed length of 2, as the T breaks the run of A's.
acgtAATAtgca # Read
acgtAAAAtgca # Reference
# Observed length of 3, before the breaking T.
acgtAAATAtgca # Read
acgt-AAAAtgca # Reference
# Observed length of 6, including the insertions before and after.
acgtAAAAAAtgca # Read
acgt-AAAA-tgca # Reference
# Observed length of 4, as the observed run must overlap actual homopolymer bases.
acgtAAAATAAAAAAAAAtgca # Read
acgtAAAA----------tgca # Reference
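To check these rules against the cases above, here is a minimal base-R sketch that computes observed lengths for a single read. It is not the sarlacc implementation: the function name `observed_homopolymer_lengths` is hypothetical, it takes a pair of equal-length gapped strings instead of a PairwiseAlignmentsSingleSubject object, and it assumes that (a) the read subsequence spans the homopolymer's alignment columns plus any insertion columns immediately before and after it, (b) only runs of the homopolymer's own base are counted, and (c) an insertion lying between two homopolymers is shared by both.

```r
# Hypothetical helper (not part of sarlacc): observed homopolymer lengths for
# one read. `read_aln` and `ref_aln` are equal-length aligned strings in which
# "-" marks a gap. Returns one row per run in the ungapped reference.
observed_homopolymer_lengths <- function(read_aln, ref_aln) {
  read <- strsplit(toupper(read_aln), "")[[1]]
  ref <- strsplit(toupper(ref_aln), "")[[1]]
  stopifnot(length(read) == length(ref))

  # Alignment columns holding a reference base; runs of the ungapped reference.
  ref_cols <- which(ref != "-")
  ref_runs <- rle(ref[ref_cols])
  run_ends <- cumsum(ref_runs$lengths)
  run_starts <- run_ends - ref_runs$lengths + 1L

  observed <- integer(length(ref_runs$lengths))
  for (i in seq_along(observed)) {
    base <- ref_runs$values[i]
    hp_cols <- ref_cols[run_starts[i]:run_ends[i]]  # true homopolymer columns

    # Assumption: extend the region over insertion columns (gaps in the
    # reference) immediately before and after the homopolymer.
    first <- min(hp_cols)
    last <- max(hp_cols)
    while (first > 1L && ref[first - 1L] == "-") first <- first - 1L
    while (last < length(ref) && ref[last + 1L] == "-") last <- last + 1L

    # Ignore deletion characters, remembering the column of each read base.
    cols <- first:last
    cols <- cols[read[cols] != "-"]
    if (length(cols) == 0L) next

    # Maximal runs of the homopolymer's base in the gap-stripped subsequence.
    runs <- rle(read[cols] == base)
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1L
    for (j in which(runs$values)) {
      # Only runs overlapping a true homopolymer base are eligible.
      if (any(cols[starts[j]:ends[j]] %in% hp_cols)) {
        observed[i] <- max(observed[i], runs$lengths[j])
      }
    }
  }
  data.frame(base = ref_runs$values, true_length = ref_runs$lengths,
             observed = observed)
}
```

In every example above the ungapped reference is `ACGTAAAATGCA`, so the AAAA homopolymer is the fifth row; for instance, `observed_homopolymer_lengths("acgtAA-Atgca", "acgtAAAAtgca")$observed[5]` gives 3, and the six cases give 2, 3, 2, 3, 6 and 4 in order.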
Value
An IRanges object in which each entry represents a homopolymer run in the reference sequence. Its metadata contains base, the base identity of the homopolymer; and observed, an RleList containing an integer run-length encoding for each homopolymer. Each integer Rle describes the distribution of the observed lengths of that homopolymer across the read sequences.
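The description above does not spell out how the distribution is laid out inside each Rle. Purely to illustrate why a run-length encoding suits such a distribution, the sketch below assumes a count vector indexed by observed length (position k holds the number of reads with observed length k) and compresses it with base R's `rle()`. The function name `observed_distribution`, this layout, and the choice to drop reads with an observed length of 0 are all assumptions, not the documented format.

```r
# Hypothetical helper (not part of sarlacc). ASSUMED layout: position k of the
# count vector holds the number of reads whose observed length is k.
observed_distribution <- function(observed_lengths) {
  observed_lengths <- as.integer(observed_lengths)
  # Assumption: reads with no qualifying run (length 0) are not counted.
  observed_lengths <- observed_lengths[observed_lengths > 0L]
  nbins <- max(c(observed_lengths, 1L))
  counts <- tabulate(observed_lengths, nbins = nbins)
  # Run-length encode the counts; long stretches of zeros compress well.
  rle(counts)
}
```

For observed lengths 4, 4, 5, 6 and 4 across five reads, the counts are 0, 0, 0, 3, 1, 1, which `rle()` stores as three runs.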
Author(s)
Aaron Lun, with contributions from Cheuk-Ting Law
See Also
pairwiseAlignment, homopolymerFinder
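Examples
library(Biostrings) # DNAString(), DNAStringSet(); pairwiseAlignment() may live in pwalign on newer Bioconductor
library(sarlacc)    # homopolymerMatcher()
aln <- pairwiseAlignment(subject=DNAString("AAAAGGGGGCCCCTTTT"),
    DNAStringSet(c("AAAAAGGGGGCCCCCCTTTTT", "AAAAGGGGGCCCCTTTTT")))
homopolymerMatcher(aln)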