homopolymerMatcher: Match homopolymer lengths

Description

Match the true and observed homopolymer lengths based on pairwise alignments of read sequences to a reference.

Usage

homopolymerMatcher(alignments) 

Arguments

alignments

A PairwiseAlignmentsSingleSubject object from a global alignment, usually produced by pairwiseAlignment. The subject should be a single reference sequence that is the same for all reads.

Details

This function identifies all “true” homopolymers in the reference sequence (i.e., runs of the same base). For each true homopolymer, it identifies the corresponding read subsequence based on the pairwise alignment. The observed length of the true homopolymer is defined as the length of the longest contiguous run of the homopolymer's base in the corresponding read subsequence, ignoring deletion characters.
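As a rough illustration of what a “true” homopolymer is, the base R sketch below lists the runs of identical bases in a reference string. It is not the package's implementation: find_runs is a hypothetical helper, and the minimum run length of 2 is an assumption made for this sketch.

```r
# Hypothetical helper: list runs of identical bases in a reference string.
# 'min_len' is an assumption of this sketch, not a documented package setting.
find_runs <- function(ref, min_len = 2L) {
    runs <- rle(strsplit(ref, "")[[1]])   # run-length encode the bases
    end <- cumsum(runs$lengths)           # last position of each run (1-based)
    start <- end - runs$lengths + 1L      # first position of each run
    keep <- runs$lengths >= min_len       # discard runs shorter than min_len
    data.frame(start = start[keep], end = end[keep],
               base = runs$values[keep], stringsAsFactors = FALSE)
}

res <- find_runs("AAAAGGGGGCCCCTTTT")
print(res)
```

Each row gives the start, end and base of one run, which is the same kind of information the function needs before it can look up the matching read subsequence.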

The definition of the observed length is most easily illustrated with the examples below. For demonstration purposes, only the true homopolymer region and the corresponding read subsequence are shown in uppercase.

  # Observed length of 2.
  acgtAA--tgca # Read
  acgtAAAAtgca # Reference

  # Observed length of 3, despite the deletion character.
  acgtAA-Atgca # Read
  acgtAAAAtgca # Reference

  # Observed length of 2, as the T breaks the run of A's.
  acgtAATAtgca # Read
  acgtAAAAtgca # Reference

  # Observed length of 3, before the breaking T.
  acgtAAATAtgca # Read
  acgt-AAAAtgca # Reference

  # Observed length of 6, including the insertions before and after.
  acgtAAAAAAtgca # Read
  acgt-AAAA-tgca # Reference

  # Observed length of 4, as the observed run must overlap actual homopolymer bases.
  acgtAAAATAAAAAAAAAtgca # Read
  acgtAAAA----------tgca # Reference
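The rule shown in these examples can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: observed_length is a hypothetical helper that takes the aligned read and reference as plain strings and relies on the uppercase convention used above to mark the homopolymer region.

```r
# Hypothetical helper: observed homopolymer length from two aligned strings.
# Assumes uppercase letters mark the homopolymer region, as in the examples.
observed_length <- function(read, ref) {
    r <- strsplit(read, "")[[1]]
    s <- strsplit(ref, "")[[1]]
    stopifnot(length(r) == length(s))

    # Keep only the columns belonging to the marked region.
    in_region <- (r %in% LETTERS) | (s %in% LETTERS)
    r <- r[in_region]
    s <- s[in_region]

    # The homopolymer base is the single letter present in the reference row.
    base <- unique(s[s != "-"])
    stopifnot(length(base) == 1L)

    # Ignore deletion characters in the read.
    keep <- r != "-"
    r <- r[keep]
    anchored <- s[keep] != "-"  # TRUE if aligned to a true homopolymer base

    # A run counts only if it has the right base and overlaps at least one
    # position aligned to a true homopolymer base.
    runs <- rle(r)
    run_id <- rep(seq_along(runs$lengths), runs$lengths)
    touches <- vapply(split(anchored, run_id), any, logical(1))
    ok <- runs$values == base & touches
    if (any(ok)) max(runs$lengths[ok]) else 0L
}

observed_length("acgtAA-Atgca", "acgtAAAAtgca")                      # 3
observed_length("acgtAAAATAAAAAAAAAtgca", "acgtAAAA----------tgca")  # 4
```

In the last call the run of nine A's after the T is ignored because none of its bases is aligned to a reference base of the homopolymer.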

Value

An IRanges object where each entry represents a homopolymer run in the reference sequence. The metadata contains base, the base identity of each homopolymer; and observed, an RleList containing one integer Rle for each homopolymer. Each integer Rle holds the distribution of the observed lengths of that homopolymer across the read sequences.
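To show how a run-length encoding can hold a distribution of lengths, the sketch below uses base R's rle as a stand-in for an integer Rle. The observed lengths are made-up values for a single homopolymer, and sorting them before encoding is an assumption of this sketch rather than documented behaviour.

```r
# Made-up observed lengths of one homopolymer across five reads.
observed <- c(5L, 4L, 4L, 6L, 4L)

# Sorting first (an assumption) gives one run per distinct length.
enc <- rle(sort(observed))
enc$values   # distinct observed lengths
enc$lengths  # number of reads showing each length

# The same information as a named frequency vector.
counts <- setNames(enc$lengths, enc$values)
print(counts)
```

Here three reads show a length of 4, one shows 5 and one shows 6, so the encoding stores three runs rather than five separate values.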

Author(s)

Aaron Lun, with contributions from Cheuk-Ting Law

See Also

pairwiseAlignment, homopolymerFinder

Examples

aln <- pairwiseAlignment(
    pattern=DNAStringSet(c("AAAAAGGGGGCCCCCCTTTTT", "AAAAGGGGGCCCCTTTTT")),
    subject=DNAString("AAAAGGGGGCCCCTTTT"))
homopolymerMatcher(aln)

florian0512/sarlacc documentation built on May 28, 2019, 8:39 p.m.