errorFinder: Find sequencing errors
In florian0512/sarlacc: Pipeline for Oxford Nanopore RNA-Seq Data Analysis


Description

Find errors in one or more read sequences compared to a single reference sequence, based on pairwise alignments between them.

Usage

errorFinder(alignments)

Arguments

alignments

A PairwiseAlignmentsSingleSubject object from a global alignment, usually produced by pairwiseAlignment with type="global" (the default). The subject must be a single reference sequence shared by all aligned reads.

Details

For each position in the reference sequence, this function records how often each base is observed across all aligned read sequences. It also computes a transition matrix giving the number of times each base in the reference sequence (rows) is observed as each base in the read sequences (columns).
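The per-position tallies and the transition matrix can be illustrated with a base-R sketch. This is not sarlacc's implementation: the function name count_errors is hypothetical, and it assumes the alignments are already available as gapped character strings ('-' marking gaps), that every alignment is global against the same reference, and that only A, C, G, T and '-' occur. Reference gap columns (insertions) are skipped here, and a '-' in the read is tallied as a deletion rather than entering the transition matrix.

```r
# Hypothetical sketch, not sarlacc's code. 'reads' and 'refs' are
# equal-length character vectors of gapped strings from global alignments
# to the same 'reference'; only A, C, G, T and '-' are assumed to occur.
count_errors <- function(reads, refs, reference) {
  bases <- c("A", "C", "G", "T")
  ref_bases <- strsplit(reference, "")[[1]]
  n <- length(ref_bases)
  counts <- matrix(0L, n, 5, dimnames = list(NULL, c(bases, "deletion")))
  trans <- matrix(0L, 4, 4, dimnames = list(bases, bases))
  for (i in seq_along(reads)) {
    r <- strsplit(reads[i], "")[[1]]
    s <- strsplit(refs[i], "")[[1]]
    stopifnot(length(r) == length(s))
    keep <- s != "-"                          # drop insertion columns
    r <- r[keep]
    stopifnot(identical(s[keep], ref_bases))  # whole reference is aligned
    for (pos in seq_len(n)) {
      obs <- if (r[pos] == "-") "deletion" else r[pos]
      counts[pos, obs] <- counts[pos, obs] + 1L
      if (obs != "deletion") {
        # Reference base (row) observed as read base (column).
        trans[ref_bases[pos], obs] <- trans[ref_bases[pos], obs] + 1L
      }
    }
  }
  list(full = data.frame(base = ref_bases, counts), transition = trans)
}
```

With Biostrings, accessors such as alignedPattern() and alignedSubject() may provide suitable gapped strings (via as.character()), though this sketch has not been checked against sarlacc's own pipeline.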

Insertions in the read sequence are assigned to the reference base position that they precede. For example, the following alignment:

    AAAATGGGG # Read
    AAA--GGGG # Reference

would yield an insertion of length 2 that is assigned to base position 4 of the reference sequence, i.e., the G immediately following the inserted bases. Insertions after the last base of the reference are assigned to an “imaginary” position one past the end (the reference length plus one).
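The insertion rule can be sketched in base R for a single alignment. The function find_insertions is hypothetical and not part of sarlacc; it assumes the read and reference are given as gapped strings of equal length. Each run of '-' in the reference is treated as one insertion, placed at the number of reference bases consumed before the run plus one, so an insertion at the very end lands on the reference length plus one.

```r
# Hypothetical sketch, not sarlacc's code. 'read' and 'ref' are gapped
# strings of equal length from one global alignment ('-' marks a gap).
find_insertions <- function(read, ref) {
  s <- strsplit(ref, "")[[1]]
  stopifnot(nchar(read) == length(s))
  runs <- rle(s == "-")                  # runs of gap / non-gap columns
  ends <- cumsum(runs$lengths)           # last alignment column of each run
  # Number of reference bases consumed up to the end of each run.
  consumed <- cumsum(s != "-")[ends]
  is_ins <- runs$values                  # TRUE for runs of reference gaps
  # An insertion precedes the next reference base: consumed + 1.
  data.frame(position = consumed[is_ins] + 1L,
             length = runs$lengths[is_ins])
}
```

For the alignment above, find_insertions("AAAATGGGG", "AAA--GGGG") gives a single row with position 4 and length 2.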

We also record the observed frequency of deletions for each position in the reference.

Value

A list containing a DataFrame in the full element and a transition matrix in the transition element.

In the full DataFrame, each row represents a position on the reference sequence. The DataFrame contains the fields:

base: Character, the reference base at the current position.
A, C, T, G: Integer, the number of times A, C, T or G is observed at the current position.
deletion: Integer, the number of times a deletion is observed at the current position.
insertion: RleList, where each entry is an integer Rle (run-length encoding) vector holding the distribution of lengths of insertions immediately preceding the current position.

The last row corresponds to a hypothetical one-past-the-end position. It is NA in every field except insertion, which records insertions occurring after the last base of the reference sequence.
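As an illustration of working with the documented columns of full, the sketch below derives per-position mismatch and deletion rates. The helper error_rates is hypothetical (not part of sarlacc), and it was written against a plain data.frame holding the documented base, A, C, G, T and deletion columns; it has not been checked against sarlacc's actual DataFrame output. Because every read is globally aligned, the sum of the base and deletion counts is taken as the number of reads covering each position.

```r
# Hypothetical helper, not part of sarlacc. 'full' is assumed to have the
# documented columns base, A, C, G, T and deletion.
error_rates <- function(full) {
  # Drop the one-past-the-end row, whose base is NA.
  full <- full[!is.na(full$base), , drop = FALSE]
  depth <- full$A + full$C + full$G + full$T + full$deletion
  # Number of reads showing the reference base itself at each position.
  matched <- vapply(seq_len(nrow(full)),
                    function(i) as.numeric(full[[full$base[i]]][i]),
                    numeric(1))
  data.frame(base = full$base,
             mismatch = (depth - matched - full$deletion) / depth,
             deletion = full$deletion / depth)
}

# Toy input for illustration only (not real sarlacc output).
toy <- data.frame(base = c("A", "C", NA),
                  A = c(9L, 1L, NA), C = c(0L, 7L, NA),
                  G = c(0L, 0L, NA), T = c(1L, 0L, NA),
                  deletion = c(0L, 2L, NA))
error_rates(toy)
```

Positions with no coverage would give NaN rates, since the depth there is zero.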

Author(s)

Aaron Lun, with contributions from Cheuk-Ting Law

See Also

pairwiseAlignment

Examples

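library(Biostrings)
library(sarlacc)
# Globally align two reads (pattern) to a single reference (subject).
aln <- pairwiseAlignment(pattern=DNAStringSet(c("AACGAGGGCCACCTAGGAAGAAT",
    "AACCAATCCAGCTACGCAACGACT")), subject=DNAString("AAACGATCAGCTACGAACACT"))
errorFinder(aln)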