spoaAlign: Specify an algorithm for SPOA

Description Usage Arguments Details Value Examples

View source: R/spoar.R

Description

Specify an algorithm for SPOA

Align sequences using SPOA (and optionally get the consensus)

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
SpoaAlgo(
  algorithm = c("local", "global", "semi.global"),
  m = 5L,
  n = -4L,
  g = -8L,
  e = g,
  q = g,
  c = q
)

spoaAlign(seq, algo = SpoaAlgo(), ...)

## S3 method for class 'character'
spoaAlign(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...)

## S3 method for class 'XStringSet'
spoaAlign(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...)

## S3 method for class 'QualityScaledXStringSet'
spoaAlign(seq, algo = SpoaAlgo(), w = 1, ...)

## S3 method for class 'ShortReadQ'
spoaAlign(seq, algo = SpoaAlgo(), w = 1, ...)

## S3 method for class 'derep'
spoaAlign(seq, algo = SpoaAlgo(), ...)

spoaConsensus(seq, algo = SpoaAlgo(), ...)

## S3 method for class 'character'
spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...)

## S3 method for class 'XStringSet'
spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...)

## S3 method for class 'ShortRead'
spoaConsensus(seq, algo = SpoaAlgo(), w = 1, qual = NULL, ...)

## S3 method for class 'QualityScaledXStringSet'
spoaConsensus(seq, algo = SpoaAlgo(), w = 1, ...)

## S3 method for class 'ShortReadQ'
spoaConsensus(seq, algo = SpoaAlgo(), w = 1, ...)

## S3 method for class 'derep'
spoaConsensus(seq, algo = SpoaAlgo(), ...)

Arguments

algorithm

(character string) alignment mode; one of "local" (Smith-Watterman), "global" (Needleman-Wunsch), or "semi.global" (Overlap). Default: "local"

m

(non-negative integer) score for a match. Default: 5L

n

(non-positive integer) score for a mismatch. Default: -4L

g

(non-positive integer) gap opening penalty. Default: -8L

e

(non-positive integer) gap extension penalty. Default: same value as g

q

(non-positive integer) second gap opening penalty. Default: same value as g

c

(non-positive integer) second gap extension penalty. Default: same value as e

seq

(character vector, Biostrings::XStringSet, ShortRead::ShortRead, or dada2::derep) sequences to align.

algo

(SpoaAlgo object) parameters for SPOA, generated by SpoaAlgo().

...

additional parameters passed to methods

w

(integer vector) weights for the sequences. Default: 1L

qual

(list of numeric vectors) per-base quality scores for the sequences. Default: none

Details

Gap penalties

The general (convex) gap penalty formula is:

min(g + (i - i) * e, q + (i - 1) * c)

For q == g and c == e (as is the case unless q or c is explicitly set), this simplifies to the affine gap penalty:

g + (i - 1) * e

If, additionally, e == g (the default), then this is the linear gap penalty:

g * i

Weighting

Assigning a weight w to a sequence is equivalent to including that sequence w times; this is primarily useful for dereplicated sequences, where all sequences are unique, and the weight represents their multiplicity.

Note that POA is not order-independent, so results may differ when a set of non-unique sequences is dereplicated. For the most predictable results, it is recommended to sort dereplicated sequences in decreasing order of abundance (as in dada2::derepFastq()).

Quality scores

If seq is an object which includes quality scores (Biostrings::QualityScaledXStringSet, ShortRead::ShortReadQ, or dada2::derep) then the alignment consensus is weighted using the quality scores. The method used by SPOA uses the integer quality scores as per-position weights analogous to sequence weights; i.e., if two different bases align in the same position, one in a single sequence with quality score 40 and the other in a single sequence with quality score 20, then this is equivalent to the first base occurring at that position in 40 sequences without quality scores, and the second base occurring at that position in 20 sequences without quality scores.

It is possible to use a combination of sequence weights and quality scores. In this case, in order for a dereplicated sequence set to give the same results as the non-dereplicated sequences, the quality score for each of the dereplicated sequences should be the mean of the quality scores for the corresponding non-dereplicated sequences. This is the method used to generate quality scores for dereplicated sequences in dada2::derepFastq().

Value

an object of class "SpoaAlgo", suitable for passing to spoaAlign() or spoaConsensus()

For spoaConsensus(), either a character string or the appropriate Biostrings::XString, depending on the class of seq.

For spoaAlign(), either a character vector or a Biostrings::MultipleAlignment, depending on the class of seq. If seq is a BStringSet (i.e., an XStringSet which is not specifically DNA, RNA, or AA) then the result is also a BStringSet, since there is no corresponding "BMultipleAlignment" class. If seq is a ShortRead::ShortRead object, the output is a Biostrings::DNAMultipleAlignment.

seq spoaConsensus spoaAlign
character(n) character(1) character(n)
BStringSet BString BStringSet
DNAStringSet DNAString DNAMultipleAlignment
RNAStringSet RNAString RNAMultipleAlignment
AAStringSet AAString AAMultipleAlignment
QualityScaledBStringSet BString BStringSet
QualityScaledDNAStringSet DNAString DNAMultipleAlignment
QualityScaledRNAStringSet RNAString RNAMultipleAlignment
QualityScaledAAStringSet AAString AAMultipleAlignment
ShortReadQ DNAString DNAMultipleAlignment
derep character(1) character(n)

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
sequences <- c(
    "CATAAAAGAACGTAGGTCGCCCGTCCGTAACCTGTCGGATCACCGGAAAGGACCCGTAAAGTGATAATGAT",
    "ATAAAGGCAGTCGCTCTGTAAGCTGTCGATTCACCGGAAAGATGGCGTTACCACGTAAAGTGATAATGATTAT",
    "ATCAAAGAACGTGTAGCCTGTCCGTAATCTAGCGCATTTCACACGAGACCCGCGTAATGGG",
    "CGTAAATAGGTAATGATTATCATTACATATCACAACTAGGGCCGTATTAATCATGATATCATCA",
    "GTCGCTAGAGGCATCGTGAGTCGCTTCCGTACCGCAAGGATGACGAGTCACTTAAAGTGATAAT",
    "CCGTAACCTTCATCGGATCACCGGAAAGGACCCGTAAATAGACCTGATTATCATCTACAT"
)
spoaAlign(sequences)
spoaConsensus(sequences)

brendanf/spoar documentation built on Dec. 19, 2021, 11:43 a.m.