pairwiseAlignment: Optimal Pairwise Alignment
In Biostrings: Efficient manipulation of biological strings

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Solves (Needleman-Wunsch) global alignment, (Smith-Waterman) local alignment, and (ends-free) overlap alignment problems.

pairwiseAlignment(pattern, subject, ...)

## S4 method for signature 'ANY,ANY'
pairwiseAlignment(pattern, subject,
                  patternQuality=PhredQuality(22L),
                  subjectQuality=PhredQuality(22L),
                  type="global",
                  substitutionMatrix=NULL, fuzzyMatrix=NULL,
                  gapOpening=10, gapExtension=4,
                  scoreOnly=FALSE)

## S4 method for signature 'QualityScaledXStringSet,QualityScaledXStringSet'
pairwiseAlignment(pattern, subject,
                  type="global",
                  substitutionMatrix=NULL, fuzzyMatrix=NULL, 
                  gapOpening=10, gapExtension=4,
                  scoreOnly=FALSE)

`pattern`	a character vector of any length, an `XString`, or an `XStringSet` object.
`subject`	a character vector of length 1, an `XString`, or an `XStringSet` object of length 1.
`patternQuality, subjectQuality`	objects of class `XStringQuality` representing the respective quality scores for `pattern` and `subject` that are used in a quality-based method for generating a substitution matrix. These two arguments are ignored if `!is.null(substitutionMatrix)` or if its respective string set (`pattern`, `subject`) is of class `QualityScaledXStringSet`.
`type`	type of alignment. One of `"global"`, `"local"`, `"overlap"`, `"global-local"`, and `"local-global"` where `"global"` = align whole strings with end gap penalties, `"local"` = align string fragments, `"overlap"` = align whole strings without end gap penalties, `"global-local"` = align whole strings in `pattern` with consecutive subsequence of `subject`, `"local-global"` = align consecutive subsequence of `pattern` with whole strings in `subject`.
`substitutionMatrix`	substitution matrix representing the fixed substitution scores for an alignment. It cannot be used in conjunction with `patternQuality` and `subjectQuality` arguments.
`fuzzyMatrix`	fuzzy match matrix for quality-based alignments. It takes values between 0 and 1; where 0 is an unambiguous mismatch, 1 is an unambiguous match, and values in between represent a fraction of "matchiness". (See details section below.)
`gapOpening`	the cost for opening a gap in the alignment.
`gapExtension`	the incremental cost incurred along the length of the gap in the alignment.
`scoreOnly`	logical to denote whether or not to return just the scores of the optimal pairwise alignment.
`...`	optional arguments to generic function to support additional methods.

Quality-based alignments are based on the paper the Bioinformatics article by Ketil Malde listed in the Reference section below. Let ε_i be the probability of an error in the base read. For "Phred" quality measures Q in [0, 99], these error probabilities are given by ε_i = 10^{-Q/10}. For "Solexa" quality measures Q in [-5, 99], they are given by ε_i = 1 - 1/(1 + 10^{-Q/10}). Assuming independence within and between base reads, the combined error probability of a mismatch when the underlying bases do match is ε_c = ε_1 + ε_2 - (n/(n-1)) * ε_1 * ε_2, where n is the number of letters in the underlying alphabet (i.e. n = 4 for DNA input, n = 20 for amino acid input, otherwise n is the number of distinct letters in the input). Using ε_c, the substitution score is given by b * \log_2(γ_{x,y} * (1 - ε_c) * n + (1 - γ_{x,y}) * ε_c * (n/(n-1))), where b is the bit-scaling for the scoring and γ_{x,y} is the probability that characters x and y represents the same underlying information (e.g. using IUPAC, γ_{A,A} = 1 and γ_{A,N} = 1/4. In the arguments listed above fuzzyMatch represents γ_{x,y} and patternQuality and subjectQuality represents ε_1 and ε_2 respectively.

If scoreOnly == FALSE, a pairwise alignment with the maximum alignment score is returned. If more than one pairwise alignment produces the maximum alignment score, then the alignment with the smallest initial deletion whose mismatches occur before its insertions and deletions is chosen. For example, if pattern = "AGTA" and subject = "AACTAACTA", then the alignment pattern: [1] AG-TA; subject: [1] AACTA is chosen over pattern: [1] A-GTA; subject: [1] AACTA or pattern: [1] AG-TA; subject: [5] AACTA if they all achieve the maximum alignment score.

If scoreOnly == FALSE, an instance of class PairwiseAlignments or PairwiseAlignmentsSingleSubject is returned. If scoreOnly == TRUE, a numeric vector containing the scores for the optimal pairwise alignments is returned.

Use matchPattern or vmatchPattern if you need to find all the occurrences (eventually with indels) of a given pattern in a reference sequence or set of sequences.

Use matchPDict if you need to match a (big) set of patterns against a reference sequence.

P. Aboyoun and H. Pag<c3><a8>s

R. Durbin, S. Eddy, A. Krogh, G. Mitchison, Biological Sequence Analysis, Cambridge UP 1998, sec 2.3.

B. Haubold, T. Wiehe, Introduction to Computational Biology, Birkhauser Verlag 2006, Chapter 2.

K. Malde, The effect of sequence quality on sequence alignment, Bioinformatics 2008 24(7):897-900.

writePairwiseAlignments, stringDist, PairwiseAlignments-class, XStringQuality-class, substitution.matrices, matchPattern

  ## Nucleotide global, local, and overlap alignments
  s1 <- 
    DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
  s2 <-
    DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")

  # First use a fixed substitution matrix
  mat <- nucleotideSubstitutionMatrix(match = 1, mismatch = -3, baseOnly = TRUE)
  globalAlign <-
    pairwiseAlignment(s1, s2, substitutionMatrix = mat,
                      gapOpening = 5, gapExtension = 2)
  localAlign <-
    pairwiseAlignment(s1, s2, type = "local", substitutionMatrix = mat,
                      gapOpening = 5, gapExtension = 2)
  overlapAlign <-
    pairwiseAlignment(s1, s2, type = "overlap", substitutionMatrix = mat,
                      gapOpening = 5, gapExtension = 2)

  # Then use quality-based method for generating a substitution matrix
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    scoreOnly = TRUE)

  # Now assume can't distinguish between C/T and G/A
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    type = "local")
  mapping <- diag(4)
  dimnames(mapping) <- list(DNA_BASES, DNA_BASES)
  mapping["C", "T"] <- mapping["T", "C"] <- 1
  mapping["G", "A"] <- mapping["A", "G"] <- 1
  pairwiseAlignment(s1, s2,
                    patternQuality = SolexaQuality(rep(c(22L, 12L), times = c(36, 18))),
                    subjectQuality = SolexaQuality(rep(c(22L, 12L), times = c(40, 20))),
                    fuzzyMatrix = mapping,
                    type = "local")

  ## Amino acid global alignment
  pairwiseAlignment(AAString("PAWHEAE"), AAString("HEAGAWGHEE"),
                    substitutionMatrix = "BLOSUM50",
                    gapOpening = 0, gapExtension = 8)

Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

[1] -77.29538
Local PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [20] GGTAAGT 
subject: [20] GGTAAGT 
score: 13.8731 
Local PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCT 
subject: [1] GTTTCACTACTTCCTT 
score: 23.81974 
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] PA--W-HEAE 
subject: [2] EAGAWGHE-E 
score: 1 
Warning message:
system call failed: Cannot allocate memory

Biostrings documentation built on Nov. 8, 2020, 11:12 p.m.

Biostrings index

README.md A short presentation of the basic classes defined in Biostrings 2 Biostrings Quick Overview Handling probe sequence information Multiple Alignments Pairwise Sequence Alignments

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Biostrings
Efficient manipulation of biological strings

pairwiseAlignment: Optimal Pairwise Alignment
In Biostrings: Efficient manipulation of biological strings

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Example output

Related to pairwiseAlignment in Biostrings...

R Package Documentation

Browse R Packages

We want your feedback!

Biostrings Efficient manipulation of biological strings

pairwiseAlignment: Optimal Pairwise Alignment In Biostrings: Efficient manipulation of biological strings

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Example output

Related to pairwiseAlignment in Biostrings...

R Package Documentation

Browse R Packages

We want your feedback!

Biostrings
Efficient manipulation of biological strings

pairwiseAlignment: Optimal Pairwise Alignment
In Biostrings: Efficient manipulation of biological strings