align-utils: Utility functions related to sequence alignment
In Biostrings: Efficient manipulation of biological strings


A collection of utility functions for working with sequence alignments.

nedit(x) # also nmatch and nmismatch

mismatchTable(x, shiftLeft=0L, shiftRight=0L, ...)
mismatchSummary(x, ...)
## S4 method for signature 'AlignedXStringSet0'
coverage(x, shift=0L, width=NULL, weight=1L)
## S4 method for signature 'PairwiseAlignmentsSingleSubject'
coverage(x, shift=0L, width=NULL, weight=1L)
compareStrings(pattern, subject)

## S4 method for signature 'PairwiseAlignmentsSingleSubject'
consensusMatrix(x,
                as.prob=FALSE, shift=0L, width=NULL,
                baseOnly=FALSE, gapCode="-", endgapCode="-")

`x`	A `character` vector or matrix, `XStringSet`, `XStringViews`, `PairwiseAlignments`, or `list` of FASTA records containing equal-length strings.
`shiftLeft, shiftRight`	A non-positive and a non-negative integer, respectively, specifying how many characters before and after the mismatch position to include in the mismatch substrings.
`...`	Further arguments to be passed to or from other methods.
`shift, width`	See `?coverage`.
`weight`	An integer vector specifying how much each element in `x` counts.
`pattern, subject`	The strings to compare. Can be of type `character`, `XString`, `XStringSet`, `AlignedXStringSet`, or, in the case of `pattern`, `PairwiseAlignments`. If `pattern` is a `PairwiseAlignments` object, then `subject` must be missing.
`as.prob`	If `TRUE` then probabilities are reported, otherwise counts (the default).
`baseOnly`	`TRUE` or `FALSE`. If `TRUE`, the returned vector only contains frequencies for the letters in the "base" alphabet i.e. "A", "C", "G", "T" if `x` is a "DNA input", and "A", "C", "G", "U" if `x` is "RNA input". When `x` is a BString object (or an XStringViews object with a BString subject, or a BStringSet object), then the `baseOnly` argument is ignored.
`gapCode, endgapCode`	The codes in the appropriate `alphabet` to use for the internal and end gaps.

mismatchTable: a data.frame containing the positions and substrings of the mismatches for the AlignedXStringSet or PairwiseAlignments object.
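
To illustrate how `shiftLeft` and `shiftRight` widen the reported substrings, here is a minimal base R sketch. The helper `mismatch_substrings` and its column names are hypothetical and not part of Biostrings. It assumes two gap-free aligned strings of equal length and clips each window to the string bounds, which may differ from what mismatchTable does internally.

```r
# Hypothetical helper (not a Biostrings function): list mismatch positions
# between two equal-length strings together with a window around each one.
mismatch_substrings <- function(pattern, subject,
                                shiftLeft = 0L, shiftRight = 0L) {
  stopifnot(nchar(pattern) == nchar(subject),
            shiftLeft <= 0L, shiftRight >= 0L)
  n <- nchar(pattern)
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(subject, "", fixed = TRUE)[[1]]
  pos <- which(p != s)  # 1-based mismatch positions

  # Extract the window [pos + shiftLeft, pos + shiftRight], clipped to [1, n]
  window <- function(x) {
    vapply(pos, function(i) {
      substr(x, max(1L, i + shiftLeft), min(n, i + shiftRight))
    }, character(1))
  }

  data.frame(position = pos,
             patternSubstring = window(pattern),
             subjectSubstring = window(subject),
             stringsAsFactors = FALSE)
}

mm <- mismatch_substrings("ACGTACGT", "ACCTACGA",
                          shiftLeft = -1L, shiftRight = 1L)
print(mm)
```

With `shiftLeft = 0L` and `shiftRight = 0L` each substring is just the mismatching character itself; a negative `shiftLeft` or positive `shiftRight` adds flanking context on the corresponding side.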

mismatchSummary: a list of data.frame objects containing counts and frequencies of the mismatches for the AlignedXStringSet or PairwiseAlignmentsSingleSubject object.

compareStrings combines two equal-length strings that are assumed to be aligned into a single character string in which mismatches are replaced with "?", insertions with "+", and deletions with "-".
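
The encoding described above can be sketched in base R as follows. The function `compare_aligned` is a hypothetical stand-in, not the Biostrings implementation. It assumes a single pair of strings, treats a gap in the pattern as a deletion ("-") and a gap in the subject as an insertion ("+"), and lets the pattern gap take precedence when both strings have a gap at the same position.

```r
# Hypothetical helper (not a Biostrings function): encode the differences
# between two aligned, equal-length strings in one character string.
compare_aligned <- function(pattern, subject) {
  stopifnot(length(pattern) == 1L, length(subject) == 1L,
            nchar(pattern) == nchar(subject))
  p <- strsplit(pattern, "", fixed = TRUE)[[1]]
  s <- strsplit(subject, "", fixed = TRUE)[[1]]

  out <- p                        # matching positions keep the pattern letter
  out[p != s] <- "?"              # mismatch
  out[s == "-" & p != "-"] <- "+" # gap in subject: insertion (assumed convention)
  out[p == "-"] <- "-"            # gap in pattern: deletion (assumed convention)
  paste(out, collapse = "")
}

res <- compare_aligned("ACT-GA", "AGTC-A")
print(res)
```

Applied to `string1` and `string2` from the Examples, this sketch should yield the same string as the `compareStrings` output shown there, since those inputs only contain gaps in the pattern.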

pairwiseAlignment, consensusMatrix, XString-class, XStringSet-class, XStringViews-class, AlignedXStringSet-class, PairwiseAlignments-class, match-utils

  library(Biostrings)

  ## Compare two globally aligned strings
  string1 <- "ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG"
  string2 <- "GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC"
  compareStrings(string1, string2)

  ## Create a consensus matrix
  nw1 <-
    pairwiseAlignment(AAStringSet(c("HLDNLKGTF", "HVDDMPNAL")), AAString("SMDDTEKMSMKL"),
      substitutionMatrix = "BLOSUM50", gapOpening = 3, gapExtension = 1)
  consensusMatrix(nw1)

  ## Examine the consensus between the bacteriophage phi X174 genomes
  data(phiX174Phage)
  phageConsmat <- consensusMatrix(phiX174Phage, baseOnly = TRUE)
  phageDiffs <- which(apply(phageConsmat, 2, max) < length(phiX174Phage))
  phageDiffs
  phageConsmat[,phageDiffs]

[1] "??TTCAC?A??TCC?T???GGTAAGT??AT?---AAA??---AAA???A?A?TTTTCA??"
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
-    0    0    0    0    2    2    2    1    1     0     0     0
A    0    0    0    0    0    0    0    0    0     0     1     0
D    0    0    2    1    0    0    0    0    0     0     0     0
F    0    0    0    0    0    0    0    0    0     0     0     1
H    2    0    0    0    0    0    0    0    0     0     0     0
K    0    0    0    0    0    0    0    0    0     0     1     0
L    0    1    0    0    0    0    0    0    0     1     0     1
M    0    0    0    0    0    0    0    1    0     0     0     0
N    0    0    0    1    0    0    0    0    0     1     0     0
P    0    0    0    0    0    0    0    0    1     0     0     0
V    0    1    0    0    0    0    0    0    0     0     0     0
[1]  587  833 1650 2731 2793 2811 3340 4518 4784
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
A        4    5    4    3    0    0    5    2    0
C        0    0    0    0    5    1    0    0    5
G        2    1    2    3    0    0    1    4    0
T        0    0    0    0    1    5    0    0    1
other    0    0    0    0    0    0    0    0    0
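
The phi X174 example finds alignment columns in which no single letter is shared by every sequence. The following base R sketch shows the same idea on a toy input without Biostrings. The helper `consensus_counts` and the three short sequences are made up for illustration; the real `consensusMatrix` additionally handles `shift`, `width`, `baseOnly`, and gap codes, none of which are modelled here.

```r
# Hypothetical helper (not a Biostrings function): per-column letter counts
# for a set of equal-length strings, optionally as column-wise proportions.
consensus_counts <- function(seqs, as.prob = FALSE) {
  stopifnot(length(seqs) > 0L,
            all(nchar(seqs) > 0L),
            length(unique(nchar(seqs))) == 1L)
  # One row per sequence, one column per alignment position
  chars <- do.call(rbind, strsplit(seqs, "", fixed = TRUE))
  # Cross-tabulate letter against column index
  tab <- table(as.vector(chars), as.vector(col(chars)))
  counts <- matrix(as.integer(tab), nrow = nrow(tab),
                   dimnames = list(rownames(tab), NULL))
  if (as.prob) prop.table(counts, margin = 2) else counts
}

seqs <- c("ACGT", "ACGA", "AGGT")  # toy data, not from phiX174Phage
cm <- consensus_counts(seqs)

# Columns where the most frequent letter does not cover all sequences
diffs <- which(apply(cm, 2, max) < length(seqs))

print(cm)
print(diffs)
print(consensus_counts(seqs, as.prob = TRUE))
```

The comparison `max < length(seqs)` mirrors the `phageDiffs` line in the Examples: a column is fully conserved only when one letter's count equals the number of sequences.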