Description Usage Arguments Format Details Author(s) References See Also Examples
Predefined substitution matrices for nucleotide and amino acid alignments.
1 2 3 4 5 6 7 8 9 10 11 12 13 | data(BLOSUM45)
data(BLOSUM50)
data(BLOSUM62)
data(BLOSUM80)
data(BLOSUM100)
data(PAM30)
data(PAM40)
data(PAM70)
data(PAM120)
data(PAM250)
nucleotideSubstitutionMatrix(match = 1, mismatch = 0, baseOnly = FALSE, type = "DNA")
qualitySubstitutionMatrices(fuzzyMatch = c(0, 1), alphabetLength = 4L, qualityClass = "PhredQuality", bitScale = 1)
errorSubstitutionMatrices(errorProbability, fuzzyMatch = c(0, 1), alphabetLength = 4L, bitScale = 1)
|
match |
the scoring for a nucleotide match. |
mismatch |
the scoring for a nucleotide mismatch. |
baseOnly |
|
type |
either "DNA" or "RNA". |
fuzzyMatch |
a named or unnamed numeric vector representing the base match probability. |
errorProbability |
a named or unnamed numeric vector representing the error probability. |
alphabetLength |
an integer representing the number of letters in the underlying string alphabet. For DNA and RNA, this would be 4L. For Amino Acids, this could be 20L. |
qualityClass |
a character string of |
bitScale |
a numeric value to scale the quality-based substitution matrices. By default, this is 1, representing bit-scale scoring. |
The BLOSUM and PAM matrices are square symmetric matrices with integer coefficients, whose row and column names are identical and unique: each name is a single letter representing a nucleotide or an amino acid.
nucleotideSubstitutionMatrix
produces a substitution matrix for all IUPAC
nucleic acid codes based upon match and mismatch parameters.
errorSubstitutionMatrices
produces a two element list of numeric
square symmetric matrices, one for matches and one for mismatches.
qualitySubstitutionMatrices
produces the substitution matrices
for Phred or Solexa quality-based reads.
The BLOSUM and PAM matrices are not unique. For example, the definition of the widely used BLOSUM62 matrix varies depending on the source, and even a given source can provide different versions of "BLOSUM62" without keeping track of the changes over time. NCBI provides many matrices here ftp://ftp.ncbi.nih.gov/blast/matrices/ but their definitions don't match those of the matrices bundled with their stand-alone BLAST software available here ftp://ftp.ncbi.nih.gov/blast/
The BLOSUM45, BLOSUM62, BLOSUM80, PAM30 and PAM70 matrices were taken from NCBI stand-alone BLAST software.
The BLOSUM50, BLOSUM100, PAM40, PAM120 and PAM250 matrices were taken from ftp://ftp.ncbi.nih.gov/blast/matrices/
The quality matrices computed in qualitySubstitutionMatrices
are
based on the paper by Ketil Malde. Let ε_i be the probability
of an error in the base read. For "Phred"
quality measures Q
in [0, 99], these error probabilities are given by
ε_i = 10^{-Q/10}. For "Solexa"
quality measures
Q in [-5, 99], they are given by
ε_i = 1 - 1/(1 + 10^{-Q/10}).
Assuming independence within and between base reads, the combined error
probability of a mismatch when the underlying bases do match is
ε_c = ε_1 + ε_2 - (n/(n-1)) * ε_1 * ε_2,
where n is the number of letters in the underlying alphabet. Using
ε_c, the substitution score is given by when two bases match is given by
b * \log_2(γ_{x,y} * (1 - ε_c) * n + (1 - γ_{x,y}) * ε_c * (n/(n-1))),
where b is the bit-scaling for the scoring and γ_{x,y} is the
probability that characters x and y represents the same underlying
information (e.g. using IUPAC, γ_{A,A} = 1 and γ_{A,N} = 1/4.
In the arguments listed above fuzzyMatch
represents γ_{x,y}
and errorProbability
represents ε_i.
H. Pag<c3><a8>s and P. Aboyoun
K. Malde, The effect of sequence quality on sequence alignment, Bioinformatics, Feb 23, 2008.
pairwiseAlignment
,
PairwiseAlignments-class,
DNAString-class,
AAString-class,
PhredQuality-class,
SolexaQuality-class,
IlluminaQuality-class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | s1 <-
DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-
DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")
## Fit a global pairwise alignment using edit distance scoring
pairwiseAlignment(s1, s2,
substitutionMatrix = nucleotideSubstitutionMatrix(0, -1, TRUE),
gapOpening = 0, gapExtension = 1)
## Examine quality-based match and mismatch bit scores for DNA/RNA
## strings in pairwiseAlignment.
## By default patternQuality and subjectQuality are PhredQuality(22L).
qualityMatrices <- qualitySubstitutionMatrices()
qualityMatrices["22", "22", "1"]
qualityMatrices["22", "22", "0"]
pairwiseAlignment(s1, s2)
## Get the substitution scores when the error probability is 0.1
subscores <- errorSubstitutionMatrices(errorProbability = 0.1)
submat <- matrix(subscores[,,"0"], 4, 4)
diag(submat) <- subscores[,,"1"]
dimnames(submat) <- list(DNA_ALPHABET[1:4], DNA_ALPHABET[1:4])
submat
pairwiseAlignment(s1, s2, substitutionMatrix = submat)
## Align two amino acid sequences with the BLOSUM62 matrix
aa1 <- AAString("HXBLVYMGCHFDCXVBEHIKQZ")
aa2 <- AAString("QRNYMYCFQCISGNEYKQN")
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = 3, gapExtension = 1)
## See how the gap penalty influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = 6, gapExtension = 2)
## See how the substitution matrix influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM50", gapOpening = 3, gapExtension = 1)
if (interactive()) {
## Compare our BLOSUM62 with BLOSUM62 from ftp://ftp.ncbi.nih.gov/blast/matrices/
data(BLOSUM62)
BLOSUM62["Q", "Z"]
file <- "ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62"
b62 <- as.matrix(read.table(file, check.names=FALSE))
b62["Q", "Z"]
}
|
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, sd, var, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colMeans, colSums, colnames, do.call,
duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
setdiff, sort, table, tapply, union, unique, unsplit, which,
which.max, which.min
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following object is masked from 'package:base':
expand.grid
Loading required package: IRanges
Loading required package: XVector
Attaching package: 'Biostrings'
The following object is masked from 'package:base':
strsplit
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGG-TAAGTTGATC-A-AAGGA-A-ACGCA-A-AGTTTTCAAG
subject: [1] GTTTCACTA-CTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC
score: -25
[1] 1.981756
[1] -5.899286
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGGTAAGTTGATC---AAAGG---AAACGCAAAGTTTTCAAG
subject: [1] GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC
score: -102.487
A C G T
A 1.701919 -2.006426 -2.006426 -2.006426
C -2.006426 1.701919 -2.006426 -2.006426
G -2.006426 -2.006426 1.701919 -2.006426
T -2.006426 -2.006426 -2.006426 1.701919
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] ACTTCACCAGCTCCCTGGCGGTAAGTTGAT------CAAAGGAAACGCAAAGTTTTCAAG
subject: [1] GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC
score: -27.38831
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] HXBLVYMGCHFDCXV--BEHIKQZ
subject: [1] QRN--YMYC-FQC-ISGNEY-KQN
score: 38
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] HXBLVYMGCHFDCXVBEHIKQZ
subject: [1] QRN--YMYC-FQCISGNEYKQN
score: 23
Global PairwiseAlignmentsSingleSubject (1 of 1)
pattern: [1] HXBLVYMGCHFDCXV--BEHIKQZ
subject: [1] QRN--YMYC-FQC-ISGNEY-KQN
score: 57
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.