Description Usage Arguments Format Details Author(s) References See Also Examples
Predefined substitution matrices for nucleotide and amino acid alignments.
1 2 3 4 5 6 7 8 9 10 11 12 13 | data(BLOSUM45)
data(BLOSUM50)
data(BLOSUM62)
data(BLOSUM80)
data(BLOSUM100)
data(PAM30)
data(PAM40)
data(PAM70)
data(PAM120)
data(PAM250)
nucleotideSubstitutionMatrix(match = 1, mismatch = 0, baseOnly = FALSE, type = "DNA")
qualitySubstitutionMatrices(fuzzyMatch = c(0, 1), alphabetLength = 4L, qualityClass = "PhredQuality", bitScale = 1)
errorSubstitutionMatrices(errorProbability, fuzzyMatch = c(0, 1), alphabetLength = 4L, bitScale = 1)
|
match |
the scoring for a nucleotide match. |
mismatch |
the scoring for a nucleotide mismatch. |
baseOnly |
|
type |
either "DNA" or "RNA". |
fuzzyMatch |
a named or unnamed numeric vector representing the base match probability. |
errorProbability |
a named or unnamed numeric vector representing the error probability. |
alphabetLength |
an integer representing the number of letters in the underlying string alphabet. For DNA and RNA, this would be 4L. For Amino Acids, this could be 20L. |
qualityClass |
a character string of |
bitScale |
a numeric value to scale the quality-based substitution matrices. By default, this is 1, representing bit-scale scoring. |
The BLOSUM and PAM matrices are square symmetric matrices with integer coefficients, whose row and column names are identical and unique: each name is a single letter representing a nucleotide or an amino acid.
nucleotideSubstitutionMatrix
produces a substitution matrix for all IUPAC
nucleic acid codes based upon match and mismatch parameters.
errorSubstitutionMatrices
produces a two element list of numeric
square symmetric matrices, one for matches and one for mismatches.
qualitySubstitutionMatrices
produces the substitution matrices
for Phred or Solexa quality-based reads.
The BLOSUM and PAM matrices are not unique. For example, the definition of the widely used BLOSUM62 matrix varies depending on the source, and even a given source can provide different versions of "BLOSUM62" without keeping track of the changes over time. NCBI provides many matrices here ftp://ftp.ncbi.nih.gov/blast/matrices/ but their definitions don't match those of the matrices bundled with their stand-alone BLAST software available here ftp://ftp.ncbi.nih.gov/blast/
The BLOSUM45, BLOSUM62, BLOSUM80, PAM30 and PAM70 matrices were taken from NCBI stand-alone BLAST software.
The BLOSUM50, BLOSUM100, PAM40, PAM120 and PAM250 matrices were taken from ftp://ftp.ncbi.nih.gov/blast/matrices/
The quality matrices computed in qualitySubstitutionMatrices
are
based on the paper by Ketil Malde. Let ε_i be the probability
of an error in the base read. For "Phred"
quality measures Q
in [0, 99], these error probabilities are given by
ε_i = 10^{-Q/10}. For "Solexa"
quality measures
Q in [-5, 99], they are given by
ε_i = 1 - 1/(1 + 10^{-Q/10}).
Assuming independence within and between base reads, the combined error
probability of a mismatch when the underlying bases do match is
ε_c = ε_1 + ε_2 - (n/(n-1)) * ε_1 * ε_2,
where n is the number of letters in the underlying alphabet. Using
ε_c, the substitution score is given by when two bases match is given by
b * \log_2(γ_{x,y} * (1 - ε_c) * n + (1 - γ_{x,y}) * ε_c * (n/(n-1))),
where b is the bit-scaling for the scoring and γ_{x,y} is the
probability that characters x and y represents the same underlying
information (e.g. using IUPAC, γ_{A,A} = 1 and γ_{A,N} = 1/4.
In the arguments listed above fuzzyMatch
represents γ_{x,y}
and errorProbability
represents ε_i.
H. Pagès and P. Aboyoun
K. Malde, The effect of sequence quality on sequence alignment, Bioinformatics, Feb 23, 2008.
pairwiseAlignment
,
PairwiseAlignments-class,
DNAString-class,
AAString-class,
PhredQuality-class,
SolexaQuality-class,
IlluminaQuality-class
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | s1 <-
DNAString("ACTTCACCAGCTCCCTGGCGGTAAGTTGATCAAAGGAAACGCAAAGTTTTCAAG")
s2 <-
DNAString("GTTTCACTACTTCCTTTCGGGTAAGTAAATATATAAATATATAAAAATATAATTTTCATC")
## Fit a global pairwise alignment using edit distance scoring
pairwiseAlignment(s1, s2,
substitutionMatrix = nucleotideSubstitutionMatrix(0, -1, TRUE),
gapOpening = 0, gapExtension = 1)
## Examine quality-based match and mismatch bit scores for DNA/RNA
## strings in pairwiseAlignment.
## By default patternQuality and subjectQuality are PhredQuality(22L).
qualityMatrices <- qualitySubstitutionMatrices()
qualityMatrices["22", "22", "1"]
qualityMatrices["22", "22", "0"]
pairwiseAlignment(s1, s2)
## Get the substitution scores when the error probability is 0.1
subscores <- errorSubstitutionMatrices(errorProbability = 0.1)
submat <- matrix(subscores[,,"0"], 4, 4)
diag(submat) <- subscores[,,"1"]
dimnames(submat) <- list(DNA_ALPHABET[1:4], DNA_ALPHABET[1:4])
submat
pairwiseAlignment(s1, s2, substitutionMatrix = submat)
## Align two amino acid sequences with the BLOSUM62 matrix
aa1 <- AAString("HXBLVYMGCHFDCXVBEHIKQZ")
aa2 <- AAString("QRNYMYCFQCISGNEYKQN")
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = 3, gapExtension = 1)
## See how the gap penalty influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM62", gapOpening = 6, gapExtension = 2)
## See how the substitution matrix influences the alignment
pairwiseAlignment(aa1, aa2, substitutionMatrix = "BLOSUM50", gapOpening = 3, gapExtension = 1)
if (interactive()) {
## Compare our BLOSUM62 with BLOSUM62 from ftp://ftp.ncbi.nih.gov/blast/matrices/
data(BLOSUM62)
BLOSUM62["Q", "Z"]
file <- "ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62"
b62 <- as.matrix(read.table(file, check.names=FALSE))
b62["Q", "Z"]
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.