dotPlot | R Documentation |
Dot plots are most likely the oldest visual representation used to compare two sequences (see Maizel and Lenk 1981 and references therein). In its simplest form, a dot is produced at position (i,j) iff character number i in the first sequence is the same as character number j in the second sequence. More eleborated forms use sliding windows and a threshold value for two windows to be considered as matched.
dotPlot(seq1, seq2, wsize = 1, wstep = 1, nmatch = 1, shift = 0,
col = c("white", "black"), xlab = deparse(substitute(seq1)),
ylab = deparse(substitute(seq2)), ...)
seq1 |
the first sequence (x-axis) as a vector of single chars. |
seq2 |
the second sequence (y-axis) as a vector of single char. |
wsize |
the size in chars of the moving window. |
wstep |
the size in chars for the steps of the moving window.
Use |
nmatch |
if the number of match per window is greater than or equal
to |
shift |
the number of chars to shift in seq2 when generating the moving window. |
col |
color of points passed to |
xlab |
label of x-axis passed to |
ylab |
label of y-axis passed to |
... |
further arguments passed to |
NULL.
J.R. Lobry
Maizel, J.V. and Lenk, R.P. (1981) Enhanced Graphic Matrix Analysis of
Nucleic Acid and Protein Sequences.
Proceedings of the National Academy of Science USA,
78:7665-7669.
citation("seqinr")
image
#
# Identity is on the main diagonal:
#
dotPlot(letters, letters, main = "Direct repeat")
#
# Internal repeats are off the main diagonal:
#
dotPlot(rep(letters, 2), rep(letters, 2), main = "Internal repeats")
#
# Inversions are orthogonal to the main diagonal:
#
dotPlot(letters, rev(letters), main = "Inversion")
#
# Insertion in the second sequence yields a vertical jump:
#
dotPlot(letters, c(letters[1:10], s2c("insertion"), letters[11:26]),
main = "Insertion in the second sequence", asp = 1)
#
# Insertion in the first sequence yields an horizontal jump:
#
dotPlot(c(letters[1:10], s2c("insertion"), letters[11:26]), letters,
main = "Insertion in the first sequence", asp = 1)
#
# Protein sequences have usually a good signal/noise ratio because there
# are 20 possible amino-acids:
#
aafile <- system.file("sequences/seqAA.fasta", package = "seqinr")
protein <- read.fasta(aafile)[[1]]
dotPlot(protein, protein, main = "Dot plot of a protein\nwsize = 1, wstep = 1, nmatch = 1")
#
# Nucleic acid sequences have usually a poor signal/noise ratio because
# there are only 4 different bases:
#
dnafile <- system.file("sequences/malM.fasta", package = "seqinr")
dna <- protein <- read.fasta(dnafile)[[1]]
dotPlot(dna[1:200], dna[1:200],
main = "Dot plot of a nucleic acid sequence\nwsize = 1, wstep = 1, nmatch = 1")
#
# Play with the wsize, wstep and nmatch arguments to increase the
# signal/noise ratio:
#
dotPlot(dna[1:200], dna[1:200], wsize = 3, wstep = 3, nmatch = 3,
main = "Dot plot of a nucleic acid sequence\nwsize = 3, wstep = 3, nmatch = 3")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.