Description Objects from the Class Slots Extends Methods Warning Author(s) See Also Examples
This is a small extension of a matrix representation of the
sequences. The sequences are represented as integers, where each integer
corresponds to a character type from the alphabet. For example, if the
sequence is ADC, and the alphabet is ['A', 'B', 'C', 'D']
, the
sequence will be [1,4,3]
. The matrix itself has each sequence
as a row, and the alphabet slot contains the key that shows how the
integers correspond to the characters in the sequence.
Objects can be created by calls of the form new("Sequences", data,
nrow, ncol, byrow, dimnames, ...)
.
.Data
:Object of class "matrix"
The sequences,
where each row is a sequence and the sequence is a series of integers
alphabet
:Object of class "character"
The
character representation of each integer
nseqs
:The number of unique sequences in the class.
Class "matrix"
, from data part.
Class "array"
, by class "matrix", distance 2.
Class "structure"
, by class "matrix", distance 3.
Class "vector"
, by class "matrix", distance 4, with explicit coerce.
dist(seqs, method="substitution",
params=default.MetricParams,...)
: dist calculates the
sequence-sequences distance matrix. Use
method="substitution"
to use a substitution matrix for weighting
sequence mutations and use method="hamming"
to use equal weighting
of all mutations. Also accepts params=aMetricParams
for using a
substitution matrix other than the default. Each substitution is
given a weight of 1 using the hamming method or the score from the
corresponding substitution matrix when using the substitution
method. The distance matrix is converted to a dissimilarity distance
by making all elements negative and adding the maximum score/weight
plot(seqs, clusterNumber=4,
params=default.MetricParams,
distanceMatrix=dist.Sequences(seqs,
params=params),
clusters=aclust(dmat, clusterNumber))
:
This method plots a summary plot of the sequences. Each point
represents a sequence and the points plotted on a projection
onto their two principal components as found
from the distance matrix. Additionally, they are colored and
placed into clusters using the given cluster number and the
kmeans
algorithm found in the stats
package. This
method provides a quick way of estimating the number of clusters in
the sequences and looking for any simple patterns in the data. It
also can be used to test different substitution matrices to see
which best segregates data. For example, a BLOSUM90 substitution
matrix may work well for very similar sequences, whereas a BLOSUM50
substitution matrix will work better for very different
sequences. The distance matrix may be specified and the clusters as well.
rbind(seq1, seq2)
: This method just overwrites
the traditional rbind method by passing the alphabet along. Note
that most matrix methods editing methods do not return a Sequence
class by default, except this rbind method.
The gap character is always assumed to be the last character in the
sequence slot. Do not change this convention, since the distance
method relies on this. Not all data.frame manipulations methods have
been overridden. Thus, you may get a data.frame back instead of a
Sequences object. Only rbind
and array access has been overridden.
Andrew White
read.sequences
, which allows you to create Sequence objects from
a file, descriptors
, which creates a
Descriptors
object for a Sequences object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | ##load example data and plot it
data(TULASequences)
plot(TULASequences)
## Access all sequences which have a 4 in position 1
print(TULASequences[TULASequences[,1] == 4,])
## Access all sequences which have an tyrosine residue in position 1 and
## cluster
TULASequences.subset <- TULASequences[TULASequences[,1] == which(TULASequences@alphabet == 'Y'),]
plot(TULASequences.subset)
##Calculate distance matrix on this subset and use agglomerative
## clustering to plotit
TULA.dmatrix <- dist(TULASequences.subset)
TULA.hclusters <- hclust(TULA.dmatrix)
plot(TULA.hclusters)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.