Description Objects from the Class Slots Arguments Details Methods Author(s) See Also Examples
A set of MotifModel
s
Objects can be created by calls of the form motifModelSet(seqs, motifNumber=NA, type="fixed", width=4, verbose=TRUE, clusterType="kmeans", maxGuess=10)
.
motifs
:Object of class "list"
containing a
list of MotifModel
s
An object of class Sequences
, which contains
the sequences to be fit.
The number of clusters to be found in the clustering algorithm and
thus the number of motif models to be fit. If NA
is given,
the number of clusters is inferred from the BIC. This method can
take a significant amount of time with a large number of sequences.
This is the type of motif model to fit. See
MotifModel
for more information. Can be of
"fixed"
, "variable"
, or
"optional"
. "fixed"
means the motif is in the same
position for each sequence in the cluster. This is typically the
best, since the clustering algorithm relies on the sequences already
being aligned. A "variable"
type motif model allows the
motif to be in a different position for each
sequence. "optional"
allows the motif to be unexpressed in
sequences or start at a variable position.
The number of residue positions in the motifs. Non-motif residues are fit
to a multinomial background model. If varying widths are desired,
each individual MotifModel
can be built and
then assembled into a MotifModelSet
by calling
new("MotifModelSet", motifs=mlist)
, where mlist is a list of the
motif models.
Can be "kmeans"
or "agglomerative"
. See
aclust
for more information.
If the motifNuber = NA
, then this is the bound on the maximum
number of motifs to attempt to fit.
This is a convenience class providing methods for a few common tasks that are
necessary for analyzing multiple motifs on sequences. The function that
creates these objects clusters the sequences according
to a substitution type metric, see Sequences
,
and then fits motif models to each of the clusters. The resulting
motif models can be used to discover the most likely motifs in the
sequences and to classify new sequences into the motifs. Since there
is clustering used to separate the various motifs, this approach is
somewhat ad-hoc. There is an expectation-maximization done on motif
position, but not on which motif each sequence belongs to. Due to
the ad-hoc nature of dividing sequences into motifs, the ability to
find the motif number relies on an elbow plot, which should be
viewed by the user.
This method uses the same motif model for each cluster, but this is
not required. More sophisticated modeling may be done by building a
MotifModel
for each cluster and then combining
them by calling new("MotifModelSet", motifs=mlist)
, where
mlist is a list of the motif models.
Typically, the number of motifs should be set by hand either through
using the plot
function on the sequences or examining the elbow
plots that come from this function, motifModelSet
. It is
important to note, as
mentioned in the Sequences
examples, the clustering
algorithm is sensitive to the substitution matrix used in the metric
parameters.
BIC(object)
: Calculates
the BIC of a motif model list. It sums the log-likelihood of each
motif model and uses the equation -2 * logLik + k * log(n)
,
where k
is the number of
parameters and n
is the number of independent observations,
each sequence in this case.
classify(motifSet, newSequences, threshold =
0)
: The function calculates the likelihood of each sequence
in each model using a call to the predict method of MotifModel
.
The sequences are classified into the most likely motif. If the
threshold is non-zero, then if no motif is more likely than the
threshold, the sequence is classified into the -1 class. The value
is an array with integers representing the classes. A positive number
indicates the sequence belongs to a motif. The positive number itself
corresponds to the index of the motif from the passed motifList. A negative
number indicates no class, as determined from the threshold.
residuals(model, seqs=model@seqs,
classes=rep(0,nrow(seqs)), threshold = 0)
: This method
classifies seqs
and calculates the number of false
positives, false negatives, and a measure of accuracy between 0
and 1. classes
should be the true classes for each
sequence, where a negative number means it is not in the model,
a positive number means it belongs to a specific motif, or 0
means it is simply in any motif. The default value for
classes
is all sequences belong in the model. The
log-likelihood for each sequence according to the motif is
calculated and if any is greater than the threshold
, the
sequence is classified as in the model. The sequence is also
classified into a particular motif if it is above the
threshold. These predictions are tested against the true
classes
passed to the method.
plot(model,...)
: This method provides a scatter
plot which shows the clustering of the sequences and the motif strings of each model
as the legend.
logLik(model)
: This method calculates the
(expected) log-likelihood of each of the motif models and sums
them together.
Andrew White
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | data(TULASequences)
TULAMList <- motifModelSet(TULASequences, width=6, motifNumber=4,
type="fixed")
plot(TULAMList)
plot(TULAMList@motifs[[1]])
print(TULAMList@motifs[[1]])
small.mlist <- motifModelSet(TULASequences, motifNumber=2,
type="fixed")
ll <- logLik(small.mlist)
print(ll)
large.mlist <- motifModelSet(TULASequences, motifNumber=5,
type="optional")
ll <- logLik(large.mlist)
print(ll)
#split the dataset
training.size <- nrow(TULASequences) * 2 / 3
training.indices <- sample(nrow(TULASequences), training.size)
testing.indices <- setdiff(1:nrow(TULASequences), training.indices)
training <- new("Sequences", TULASequences[training.indices,],
alphabet=TULASequences@alphabet)
testing <- new("Sequences", TULASequences[testing.indices,],
alphabet=TULASequences@alphabet)
#Now we have two sets of sequences
training.mlist <- motifModelSet(training, width=6,
motifNumber=3)
classes <- classify(training.mlist, testing)
#Now we have the classes on the unseen data
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.