Description Objects from the Class Arguments Slots Methods Author(s) References See Also Examples
This is a simple implementation of the MEME motif finding algorithm. The
MEME algorithm uses an Expectation-Maximization search for motifs on
sequences (Sequences
). It requires a choice of the
motif length. For multiple motifs see MotifModelSet
Objects can be created by calls of the form motifModel(seqs,
type="fixed", width=4)
seqs
A Sequences
object which
contains the sequences to be fit
type
Can be "fixed"
, "variable"
, or
"optional"
. "fixed"
means there is a single motif at
the same position in all sequences. "variable"
means there
is a single motif at a different position in each
sequence. "optional"
means there is a single motif or no
motif at each sequence in a variable position.
width
The number of residues in the motif, the width of the motif.
mmodel
:Object of class "matrix"
The motif,
represented as a matrix. Each element in the matrix is the likelihood
of that residue being in the position. The rows are the residue type
and the columns are the position in the motif.
bmodel
:Object of class "numeric"
The
background model is the model which generates the non-motif
residues. The data is the probability of each residue type.
width
:Object of class "integer"
The width is
the number of residues in the motif
seqs
:Object of class "Sequences"
The sequence
to which the motifs are being fit.
np
:Object of class "integer"
The number of
parameters in the model, used for calculating the BIC
signature(x,...)
: Plots varying descriptive graphs
of the motif model. The first thing plotted is the likelihood of the
motif starting in each possible position. This is marginal over all
sequences. Next, the likelihood of each residue is plotted in a series
of plots representing each position in the motif. Last, the likelihood
of each sequence from the motif model is plotted.
predict(object,newSequences=object@seqs)
:
Calculates the likelihood for each sequence being generated from the
motif model. This is useful for checking for outliers or ensuring the
motif model fits the data. Passing new sequences in allows
classification of new sequences.
print(object)
: Prints information about the
motif model, including the motif string. The motif string is
simply all residues with greater than 15% probability in
decreasing order with a max of 4 (since 15*5 > 100
). So,
[R][GF][GFRD] indicates that only R has a greater than 15%
probability in position 1. In position 2, only G and F do, with G
having a higher probability than F. In position 3, G, F, R and D
all have over 15% probability of occuring.
residuals(model, seqs=model@seqs,
classes=rep(0,nrow(seqs)), threshold = 0)
: This method classifies
seqs
and calculates the number of false positives, false
negatives, and a measure of accuracy between 0 and
1. classes
should be the true classes for each sequence, where
a negative number means it is not in the model and a positive number
or 0 means it is in the model. The default is all in the model. The
log-likelihood for each sequence according to the model is calculated
and if it is greater than the threshold
, the sequence is classified as
in the model. These predictions are tested against the true
classes
passed to the method. The default value for
classes
is all sequences belong in the model. The
log-likelihood for each sequence according to the model is calculated
and if it is greater than the threshold
, the sequence is classified as
in the model. These predictions are tested against the true
classes
passed to the method.
BIC(object)
: Calculates the Bayesian Information Content of the motif
model. See BIC
for more information.
logLik(model)
: This method calculates the
(expected) log-likelihood of the motif model.
motifString(model)
: The motif string is
simply all residues with greater than 15% probability in
decreasing order with a max of 4 (since 15*5 > 100
). So,
[R][GF][GFRD] indicates that only R has a greater than 15%
probability in position 1. In position 2, only G and F do, with G
having a higher probability than F. In position 3, G, F, R and D
all have over 15% probability of occuring.
plotFits(model)
: A barplot of the
log of the ratio of the likelihood of each sequence in the model
relative to the background distribution. A positive value
indicates a good fit to the motif and a negative value means
the sequence fits the background (randomly occurring sequence)
better.
plotPositions(model)
: A grid of plots
showing the motif probability distribution at each motif position.
The three most likely amino acids/residues at each position are
annotated on the plots.
plotStartingPosition(model)
: A
barplot showing the likelihood of the motif starting at each
position. So that 0.3 in the first position means there is a 30% likelihood
of the motif starting in the first position
Andrew White
Bailey, T. L. Ph.D. thesis, University of California at San Diego, 1995.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | #get sequence data
data(TULASequences)
#Create a fixed and optional type model
TULAFixed <- motifModel(TULASequences)
print(TULAFixed)
plotPositions(TULAFixed)
TULAOpt <- motifModel(TULASequences, type="optional")
print(TULAOpt)
plotPositions(TULAOpt)
#Something more interesting, cluster the data, fit two motif models, and
# then calculate the residuals
clusters <- aclust(dist(TULASequences), 2)
TULA.M1 <- motifModel(TULASequences[clusters[[1]],], type="fixed")
TULA.M2 <- motifModel(TULASequences[clusters[[2]],], type="fixed")
#Goodness of fit for model 1 and then using model 1 on the sequences in
# model 2, which is obviously a bad fit
#get threshold by including all sequences, no matter how bad the fit is.
threshold <- min(predict(TULA.M1))
print(threshold)
print(residuals(TULA.M1), threshold=threshold)
print(residuals(TULA.M1, seqs=TULA.M2@seqs, threshold=threshold))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.