MotifModel-class: Class "MotifModel"
In peplib: Peptide Library Analysis Methods

Description Objects from the Class Arguments Slots Methods Author(s) References See Also Examples

This is a simple implementation of the MEME motif finding algorithm. The MEME algorithm uses an Expectation-Maximization search for motifs on sequences (Sequences). It requires a choice of the motif length. For multiple motifs see MotifModelSet

Objects can be created by calls of the form motifModel(seqs, type="fixed", width=4)

seqs: A Sequences object which contains the sequences to be fit
type: Can be "fixed", "variable", or "optional". "fixed" means there is a single motif at the same position in all sequences. "variable" means there is a single motif at a different position in each sequence. "optional" means there is a single motif or no motif at each sequence in a variable position.
width: The number of residues in the motif, the width of the motif.

mmodel:: Object of class "matrix" The motif, represented as a matrix. Each element in the matrix is the likelihood of that residue being in the position. The rows are the residue type and the columns are the position in the motif.
bmodel:: Object of class "numeric" The background model is the model which generates the non-motif residues. The data is the probability of each residue type.
width:: Object of class "integer" The width is the number of residues in the motif
seqs:: Object of class "Sequences" The sequence to which the motifs are being fit.
np:: Object of class "integer" The number of parameters in the model, used for calculating the BIC

plot: signature(x,...): Plots varying descriptive graphs of the motif model. The first thing plotted is the likelihood of the motif starting in each possible position. This is marginal over all sequences. Next, the likelihood of each residue is plotted in a series of plots representing each position in the motif. Last, the likelihood of each sequence from the motif model is plotted.
predict: predict(object,newSequences=object@seqs): Calculates the likelihood for each sequence being generated from the motif model. This is useful for checking for outliers or ensuring the motif model fits the data. Passing new sequences in allows classification of new sequences.
print: print(object): Prints information about the motif model, including the motif string. The motif string is simply all residues with greater than 15% probability in decreasing order with a max of 4 (since 15*5 > 100). So, [R][GF][GFRD] indicates that only R has a greater than 15% probability in position 1. In position 2, only G and F do, with G having a higher probability than F. In position 3, G, F, R and D all have over 15% probability of occuring.
residuals: residuals(model, seqs=model@seqs, classes=rep(0,nrow(seqs)), threshold = 0): This method classifies seqs and calculates the number of false positives, false negatives, and a measure of accuracy between 0 and 1. classes should be the true classes for each sequence, where a negative number means it is not in the model and a positive number or 0 means it is in the model. The default is all in the model. The log-likelihood for each sequence according to the model is calculated and if it is greater than the threshold, the sequence is classified as in the model. These predictions are tested against the true classes passed to the method. The default value for classes is all sequences belong in the model. The log-likelihood for each sequence according to the model is calculated and if it is greater than the threshold, the sequence is classified as in the model. These predictions are tested against the true classes passed to the method.
BIC: BIC(object): Calculates the Bayesian Information Content of the motif model. See BIC for more information.
logLik: logLik(model): This method calculates the (expected) log-likelihood of the motif model.
motifString: motifString(model): The motif string is simply all residues with greater than 15% probability in decreasing order with a max of 4 (since 15*5 > 100). So, [R][GF][GFRD] indicates that only R has a greater than 15% probability in position 1. In position 2, only G and F do, with G having a higher probability than F. In position 3, G, F, R and D all have over 15% probability of occuring.
plotFits: plotFits(model): A barplot of the log of the ratio of the likelihood of each sequence in the model relative to the background distribution. A positive value indicates a good fit to the motif and a negative value means the sequence fits the background (randomly occurring sequence) better.
plotPositions: plotPositions(model): A grid of plots showing the motif probability distribution at each motif position. The three most likely amino acids/residues at each position are annotated on the plots.
plotStartingPosition: plotStartingPosition(model): A barplot showing the likelihood of the motif starting at each position. So that 0.3 in the first position means there is a 30% likelihood of the motif starting in the first position

Andrew White

Bailey, T. L. Ph.D. thesis, University of California at San Diego, 1995.

Sequences

#get sequence data
data(TULASequences)


#Create a fixed and optional type model
TULAFixed <- motifModel(TULASequences)
print(TULAFixed)
plotPositions(TULAFixed)

TULAOpt <- motifModel(TULASequences, type="optional")
print(TULAOpt)
plotPositions(TULAOpt)


#Something more interesting, cluster the data, fit two motif models, and
# then calculate the residuals

clusters <- aclust(dist(TULASequences), 2)

TULA.M1 <- motifModel(TULASequences[clusters[[1]],], type="fixed")
TULA.M2 <- motifModel(TULASequences[clusters[[2]],], type="fixed")

#Goodness of fit for model 1 and then using model 1 on the sequences in
# model 2, which is obviously a bad fit

#get threshold by including all sequences, no matter how bad the fit is.

threshold <- min(predict(TULA.M1))
print(threshold)

print(residuals(TULA.M1), threshold=threshold)
print(residuals(TULA.M1, seqs=TULA.M2@seqs, threshold=threshold))