MotifModel-class: Class "MotifModel"

Description Objects from the Class Arguments Slots Methods Author(s) References See Also Examples

Description

This is a simple implementation of the MEME motif finding algorithm. The MEME algorithm uses an Expectation-Maximization search for motifs on sequences (Sequences). It requires a choice of the motif length. For multiple motifs see MotifModelSet

Objects from the Class

Objects can be created by calls of the form motifModel(seqs, type="fixed", width=4)

Arguments

seqs

A Sequences object which contains the sequences to be fit

type

Can be "fixed", "variable", or "optional". "fixed" means there is a single motif at the same position in all sequences. "variable" means there is a single motif at a different position in each sequence. "optional" means there is a single motif or no motif at each sequence in a variable position.

width

The number of residues in the motif, the width of the motif.

Slots

mmodel:

Object of class "matrix" The motif, represented as a matrix. Each element in the matrix is the likelihood of that residue being in the position. The rows are the residue type and the columns are the position in the motif.

bmodel:

Object of class "numeric" The background model is the model which generates the non-motif residues. The data is the probability of each residue type.

width:

Object of class "integer" The width is the number of residues in the motif

seqs:

Object of class "Sequences" The sequence to which the motifs are being fit.

np:

Object of class "integer" The number of parameters in the model, used for calculating the BIC

Methods

plot

signature(x,...): Plots varying descriptive graphs of the motif model. The first thing plotted is the likelihood of the motif starting in each possible position. This is marginal over all sequences. Next, the likelihood of each residue is plotted in a series of plots representing each position in the motif. Last, the likelihood of each sequence from the motif model is plotted.

predict

predict(object,newSequences=object@seqs): Calculates the likelihood for each sequence being generated from the motif model. This is useful for checking for outliers or ensuring the motif model fits the data. Passing new sequences in allows classification of new sequences.

print

print(object): Prints information about the motif model, including the motif string. The motif string is simply all residues with greater than 15% probability in decreasing order with a max of 4 (since 15*5 > 100). So, [R][GF][GFRD] indicates that only R has a greater than 15% probability in position 1. In position 2, only G and F do, with G having a higher probability than F. In position 3, G, F, R and D all have over 15% probability of occuring.

residuals

residuals(model, seqs=model@seqs, classes=rep(0,nrow(seqs)), threshold = 0): This method classifies seqs and calculates the number of false positives, false negatives, and a measure of accuracy between 0 and 1. classes should be the true classes for each sequence, where a negative number means it is not in the model and a positive number or 0 means it is in the model. The default is all in the model. The log-likelihood for each sequence according to the model is calculated and if it is greater than the threshold, the sequence is classified as in the model. These predictions are tested against the true classes passed to the method. The default value for classes is all sequences belong in the model. The log-likelihood for each sequence according to the model is calculated and if it is greater than the threshold, the sequence is classified as in the model. These predictions are tested against the true classes passed to the method.

BIC

BIC(object): Calculates the Bayesian Information Content of the motif model. See BIC for more information.

logLik

logLik(model): This method calculates the (expected) log-likelihood of the motif model.

motifString

motifString(model): The motif string is simply all residues with greater than 15% probability in decreasing order with a max of 4 (since 15*5 > 100). So, [R][GF][GFRD] indicates that only R has a greater than 15% probability in position 1. In position 2, only G and F do, with G having a higher probability than F. In position 3, G, F, R and D all have over 15% probability of occuring.

plotFits

plotFits(model): A barplot of the log of the ratio of the likelihood of each sequence in the model relative to the background distribution. A positive value indicates a good fit to the motif and a negative value means the sequence fits the background (randomly occurring sequence) better.

plotPositions

plotPositions(model): A grid of plots showing the motif probability distribution at each motif position. The three most likely amino acids/residues at each position are annotated on the plots.

plotStartingPosition

plotStartingPosition(model): A barplot showing the likelihood of the motif starting at each position. So that 0.3 in the first position means there is a 30% likelihood of the motif starting in the first position

Author(s)

Andrew White

References

Bailey, T. L. Ph.D. thesis, University of California at San Diego, 1995.

See Also

Sequences

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
#get sequence data
data(TULASequences)


#Create a fixed and optional type model
TULAFixed <- motifModel(TULASequences)
print(TULAFixed)
plotPositions(TULAFixed)

TULAOpt <- motifModel(TULASequences, type="optional")
print(TULAOpt)
plotPositions(TULAOpt)


#Something more interesting, cluster the data, fit two motif models, and
# then calculate the residuals

clusters <- aclust(dist(TULASequences), 2)

TULA.M1 <- motifModel(TULASequences[clusters[[1]],], type="fixed")
TULA.M2 <- motifModel(TULASequences[clusters[[2]],], type="fixed")

#Goodness of fit for model 1 and then using model 1 on the sequences in
# model 2, which is obviously a bad fit

#get threshold by including all sequences, no matter how bad the fit is.

threshold <- min(predict(TULA.M1))
print(threshold)

print(residuals(TULA.M1), threshold=threshold)
print(residuals(TULA.M1, seqs=TULA.M2@seqs, threshold=threshold))

peplib documentation built on May 29, 2017, 10:52 p.m.