Description Usage Arguments Details Value References See Also Examples
A function used to compute and store a Markov model built by training and cross-validating on specified files. It returns a Markov model handle to conveniently reference the calculated model in other SELEX functions.
1 2 |
sample |
A sample handle to the training dataset. |
order |
The order of the Markov model to be built. If |
crossValidationSample |
A sample handle to the cross-validation dataset. If |
Kmax |
The K-mer length to determine model fit on. If |
seqfilter |
A sequence filter object to include/exclude sequences that are read in from the FASTQ file. |
mmMethod |
A character string indicating the algorithm used to evaluate the Markov model conditional probabilities. Can be either |
mmWithLeftFlank |
Predict expected counts by considering the sequences in the left flank of the variable region. See ‘Details’. |
selex.mm
builds an N-th order Markov model by training the model on the sample
dataset and tests its accuracy by attempting to predict the K-mer counts of the cross-validation dataset. This K-mer length is determined either by setting Kmax
or by an internal call to selex.kmax
. If a cross-validation dataset does not exist, selex.split
can be used to partition a dataset into testing and training datasets.
The Markov model uses conditional probabilities to predict the probability of observing a given K-mer. For example, let us consider the following 6-mer sequence:
AGGCTA
If a fourth-order Markov model were to predict the prior observation probability of the above sequence, the following would have to be evaluated:
p(AGGCTA) = p(AGG) p(C|AGG) p(T|GGC) p(A|GCT)
These conditional probabilities can be evaluated using one of two algorithms. The TRANSITION
algorithm relies on K-mer counts of length equal to the order of the Markov model. For the example above,
p(C|AGG) = count(AGGC) / [count(AGGA) + count(AGGC) + count(AGGG) + count(AGGT)]
The DIVISION
algorithm relies on K-mer frequencies for lengths equal to Markov model order and order-1. For the example above,
p(C|AGG) = P(AGGC) / P(AGG) = [count(AGGC) / (total 4-mer counts)] / [count(AGG) / (total 3-mer counts)]
The the flanking sequences in the left flank of the variable region can be taken into consideration when predicting prior observation probabilities by setting mmWithLeftFlank
to TRUE
. If the left barcode of the sequence in the example above is TGG, P(AGG) in the prior probability becomes
p(AGG) = p(A|TGG) p(G|GGA) p(G|GAG).
After computation, the Markov model is stored in the working directory. When a new seqfilter
object is provided, the Markov model is reconstructed. See selex.seqfilter
for more details. See ‘References’ for more details regarding the model construction process.
selex.mmSummary
can be used to view the R^2 values for all orders that have been computed for the Markov model. If crossValidationSample
is NULL
, the resulting Markov model will not be displayed by selex.mmSummary
.
selex.mm
returns a Markov model handle.
Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J.,and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.
Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.
selex.counts
, selex.infogain
, selex.kmax
, selex.mmSummary
, selex.run
, selex.split
1 2 3 4 | mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)
# View Markov model R^2 values
selex.mmSummary()
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.