selex.mm: Build or retrieve a Markov model
In SELEX: Functions for analyzing SELEX-seq data

Description Usage Arguments Details Value References See Also Examples

A function used to compute and store a Markov model built by training and cross-validating on specified files. It returns a Markov model handle to conveniently reference the calculated model in other SELEX functions.

1 2	selex.mm(sample, order=NA, crossValidationSample=NULL, Kmax= NULL, seqfilter=NULL, mmMethod="DIVISION", mmWithLeftFlank=FALSE)

`sample`	A sample handle to the training dataset.
`order`	The order of the Markov model to be built. If `NA`, builds Markov models of all orders from `0` to `Kmax-1`, selecting the one with highest R^2.
`crossValidationSample`	A sample handle to the cross-validation dataset. If `NULL`, no R^2 value will be provided.
`Kmax`	The K-mer length to determine model fit on. If `NA`, automatically finds kmax with a minimum count set to 100. See `selex.kmax` and ‘Details’.
`seqfilter`	A sequence filter object to include/exclude sequences that are read in from the FASTQ file.
`mmMethod`	A character string indicating the algorithm used to evaluate the Markov model conditional probabilities. Can be either `"DIVISION"` (default) or `"TRANSITION"`. See ‘Details’.
`mmWithLeftFlank`	Predict expected counts by considering the sequences in the left flank of the variable region. See ‘Details’.

selex.mm builds an N-th order Markov model by training the model on the sample dataset and tests its accuracy by attempting to predict the K-mer counts of the cross-validation dataset. This K-mer length is determined either by setting Kmax or by an internal call to selex.kmax. If a cross-validation dataset does not exist, selex.split can be used to partition a dataset into testing and training datasets.

The Markov model uses conditional probabilities to predict the probability of observing a given K-mer. For example, let us consider the following 6-mer sequence:

AGGCTA

If a fourth-order Markov model were to predict the prior observation probability of the above sequence, the following would have to be evaluated:

p(AGGCTA) = p(AGG) p(C|AGG) p(T|GGC) p(A|GCT)

These conditional probabilities can be evaluated using one of two algorithms. The TRANSITION algorithm relies on K-mer counts of length equal to the order of the Markov model. For the example above,

p(C|AGG) = count(AGGC) / [count(AGGA) + count(AGGC) + count(AGGG) + count(AGGT)]

The DIVISION algorithm relies on K-mer frequencies for lengths equal to Markov model order and order-1. For the example above,

p(C|AGG) = P(AGGC) / P(AGG) = [count(AGGC) / (total 4-mer counts)] / [count(AGG) / (total 3-mer counts)]

The the flanking sequences in the left flank of the variable region can be taken into consideration when predicting prior observation probabilities by setting mmWithLeftFlank to TRUE. If the left barcode of the sequence in the example above is TGG, P(AGG) in the prior probability becomes

p(AGG) = p(A|TGG) p(G|GGA) p(G|GAG).

After computation, the Markov model is stored in the working directory. When a new seqfilter object is provided, the Markov model is reconstructed. See selex.seqfilter for more details. See ‘References’ for more details regarding the model construction process.

selex.mmSummary can be used to view the R^2 values for all orders that have been computed for the Markov model. If crossValidationSample is NULL, the resulting Markov model will not be displayed by selex.mmSummary.

selex.mm returns a Markov model handle.

Slattery, M., Riley, T.R., Liu, P., Abe, N., Gomez-Alcala, P., Dror, I., Zhou, T., Rohs, R., Honig, B., Bussemaker, H.J.,and Mann, R.S. (2011) Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell 147:1270–1282.

Riley, T.R., Slattery, M., Abe, N., Rastogi, C., Liu, D., Mann, R.S., and Bussemaker, H.J. (2014) SELEX-seq: a method for characterizing the complete repertoire of binding site preferences for transcription factor complexes. Methods Mol. Biol. 1196:255–278.

selex.counts, selex.infogain, selex.kmax, selex.mmSummary, selex.run, selex.split

mm = selex.mm(sample=r0.split$train, order=NA, crossValidationSample=r0.split$test)

# View Markov model R^2 values
selex.mmSummary()

SELEX documentation built on Nov. 8, 2020, 5:22 p.m.

SELEX index

Motif Discovery with SELEX-seq

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

SELEX
Functions for analyzing SELEX-seq data

selex.mm: Build or retrieve a Markov model
In SELEX: Functions for analyzing SELEX-seq data

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to selex.mm in SELEX...

R Package Documentation

Browse R Packages

We want your feedback!

SELEX Functions for analyzing SELEX-seq data

selex.mm: Build or retrieve a Markov model In SELEX: Functions for analyzing SELEX-seq data

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to selex.mm in SELEX...

R Package Documentation

Browse R Packages

We want your feedback!

SELEX
Functions for analyzing SELEX-seq data

selex.mm: Build or retrieve a Markov model
In SELEX: Functions for analyzing SELEX-seq data