View source: R/add_multifreq.R
add_multifreq: R Documentation
If the original sequences are available for a particular motif, they can be used to generate higher-order PPMs. See the "Motif import, export, and manipulation" vignette for more information.
add_multifreq(motif, sequences, add.k = 2:3, RC = FALSE,
threshold = 0.001, threshold.type = "pvalue", motifs.perseq = 1,
add.bkg = FALSE)
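motif | The motif to train. See convert_motifs() for acceptable formats.
sequences | The sequences used to generate the multifreq matrices.
add.k | The k-let sizes for which to calculate multifreq matrices.
RC | Logical. If TRUE, also scan the reverse complement of the sequences.
threshold | Scanning threshold; see scan_sequences().
threshold.type | Type of scanning threshold (default "pvalue"); see scan_sequences().
motifs.perseq | Number of motif matches kept per sequence for training.
add.bkg | Logical. If TRUE, also add the corresponding higher-order background probabilities to the bkg slot.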
See scan_sequences() for more info on scanning parameters.
At each position i in the motif, the probability of each k-let spanning positions i to i + k - 1 is calculated. Only k-lets lying entirely within the motif are considered, so the final k-let probability matrix has ncol - k + 1 columns, i.e. k - 1 fewer columns than the motif itself.
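As a rough illustration of the column arithmetic, the base R sketch below counts k-lets in a set of aligned, equal-length DNA sequences and normalises each column. klet_matrix() is a hypothetical helper written for this example, not part of universalmotif; it assumes a DNA alphabet and uses no pseudocounts.

```r
# Hypothetical helper (not part of universalmotif): k-let probability matrix
# from aligned sequences that all have the same length as the motif.
klet_matrix <- function(seqs, k, alphabet = c("A", "C", "G", "T")) {
  L <- nchar(seqs[1])
  stopifnot(all(nchar(seqs) == L), k <= L)
  # All n^k possible k-lets, in lexicographic order
  klets <- do.call(paste0, rev(expand.grid(rep(list(alphabet), k),
                                           stringsAsFactors = FALSE)))
  ncols <- L - k + 1  # only k-lets fully inside the motif: k - 1 fewer columns
  counts <- matrix(0, nrow = length(klets), ncol = ncols,
                   dimnames = list(klets, NULL))
  for (i in seq_len(ncols)) {
    # the k-let starting at position i spans positions i to i + k - 1
    obs <- substr(seqs, i, i + k - 1)
    counts[, i] <- as.vector(table(factor(obs, levels = klets)))
  }
  # divide each column by its total so it sums to 1
  sweep(counts, 2, colSums(counts), "/")
}

seqs <- c("ACGTACGTAC", "ACGTTCGTAC", "TCGTACGAAC")
m2 <- klet_matrix(seqs, k = 2)
dim(m2)      # 16 rows (4^2) and 9 columns (10 - 2 + 1)
m2["AC", 1]  # 2 of the 3 sequences start with "AC"
```

A motif of length 10 trained with k = 2 thus yields 9 columns, one per starting position of a 2-let.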
Calculating k-let probabilities for the missing columns would be trivial, since only the background frequencies would be needed. However, as these would not be useful for scan_sequences(), they are not calculated.
Currently add_multifreq() does not try to stay faithful to the default motif matrix when generating multifreq matrices. This means that if the sequences used for training are completely different from the actual motif, the multifreq matrices will be as well. However, this is only really a problem if you supply add_multifreq() with a set of sequences of the same length as the motif. In this case add_multifreq() is forced to create the multifreq matrices directly from these sequences. Otherwise, add_multifreq() will scan the input sequences for the motif and use the best matches to construct the multifreq matrices.
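The two training paths can be mimicked in base R as follows. best_window() and training_windows() are hypothetical helpers, and the log-odds scoring with a uniform background and a fixed pseudocount is an assumption made for illustration; it is not claimed to match the scoring or thresholding used by scan_sequences().

```r
# Hypothetical helpers (not part of universalmotif) illustrating the two
# training paths, using a simple log-odds score.
best_window <- function(sequence, ppm, bkg = 0.25, pseudo = 0.01) {
  w <- ncol(ppm)
  lo <- log2((ppm + pseudo) / bkg)  # pseudocount avoids log2(0)
  starts <- seq_len(nchar(sequence) - w + 1)
  scores <- vapply(starts, function(s) {
    chars <- strsplit(substr(sequence, s, s + w - 1), "")[[1]]
    sum(lo[cbind(match(chars, rownames(ppm)), seq_len(w))])
  }, numeric(1))
  best <- starts[which.max(scores)]  # first window with the highest score
  substr(sequence, best, best + w - 1)
}

training_windows <- function(seqs, ppm) {
  w <- ncol(ppm)
  # sequences as long as the motif are used directly, without scanning
  if (all(nchar(seqs) == w)) return(seqs)
  seqs <- seqs[nchar(seqs) >= w]  # too-short sequences cannot contain a hit
  vapply(seqs, best_window, character(1), ppm = ppm, USE.NAMES = FALSE)
}

# A 3-column DNA PPM strongly favoring "ACG"
ppm <- matrix(c(0.97, 0.01, 0.01, 0.01,
                0.01, 0.97, 0.01, 0.01,
                0.01, 0.01, 0.97, 0.01),
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

training_windows(c("TTTACGTTT", "GGACGAAAA"), ppm)  # "ACG" "ACG"
training_windows(c("ACG", "TTT"), ppm)              # used as-is: "ACG" "TTT"
```

The second call shows the caveat above: "TTT" looks nothing like the motif, yet it is used for training because it already has the motif's length.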
This 'multifreq' representation is only really useful within the universalmotif environment. Despite this, it can be preserved in text using write_motifs() if you wish.
The number of rows in each k-let matrix is n^k, where n is the number of letters in the alphabet being used. This means that the k-let matrix can become quite large as k increases. For example, representing a DNA motif of length 10 as a 10-let would require a matrix with 4^10 = 1,048,576 rows (though at that point, if what you want is to search for exact sequence matches, a motif is not a very useful format).
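The growth in row count is simple arithmetic. This base R sketch computes n^k and enumerates the k-let row labels; both helpers are hypothetical, and the lexicographic row order is an assumption for illustration rather than a statement about how universalmotif orders multifreq rows.

```r
# Hypothetical helper: rows needed for a k-let matrix over n letters
klet_rows <- function(n, k) n^k

klet_rows(4, 2)    # DNA 2-lets: 16
klet_rows(4, 10)   # DNA 10-let: 1048576
klet_rows(20, 3)   # 20-letter amino acid alphabet, 3-lets: 8000

# Hypothetical helper: enumerate the k-let row labels (lexicographic order)
klet_names <- function(alphabet, k) {
  do.call(paste0, rev(expand.grid(rep(list(alphabet), k),
                                  stringsAsFactors = FALSE)))
}
head(klet_names(c("A", "C", "G", "T"), 2))  # "AA" "AC" "AG" "AT" "CA" "CC"
```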
A universalmotif object with a filled multifreq slot. If add.bkg = TRUE, the bkg slot is also expanded with the corresponding higher-order probabilities.
Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca
scan_sequences(), convert_motifs(), write_motifs()
library(universalmotif)
## random DNA sequences of length 10
sequences <- create_sequences(seqlen = 10)
motif <- create_motif()
motif.trained <- add_multifreq(motif, sequences, add.k = 2:4)
## peek at the 2-let matrix:
motif.trained["multifreq"]$`2`