View source: R/shuffle_sequences.R
shuffle_sequences | R Documentation |
Given a set of input sequences, shuffle the letters within those sequences with any k-let size.
shuffle_sequences(sequences, k = 1, method = "euler", nthreads = 1,
rng.seed = sample.int(10000, 1), window = FALSE, window.size = 0.1,
window.overlap = 0.01)
sequences |
|
k |
|
method |
|
nthreads |
|
rng.seed |
|
window |
|
window.size |
|
window.overlap |
|
If method = 'markov'
, then the Markov model is used to
generate sequences which will maintain (on average) the k-let
frequencies. Please note that this method is not a 'true' shuffling, and
for short sequences (e.g. <100bp) this can result in slightly more
dissimilar sequences versus true shuffling. See
Fitch (1983) for a discussion on the
topic.
If method = 'euler'
, then the sequence shuffling method proposed by
Altschul and Erickson (1985) is used. As opposed
to the 'markov' method, this one preserves exact k-let frequencies. This
is done by creating a k-let edge graph, then following a
random Eulerian walk through the graph. Not all walks will use up all
available letters however, so the cycle-popping algorithm proposed by
Propp and Wilson (1998) is used to find a
random Eulerian path. A side effect of using this method is that the
starting and ending sequence letters will remain unshuffled.
If method = 'linear'
, then the input sequences are split linearly
every k
letters. For example, for k = 3
'ACAGATAGACCC' becomes
'ACA GAT AGA CCC'; after which these 3
-lets are shuffled randomly.
Do note however, that the method
parameter is only relevant for k > 1
.
For k = 1
, a simple shuffling is performed using the shuffle
function
from the C++ standard library.
XStringSet
The input sequences will be returned with
identical names and lengths.
Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca
Altschul SF, Erickson BW (1985). “Significance of Nucleotide Sequence Alignments: A Method for Random Sequence Permutation That Preserves Dinucleotide and Codon Usage.” Molecular Biology and Evolution, 2, 526-538.
Fitch WM (1983). “Random sequences.” Journal of Molecular Biology, 163, 171-176.
Propp JG, Wilson DW (1998). “How to get a perfectly random sample from a generic markov chain and generate a random spanning tree of a directed graph.” Journal of Algorithms, 27, 170-217.
create_sequences()
, scan_sequences()
, enrich_motifs()
,
shuffle_motifs()
if (R.Version()$arch != "i386") {
sequences <- create_sequences()
sequences.shuffled <- shuffle_sequences(sequences, k = 2)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.