shuffle_sequences: Shuffle input sequences.



Description

Given a set of input sequences, shuffle the letters within each sequence using any k-let size.

Usage

shuffle_sequences(sequences, k = 1, method = "euler", nthreads = 1,
  rng.seed = sample.int(10000, 1))

Arguments

sequences

XStringSet Set of sequences to shuffle. Works with any set of characters.

k

numeric(1) K-let size.

method

character(1) One of c('euler', 'markov', 'linear'). Only relevant if k > 1. See details.

nthreads

numeric(1) Run shuffle_sequences() in parallel with nthreads threads. nthreads = 0 uses all available threads. Note that jobs with only a single sequence will see no speedup.

rng.seed

numeric(1) Random number generator seed. Because shuffling can run simultaneously in multiple C++ threads, it cannot use the regular R random number generator state and therefore requires an independent seed. Each individual sequence in an XStringSet object is given the seed rng.seed * index. The default picks a random number with sample.int(), so by default shuffle_sequences() effectively depends on the R RNG state.
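The per-sequence seeding rule (sequence i receives rng.seed * i) can be illustrated in plain R. This is a minimal sketch, not the package's C++ code. The helpers shuffle_one() and shuffle_all() are hypothetical, and they use R's own RNG with a simple k = 1 permutation. The only thing they mirror from the documentation is how the seeds are derived.

```r
# Hypothetical helper: k = 1 shuffle of one string, seeded independently.
# Note: set.seed() resets R's global RNG state as a side effect.
shuffle_one <- function(s, seed) {
  set.seed(seed)
  paste(sample(strsplit(s, "")[[1]]), collapse = "")
}

# Hypothetical wrapper following the documented rule: sequence i gets the
# seed rng.seed * i, so each result does not depend on which thread
# (or in which order) the sequence was processed.
shuffle_all <- function(seqs, rng.seed = sample.int(10000, 1)) {
  seeds <- rng.seed * seq_along(seqs)
  out <- mapply(shuffle_one, seqs, seeds, USE.NAMES = FALSE)
  names(out) <- names(seqs)
  out
}

seqs <- c(seq1 = "ACGTACGTAA", seq2 = "GGGCCCAATT", seq3 = "TTAGCATGCA")
run1 <- shuffle_all(seqs, rng.seed = 42)
run2 <- shuffle_all(seqs, rng.seed = 42)  # same seed, same shuffles
```

With a fixed rng.seed, repeated calls return identical shuffles and keep the input names and lengths.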

Details

If method = 'markov', then a Markov model is used to generate sequences that maintain the k-let frequencies on average. Note that this is not a true shuffle. For short sequences (e.g. < 100 bp), it can produce sequences that are slightly more dissimilar to the input than a true shuffle would. See \insertCite{markovmodel;textual}{universalmotif} for a discussion of the topic.
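The Markov approach can be sketched in base R with an order-(k - 1) model estimated from the sequence's own k-lets. This is a minimal illustration under stated assumptions, not the package's implementation. The function markov_shuffle() is hypothetical. The chain is assumed to start from the original first (k - 1) letters. A context seen only at the very end of the sequence has no observed successor, and in that case the sketch falls back to the overall letter frequencies.

```r
# Hypothetical sketch: order-(k - 1) Markov resampling of one sequence
markov_shuffle <- function(s, k = 2) {
  stopifnot(k >= 2)
  x <- strsplit(s, "")[[1]]
  n <- length(x)
  if (n <= k) return(s)
  # Each (k-1)-let context, paired with the letter that follows it
  ctx <- substring(s, 1:(n - k + 1), (k - 1):(n - 1))
  trans <- split(x[k:n], ctx)
  res <- character(n)
  # Assumption: seed the chain with the original first (k - 1) letters
  res[1:(k - 1)] <- x[1:(k - 1)]
  for (i in k:n) {
    cur <- paste(res[(i - k + 1):(i - 1)], collapse = "")
    choices <- trans[[cur]]
    # Dead end (context only seen at the end): fall back to letter frequencies
    if (is.null(choices)) choices <- x
    # A uniform pick among observed successors = empirical transition probabilities
    res[i] <- choices[sample.int(length(choices), 1)]
  }
  paste(res, collapse = "")
}

set.seed(1)
s <- "ACAGATAGACCCAGATTACA"
y <- markov_shuffle(s, k = 2)
```

Because letters are sampled rather than permuted, the output only matches the input's k-let frequencies on average. This is why the text above distinguishes this method from a true shuffle.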

If method = 'euler', then the sequence shuffling method proposed by \insertCite{markovmodel2;textual}{universalmotif} is used. Unlike the 'markov' method, it preserves the exact k-let frequencies. It works by building a k-let edge graph and then following a random Eulerian walk through it. Because not every random walk uses up all available letters, the cycle-popping algorithm proposed by \insertCite{eulerAlgo;textual}{universalmotif} is used to find a random Eulerian path. A side effect of this method is that the starting and ending letters of each sequence remain unshuffled.
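The Eulerian approach can be sketched in base R. The vertices are overlapping (k - 1)-lets, and each k-let in the sequence is an edge. First, a random "last exit" edge is chosen for every vertex so that these edges form a spanning tree pointing to the final (k - 1)-let. The sketch does this with Wilson's loop-erased random walk, which is one form of cycle popping. Next, each vertex's remaining out-edges are randomly ordered, with the last-exit edge placed at the end. Finally, a walk from the first (k - 1)-let always takes the next unused edge. The function euler_shuffle() is hypothetical and is not the package's C++ implementation.

```r
# Hypothetical sketch of an exact k-let shuffle via a random Eulerian path
euler_shuffle <- function(s, k = 2) {
  n <- nchar(s)
  if (k < 2 || n <= k) return(s)
  # Vertices: overlapping (k-1)-lets; consecutive pairs are the k-let edges
  v <- substring(s, 1:(n - k + 2), (k - 1):n)
  m <- length(v)
  root <- v[m]
  verts <- unique(v)
  out <- split(v[-1], factor(v[-m], levels = verts))
  # Random spanning tree of "last exit" edges rooted at the final (k-1)-let,
  # built with Wilson's loop-erased random walk (cycle popping)
  in_tree <- setNames(verts == root, verts)
  last <- setNames(integer(length(verts)), verts)
  for (u0 in verts) {
    u <- u0
    while (!in_tree[[u]]) {
      last[[u]] <- sample.int(length(out[[u]]), 1)
      u <- out[[u]][last[[u]]]
    }
    u <- u0
    while (!in_tree[[u]]) {
      in_tree[[u]] <- TRUE
      u <- out[[u]][last[[u]]]
    }
  }
  # Randomly order each vertex's other edges; its last-exit edge goes last
  ord <- lapply(setNames(verts, verts), function(u) {
    e <- out[[u]]
    if (u == root) return(sample(e))
    c(sample(e[-last[[u]]]), e[last[[u]]])
  })
  # Walk from the first (k-1)-let, always taking the next unused edge
  pos <- setNames(rep(1L, length(verts)), verts)
  res <- character(m)
  res[1] <- v[1]
  for (i in 2:m) {
    u <- res[i - 1]
    res[i] <- ord[[u]][pos[[u]]]
    pos[[u]] <- pos[[u]] + 1L
  }
  paste0(res[1], paste(substr(res[-1], k - 1, k - 1), collapse = ""))
}

set.seed(7)
s <- "ACAGATAGACCCAGATTACA"
y2 <- euler_shuffle(s, k = 2)
y3 <- euler_shuffle(s, k = 3)
```

The walk can only start at the first (k - 1)-let and must end at the last one. This explains why the method leaves the starting and ending letters in place.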

If method = 'linear', then each input sequence is split into consecutive, non-overlapping k-lets. For example, with k = 3, 'ACAGATAGACCC' becomes 'ACA GAT AGA CCC', and these 3-lets are then shuffled randomly.
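The linear method reduces to splitting a sequence into fixed-width chunks and permuting them. The base R sketch below reproduces the 'ACAGATAGACCC' example. The function linear_shuffle() is hypothetical. The documentation does not say how an incomplete trailing chunk is handled when the length is not a multiple of k, so the sketch simply assumes such a chunk stays at the end.

```r
# Hypothetical sketch of the 'linear' k-let shuffle
linear_shuffle <- function(s, k = 3) {
  n <- nchar(s)
  if (n <= k) return(s)
  starts <- seq(1, n, by = k)
  # Split into consecutive, non-overlapping k-lets
  chunks <- substring(s, starts, pmin(starts + k - 1, n))
  full <- nchar(chunks) == k
  # Assumption: an incomplete trailing chunk, if any, stays at the end
  paste(c(sample(chunks[full]), chunks[!full]), collapse = "")
}

set.seed(3)
y <- linear_shuffle("ACAGATAGACCC", k = 3)  # some ordering of ACA, GAT, AGA, CCC
```

Unlike the 'euler' method, this preserves only the k-lets at positions 1, k + 1, 2k + 1, and so on. Overlapping k-lets that span chunk boundaries are not preserved.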

Note, however, that the method parameter is only relevant for k > 1. For k = 1, a simple shuffle is performed using the shuffle function from the C++ standard library.
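For k = 1, shuffling is simply a random permutation of the letters, so only the letter composition is preserved. The sketch below uses R's sample() as an analogue. It is not the C++ code the package uses, and the name shuffle_k1() is hypothetical.

```r
# Hypothetical k = 1 shuffle: a plain random permutation of the letters
shuffle_k1 <- function(s) paste(sample(strsplit(s, "")[[1]]), collapse = "")

set.seed(10)
s <- "ACAGATAGACCC"
y <- shuffle_k1(s)
# Letter composition is unchanged
identical(sort(strsplit(y, "")[[1]]), sort(strsplit(s, "")[[1]]))  # TRUE
```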

Value

XStringSet The input sequences will be returned with identical names and lengths.

Author(s)

Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca

References

\insertRef{markovmodel2}{universalmotif}

\insertRef{markovmodel}{universalmotif}

\insertRef{eulerAlgo}{universalmotif}

See Also

create_sequences(), scan_sequences(), enrich_motifs(), shuffle_motifs()

Examples

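library(universalmotif)
if (R.Version()$arch != "i386") {  # skipped on 32-bit (i386) builds
  sequences <- create_sequences()  # random sequences to shuffle
  # Default method = "euler" keeps exact dinucleotide (k = 2) counts
  sequences.shuffled <- shuffle_sequences(sequences, k = 2)
}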

bjmt/universalmotif documentation built on Sept. 19, 2020, 6:51 p.m.