get_bkg | R Documentation |
For a set of input sequences, calculate the overall sequence background for
any k-let size. For very large DNA and RNA sequences (in the billions of bases),
please be aware of the much faster and more efficient Biostrings::oligonucleotideFrequency().
get_bkg() can still be used in these cases, though it may take several seconds or
minutes to calculate the results (depending on requested k-let sizes).
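To make the idea of a k-let background concrete, the following is a minimal base-R sketch of counting overlapping k-lets pooled across a set of sequences. It is not the implementation used by get_bkg(): the helper name count_klets() and its arguments are hypothetical, it returns a plain data.frame rather than a DataFrame, and it ignores pseudocounts, reverse complements, and windowing.

```r
# Hypothetical helper (not part of any package): count overlapping k-lets
# in a character vector of sequences, pooled across all sequences.
count_klets <- function(sequences, k = 1, alphabet = c("A", "C", "G", "T")) {
  # Enumerate every possible k-let so that unobserved ones get a count of 0
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  klets <- sort(do.call(paste0, grid))
  counts <- setNames(numeric(length(klets)), klets)

  for (s in sequences) {
    n <- nchar(s)
    if (n < k) next  # sequence too short to contain any k-let
    # Slide a window of width k along the sequence, one position at a time
    starts <- seq_len(n - k + 1)
    observed <- table(substring(s, starts, starts + k - 1))
    # Only tally k-lets made up of letters from the given alphabet
    hits <- intersect(names(observed), klets)
    counts[hits] <- counts[hits] + as.numeric(observed[hits])
  }

  # Probabilities are counts normalised by the total (NaN if nothing was counted)
  data.frame(klet = klets,
             count = unname(counts),
             probability = unname(counts) / sum(counts))
}

bkg <- count_klets(c("ACGTAC", "GGA"), k = 2)
```

Here the windows overlap, so a sequence of length n contributes n - k + 1 k-lets. Enumerating the full set of possible k-lets up front keeps the output the same size regardless of the input, at the cost of memory that grows exponentially with k.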
get_bkg(sequences, k = 1:3, as.prob = NULL, pseudocount = 0,
alphabet = NULL, to.meme = NULL, RC = FALSE, list.out = NULL,
nthreads = 1, merge.res = TRUE, window = FALSE, window.size = 0.1,
window.overlap = 0)
sequences |
|
k |
|
as.prob |
Deprecated. |
pseudocount |
|
alphabet |
|
to.meme |
If not |
RC |
|
list.out |
Deprecated. |
nthreads |
|
merge.res |
|
window |
|
window.size |
|
window.overlap |
|
If to.meme = NULL, a DataFrame with columns klet, count, and probability.
If merge.res = FALSE, there will be an additional sequence column.
If window = TRUE, there will be additional start and stop columns.
If to.meme is not NULL, then NULL is returned, invisibly.
Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca
Bailey TL, Elkan C (1994). “Fitting a mixture model by expectation maximization to discover motifs in biopolymers.” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 2, 28-36.
create_sequences(), scan_sequences(), shuffle_sequences()
## Load the package that provides get_bkg() and create_sequences()
library(universalmotif)

## Compare to Biostrings version
library(Biostrings)
seqs.DNA <- create_sequences()
bkg.DNA <- get_bkg(seqs.DNA, k = 3)
## Per-sequence 3-let counts (width 3, step 1), summed over all sequences
bkg.DNA2 <- oligonucleotideFrequency(seqs.DNA, 3, 1, as.prob = FALSE)
bkg.DNA2 <- colSums(bkg.DNA2)
all(bkg.DNA$count == bkg.DNA2)

## Create a MEME background file
get_bkg(seqs.DNA, k = 1:3, to.meme = stdout(), pseudocount = 1)

## Non-DNA/RNA/AA alphabets
seqs.QWERTY <- create_sequences("QWERTY")
bkg.QWERTY <- get_bkg(seqs.QWERTY, k = 1:2)