get_bkg: Calculate sequence background.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

get_bkg

R Documentation

Description

For a set of input sequences, calculate the overall sequence background for any k-let size. For very large DNA and RNA sequences (in the billions of bases), please be aware of the much faster and more efficient `Biostrings::oligonucleotideFrequency()`. `get_bkg()` can still be used in these cases, though it may take several seconds or minutes to calculate the results (depending on the requested k-let sizes).

Usage

get_bkg(sequences, k = 1:3, as.prob = NULL, pseudocount = 0,
  alphabet = NULL, to.meme = NULL, RC = FALSE, list.out = NULL,
  nthreads = 1, merge.res = TRUE, window = FALSE, window.size = 0.1,
  window.overlap = 0)

Arguments

`sequences`	`XStringSet` Input sequences. Note that if multiple sequences are present, the results will be combined into one (unless `merge.res = FALSE`).
`k`	`integer` Size of k-let. Background can be calculated for any k-let size.
`as.prob`	Deprecated.
`pseudocount`	`integer(1)` Add a count to each possible k-let. Prevents any k-let from having a probability of 0 or 1.
`alphabet`	`character(1)` Provide a custom alphabet to calculate a background for. If `NULL`, then standard letters will be assumed for DNA, RNA and AA sequences, and all unique letters found will be used for `BStringSet` type sequences. Note that letters which are not a part of the standard DNA/RNA/AA alphabets or in the provided alphabet will not be counted in the totals during probability calculations.
`to.meme`	If not `NULL`, then `get_bkg()` will return the sequence background in MEME Markov Background Model format. Input for this argument will be used for `cat(..., file = to.meme)` within `get_bkg()`. See http://meme-suite.org/doc/bfile-format.html for a description of the format.
`RC`	`logical(1)` Calculate the background of the reverse complement of the input sequences as well. Only valid for DNA/RNA.
`list.out`	Deprecated.
`nthreads`	`numeric(1)` Run `get_bkg()` in parallel with `nthreads` threads. `nthreads = 0` uses all available threads. Note that no speed-up will occur for jobs with only a single sequence.
`merge.res`	`logical(1)` Whether to merge results from all sequences or return background data for individual sequences.
`window`	`logical(1)` Determine background in windows.
`window.size`	`numeric` Window size. If a number between 0 and 1 is provided, the value is calculated as the number multiplied by the sequence length.
`window.overlap`	`numeric` Overlap between windows. If a number between 0 and 1 is provided, the value is calculated as the number multiplied by the sequence length.
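
To make the effect of `pseudocount` concrete, here is a minimal base-R sketch that does not call `get_bkg()` at all. `count_klets()` is a hypothetical helper, not part of universalmotif, and the normalisation used (add the pseudocount to every possible k-let, then divide by the new total) is an assumption about how the argument behaves, not a copy of the package internals.

```r
# Hypothetical helper (not part of universalmotif): count overlapping k-lets
# in a single string, including k-lets that never occur.
count_klets <- function(x, k, alphabet) {
  # Every possible k-let over the alphabet, sorted alphabetically
  grid <- expand.grid(rep(list(alphabet), k), stringsAsFactors = FALSE)
  klets <- sort(do.call(paste0, grid))
  # Overlapping windows of width k along the string
  n <- nchar(x) - k + 1
  observed <- if (n > 0) substring(x, seq_len(n), seq_len(n) + k - 1) else character(0)
  counts <- table(factor(observed, levels = klets))
  setNames(as.integer(counts), klets)
}

counts <- count_klets("AAAA", k = 1, alphabet = c("A", "C", "G", "T"))

# Without a pseudocount, A has probability 1 and every other letter 0
raw <- counts / sum(counts)

# Assumed smoothing: add the pseudocount to every possible k-let, then renormalise
pseudocount <- 1
smoothed <- (counts + pseudocount) / sum(counts + pseudocount)
smoothed
#     A     C     G     T
# 0.625 0.125 0.125 0.125
```

With the pseudocount applied, no k-let in this toy example is left with a probability of exactly 0 or 1.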

Value

If `to.meme = NULL`, a `DataFrame` with columns `klet`, `count`, and `probability`. If `merge.res = FALSE`, there will be an additional `sequence` column. If `window = TRUE`, there will be additional `start` and `stop` columns.

If `to.meme` is not `NULL`, then `NULL` is returned, invisibly.
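
The extra columns described above can be checked directly. The following is a minimal sketch, assuming universalmotif is installed and that `create_sequences()` returns several DNA sequences by default. The value `window.size = 0.25` is an arbitrary illustration, and only the first sequence is windowed to keep the output small.

```r
library(universalmotif)

seqs <- create_sequences()

# Per-sequence results instead of one merged table
bkg.each <- get_bkg(seqs, k = 1, merge.res = FALSE)
colnames(bkg.each)

# Windowed background for the first sequence only;
# 0.25 is interpreted as a fraction of the sequence length
bkg.win <- get_bkg(seqs[1], k = 1, window = TRUE, window.size = 0.25)
colnames(bkg.win)
```

Comparing the two `colnames()` results against a plain `get_bkg(seqs, k = 1)` call is a quick way to confirm which columns each option adds.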

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

References

Bailey TL, Elkan C (1994). “Fitting a mixture model by expectation maximization to discover motifs in biopolymers.” Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, 2, 28-36.

Examples

## Compare to Biostrings version
library(universalmotif)
library(Biostrings)
seqs.DNA <- create_sequences()
bkg.DNA <- get_bkg(seqs.DNA, k = 3)
## Per-sequence trinucleotide counts (width 3, step 1), summed over all sequences
bkg.DNA2 <- oligonucleotideFrequency(seqs.DNA, 3, 1, as.prob = FALSE)
bkg.DNA2 <- colSums(bkg.DNA2)
all(bkg.DNA$count == bkg.DNA2)

## Create a MEME background file
get_bkg(seqs.DNA, k = 1:3, to.meme = stdout(), pseudocount = 1)

## Non-DNA/RNA/AA alphabets
seqs.QWERTY <- create_sequences("QWERTY")
bkg.QWERTY <- get_bkg(seqs.QWERTY, k = 1:2)
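
The MEME background example above prints to `stdout()`. Because `to.meme` is handed to `cat(..., file = to.meme)`, a file path should work the same way. The sketch below assumes this and writes to a temporary file; the name `bfile` and the use of `tempfile()` are illustrative choices, not something the package requires.

```r
library(universalmotif)

seqs <- create_sequences()
bfile <- tempfile(fileext = ".txt")  # hypothetical output location

# Write the background model to a file instead of the console
res <- get_bkg(seqs, k = 1:3, to.meme = bfile, pseudocount = 1)

# get_bkg() returns NULL (invisibly) when to.meme is set
is.null(res)

# Inspect the first few lines of the written file
head(readLines(bfile))
```

The resulting file can then be supplied to MEME Suite tools that accept a background model file.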

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.