enrich_motifs: Enrich for input motifs in a set of sequences.
In bjmt/universalmotif: Import, Modify, and Export Motifs with R

enrich_motifs

R Documentation

Enrich for input motifs in a set of sequences.

Description

Given a set of target and background sequences, test if the input motifs are significantly enriched in the targets sequences relative to the background sequences. See the "Sequence manipulation and scanning" vignette.

Usage

enrich_motifs(motifs, sequences, bkg.sequences, max.p = 0.001,
  max.q = 0.001, max.e = 0.001, qval.method = "fdr", threshold = 1e-04,
  threshold.type = "pvalue", verbose = 0, RC = TRUE, use.freq = 1,
  shuffle.k = 2, shuffle.method = "euler", return.scan.results = FALSE,
  nthreads = 1, rng.seed = sample.int(10000, 1), motif_pvalue.k = 8,
  use.gaps = TRUE, allow.nonfinite = FALSE, warn.NA = TRUE,
  no.overlaps = TRUE, no.overlaps.by.strand = FALSE,
  no.overlaps.strat = "score", respect.strand = FALSE,
  motif_pvalue.method = c("dynamic", "exhaustive"),
  scan_sequences.qvals.method = c("BH", "fdr", "bonferroni"),
  mode = c("total.hits", "seq.hits"), pseudocount = 1)

Arguments

`motifs`	See `convert_motifs()` for acceptable motif formats.
`sequences`	`XStringSet` Sequences to scan. Alphabet should match motif.
`bkg.sequences`	`XStringSet` Optional. If missing, `shuffle_sequences()` is used to create background sequences from the input sequences.
`max.p`	`numeric(1)` P-value threshold.
`max.q`	`numeric(1)` Adjusted P-value threshold. This is only useful if multiple motifs are being enriched for.
`max.e`	`numeric(1)`. The E-value is calculated by multiplying the P-value with the number of input motifs times two (McLeay and Bailey 2010).
`qval.method`	`character(1)` See `stats::p.adjust()`.
`threshold`	`numeric(1)` See details.
`threshold.type`	`character(1)` One of `c('pvalue', 'qvalue', 'logodds', 'logodds.abs')`. See details.
`verbose`	`numeric(1)` 0 for no output, 4 for max verbosity.
`RC`	`logical(1)` If `TRUE`, check reverse complement of the input sequences. Only available for DNA/RNA.
`use.freq`	`numeric(1)` The default, 1, uses the motif matrix (from the `motif['motif']` slot) to search for sequences. If a higher number is used, then the matching k-let matrix from the `motif['multifreq']` slot is used. See `add_multifreq()`.
`shuffle.k`	`numeric(1)` The k-let size to use when shuffling input sequences. Only used if no background sequences are input. See `shuffle_sequences()`.
`shuffle.method`	`character(1)` One of `c('euler', 'markov', 'linear')`. See `shuffle_sequences()`.
`return.scan.results`	`logical(1)` Return output from `scan_sequences()`. For large jobs, leaving this as `FALSE` can save a small amount time by preventing construction of the complete results `data.frame` from `scan_sequences()`.
`nthreads`	`numeric(1)` Run `scan_sequences()` in parallel with `nthreads` threads. `nthreads = 0` uses all available threads. Note that no speed up will occur for jobs with only a single motif and sequence.
`rng.seed`	`numeric(1)` Set random number generator seed. Since shuffling can occur simultaneously in multiple threads using C++, it cannot communicate with the regular `R` random number generator state and thus requires an independent seed. Each individual sequence in an `XStringSet` object will be given the following seed: `rng.seed * index`. See `shuffle_sequences()`.
`motif_pvalue.k`	`numeric(1)` Control `motif_pvalue()` approximation. See `motif_pvalue()`.
`use.gaps`	`logical(1)` Set this to `FALSE` to ignore motif gaps, if present.
`allow.nonfinite`	`logical(1)` If `FALSE`, then apply a pseudocount if non-finite values are found in the PWM. Note that if the motif has a pseudocount greater than zero and the motif is not currently of type PWM, then this parameter has no effect as the pseudocount will be applied automatically when the motif is converted to a PWM internally. This value is set to `FALSE` by default in order to stay consistent with pre-version 1.8.0 behaviour. A message will be printed if a pseudocount is applied. To disable this, set `options(pseudocount.warning=FALSE)`.
`warn.NA`	`logical(1)` Whether to warn about the presence of non-standard letters in the input sequence, such as those in masked sequences.
`no.overlaps`	`logical(1)` Remove overlapping hits from the same motifs. Overlapping hits from different motifs are preserved. Please note that the current implementation of this feature can add significantly to the run time for large inputs.
`no.overlaps.by.strand`	`logical(1)` Whether to discard overlapping hits from the opposite strand (`TRUE`), or to only discard overlapping hits on the same strand (`FALSE`).
`no.overlaps.strat`	`character(1)` One of `c("score", "order")`. The former option keeps the highest scoring overlapping hit (and the first of these within ties), and the latter simply keeps the first overlapping hit.
`respect.strand`	`logical(1)` If motifs are DNA/RNA, then setting this option to `TRUE` will make `scan_sequences()` only scan the strands of the input sequences as indicated in the motif `strand` slot.
`motif_pvalue.method`	`character(1)` One of `c("dynamic", "exhaustive")`. Algorithm used for calculating P-values. The `"exhaustive"` method involves finding all possible motif matches at or above the specified score using a branch-and-bound algorithm, which can be computationally intensive (Hartman et al., 2013). Additionally, the computation must be repeated for each hit. The `"dynamic"` method calculates the distribution of possible motif scores using a much faster dynamic programming algorithm, and can be recycled for multiple scores (Grant et al., 2011). The only disadvantage is the inability to use `allow.nonfinite = TRUE`. See `motif_pvalue()` for details.
`scan_sequences.qvals.method`	`character(1)` One of `c("fdr", "BH", "bonferroni")`. The method for calculating adjusted P-values for individual motif hits. These are described in depth in the Sequence Searches vignette.
`mode`	`character(1)` One of `c("total.hits", "seq.hits")`. The former enriches for the total count of motif hits across all sequences, whereas the latter only counts motif hits once per sequence (useful for cases where there are many small sequences).
`pseudocount`	`integer(1)` Add a pseudocount to the motif hit counts when performing the Fisher test.

Details

To find enriched motifs, scan_sequences() is run on both target and background sequences. stats::fisher.test() is run to test for enrichment.

See scan_sequences() for more info on scanning parameters.

Value

DataFrame Enrichment results in a DataFrame. Function args and (optionally) scan results are stored in the metadata slot.

Author(s)

Benjamin Jean-Marie Tremblay benjamin.tremblay@uwaterloo.ca

References

McLeay R, Bailey TL (2010). “Motif Enrichment Analysis: A unified framework and method evaluation.” BMC Bioinformatics, 11.

Examples

data(ArabidopsisPromoters)
data(ArabidopsisMotif)
if (R.Version()$arch != "i386") {
enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters, threshold = 0.01)
}

bjmt/universalmotif documentation built on June 11, 2025, 2:34 a.m.