enrich_motifs: Enrich for input motifs in a set of sequences.

View source: R/enrich_motifs.R

enrich_motifs    R Documentation

Enrich for input motifs in a set of sequences.


Description

Given a set of target and background sequences, test whether the input motifs are significantly enriched in the target sequences relative to the background sequences. See the "Sequence manipulation and scanning" vignette.


Usage

enrich_motifs(motifs, sequences, bkg.sequences, max.p = 0.001,
  max.q = 0.001, max.e = 0.001, qval.method = "fdr", threshold = 1e-04,
  threshold.type = "pvalue", verbose = 0, RC = TRUE, use.freq = 1,
  shuffle.k = 2, shuffle.method = "euler", return.scan.results = FALSE,
  nthreads = 1, rng.seed = sample.int(10000, 1), motif_pvalue.k = 8,
  use.gaps = TRUE, allow.nonfinite = FALSE, warn.NA = TRUE,
  no.overlaps = TRUE, no.overlaps.by.strand = FALSE,
  no.overlaps.strat = "score", respect.strand = FALSE,
  motif_pvalue.method = c("dynamic", "exhaustive"),
  scan_sequences.qvals.method = c("BH", "fdr", "bonferroni"),
  mode = c("total.hits", "seq.hits"), pseudocount = 1)



Arguments

motifs: See convert_motifs() for acceptable motif formats.


sequences: XStringSet Sequences to scan. Alphabet should match motif.


bkg.sequences: XStringSet Optional. If missing, shuffle_sequences() is used to create background sequences from the input sequences.


max.p: numeric(1) P-value threshold.


max.q: numeric(1) Adjusted P-value threshold. This is only useful if multiple motifs are being enriched for.


max.e: numeric(1) E-value threshold. The E-value is calculated by multiplying the P-value by the number of input motifs times two (McLeay and Bailey 2010).


qval.method: character(1) See stats::p.adjust().


threshold: numeric(1) See details.


threshold.type: character(1) One of c('pvalue', 'qvalue', 'logodds', 'logodds.abs'). See details.


verbose: numeric(1) 0 for no output, 4 for max verbosity.


RC: logical(1) If TRUE, check reverse complement of the input sequences. Only available for DNA/RNA.


use.freq: numeric(1) The default, 1, uses the motif matrix (from the motif['motif'] slot) to search for sequences. If a higher number is used, then the matching k-let matrix from the motif['multifreq'] slot is used. See add_multifreq().


shuffle.k: numeric(1) The k-let size to use when shuffling input sequences. Only used if no background sequences are input. See shuffle_sequences().


shuffle.method: character(1) One of c('euler', 'markov', 'linear'). See shuffle_sequences().


return.scan.results: logical(1) Return output from scan_sequences(). For large jobs, leaving this as FALSE can save a small amount of time by preventing construction of the complete results data.frame from scan_sequences().


nthreads: numeric(1) Run scan_sequences() in parallel with nthreads threads. nthreads = 0 uses all available threads. Note that no speed-up will occur for jobs with only a single motif and sequence.


rng.seed: numeric(1) Set random number generator seed. Since shuffling can occur simultaneously in multiple threads using C++, it cannot communicate with the regular R random number generator state and thus requires an independent seed. Each individual sequence in an XStringSet object will be given the following seed: rng.seed * index. See shuffle_sequences().


motif_pvalue.k: numeric(1) Control motif_pvalue() approximation. See motif_pvalue().


use.gaps: logical(1) Set this to FALSE to ignore motif gaps, if present.


allow.nonfinite: logical(1) If FALSE, then apply a pseudocount if non-finite values are found in the PWM. Note that if the motif has a pseudocount greater than zero and the motif is not currently of type PWM, then this parameter has no effect as the pseudocount will be applied automatically when the motif is converted to a PWM internally. This value is set to FALSE by default in order to stay consistent with pre-version 1.8.0 behaviour. A message will be printed if a pseudocount is applied. To disable this, set options(pseudocount.warning=FALSE).


warn.NA: logical(1) Whether to warn about the presence of non-standard letters in the input sequence, such as those in masked sequences.


no.overlaps: logical(1) Remove overlapping hits from the same motif. Overlapping hits from different motifs are preserved. Please note that the current implementation of this feature can add significantly to the run time for large inputs.


no.overlaps.by.strand: logical(1) Whether to discard overlapping hits from the opposite strand (TRUE), or to only discard overlapping hits on the same strand (FALSE).


no.overlaps.strat: character(1) One of c("score", "order"). The former option keeps the highest scoring overlapping hit (and the first of these within ties), and the latter simply keeps the first overlapping hit.


respect.strand: logical(1) If motifs are DNA/RNA, then setting this option to TRUE will make scan_sequences() only scan the strands of the input sequences as indicated in the motif strand slot.


motif_pvalue.method: character(1) One of c("dynamic", "exhaustive"). Algorithm used for calculating P-values. The "exhaustive" method involves finding all possible motif matches at or above the specified score using a branch-and-bound algorithm, which can be computationally intensive (Hartman et al., 2013). Additionally, the computation must be repeated for each hit. The "dynamic" method calculates the distribution of possible motif scores using a much faster dynamic programming algorithm, and can be recycled for multiple scores (Grant et al., 2011). The only disadvantage is the inability to use allow.nonfinite = TRUE. See motif_pvalue() for details.


scan_sequences.qvals.method: character(1) One of c("fdr", "BH", "bonferroni"). The method for calculating adjusted P-values for individual motif hits. These are described in depth in the Sequence Searches vignette.


mode: character(1) One of c("total.hits", "seq.hits"). The former enriches for the total count of motif hits across all sequences, whereas the latter only counts motif hits once per sequence (useful for cases where there are many small sequences).


pseudocount: integer(1) Add a pseudocount to the motif hit counts when performing the Fisher test.
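
As a rough illustration of how the max.p, max.q and max.e thresholds relate to each other, the base R sketch below computes adjusted P-values and E-values from a set of made-up per-motif P-values. The motif names and numbers are hypothetical, and combining the three thresholds with a logical AND (using <=) is an assumption made for this sketch, not a statement about the package internals.

```r
# Hypothetical enrichment P-values for three motifs (made-up numbers,
# not output from enrich_motifs()).
pvals <- c(motifA = 1e-6, motifB = 2e-4, motifC = 0.03)

# Default thresholds from the usage above.
max.p <- 0.001
max.q <- 0.001
max.e <- 0.001

# Adjusted P-values, as done by stats::p.adjust() with qval.method = "fdr".
qvals <- stats::p.adjust(pvals, method = "fdr")

# E-value: P-value multiplied by the number of input motifs times two.
evals <- pvals * length(pvals) * 2

# Assumption: a motif is reported only if it passes all three thresholds.
keep <- pvals <= max.p & qvals <= max.q & evals <= max.e
names(pvals)[keep]
```

With several motifs the E-value is the strictest of the three here, which is why only the first hypothetical motif passes.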


Details

To find enriched motifs, scan_sequences() is run on both the target and background sequences. stats::fisher.test() is then run on the resulting hit counts to test for enrichment.
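
The enrichment test can be sketched in base R as follows. The hit vectors and sequence counts are invented for illustration, the 2 x 2 table layout is one plausible construction for the per-sequence case, and the one-sided alternative is an assumption; none of this is taken from the package source.

```r
# Hypothetical scan output: the name of the sequence each hit was found in.
# These vectors are made up; they are not universalmotif output.
target.hit.seqs <- c("t1", "t1", "t2", "t5", "t7", "t7", "t7", "t9")
bkg.hit.seqs    <- c("b3", "b8")
n.target    <- 10L  # number of target sequences
n.bkg       <- 10L  # number of background sequences
pseudocount <- 1L

# mode = "total.hits" counts every hit; mode = "seq.hits" counts each
# sequence at most once.
total.hits <- c(target = length(target.hit.seqs),
                bkg    = length(bkg.hit.seqs))
seq.hits   <- c(target = length(unique(target.hit.seqs)),
                bkg    = length(unique(bkg.hit.seqs)))

# 2 x 2 table for the per-sequence case. The pseudocount is added to the
# hit counts only, following the description of the pseudocount argument.
tab <- matrix(
  c(seq.hits[["target"]] + pseudocount, n.target - seq.hits[["target"]],
    seq.hits[["bkg"]]    + pseudocount, n.bkg    - seq.hits[["bkg"]]),
  nrow = 2,
  dimnames = list(c("hit", "no.hit"), c("target", "bkg"))
)

# Assumed one-sided test: are hits over-represented in the targets?
res <- stats::fisher.test(tab, alternative = "greater")
res$p.value
```

The difference between total.hits and seq.hits in this toy data shows why per-sequence counting can matter when a few sequences contain many hits.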

See scan_sequences() for more info on scanning parameters.


Value

DataFrame A DataFrame of enrichment results. Function arguments and (optionally) scan results are stored in the metadata slot.


Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca


References

McLeay R, Bailey TL (2010). “Motif Enrichment Analysis: A unified framework and method evaluation.” BMC Bioinformatics, 11.

See Also

scan_sequences(), shuffle_sequences(), add_multifreq(), motif_pvalue()


Examples

data(ArabidopsisMotif, ArabidopsisPromoters)
if (R.Version()$arch != "i386") {
  enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters, threshold = 0.01)
}
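
A further sketch, assuming the universalmotif package and its bundled ArabidopsisMotif and ArabidopsisPromoters datasets are installed: it builds an explicit background with shuffle_sequences() and passes it as bkg.sequences, fixing rng.seed so the shuffle is repeatable. The seed value and the choice of mode = "seq.hits" are arbitrary choices for illustration, and depending on the thresholds the call may report no enriched motifs.

```r
# Run only if universalmotif is available; otherwise this block does nothing.
if (requireNamespace("universalmotif", quietly = TRUE)) {
  library(universalmotif)
  data(ArabidopsisMotif, ArabidopsisPromoters, package = "universalmotif")

  # Explicit background: shuffle the targets while preserving 2-let
  # frequencies, with a fixed seed so the result can be reproduced.
  bkg <- shuffle_sequences(ArabidopsisPromoters, k = 2,
                           method = "euler", rng.seed = 1)

  # Count each sequence at most once per motif (mode = "seq.hits").
  res <- enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters,
                       bkg.sequences = bkg, threshold = 0.01,
                       mode = "seq.hits")
  res
}
```

Supplying the background explicitly makes it possible to reuse the same shuffled set across several runs, rather than relying on the internal shuffle performed when bkg.sequences is missing.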

bjmt/universalmotif documentation built on Nov. 13, 2022, 3:09 p.m.