enrich_motifs  R Documentation 
Enrich for input motifs in a set of sequences.
Description
Given a set of target and background sequences, test if the input motifs
are significantly enriched in the targets sequences relative to the
background sequences. See the "Sequence manipulation and scanning" vignette.
Usage
enrich_motifs(motifs, sequences, bkg.sequences, max.p = 0.001,
max.q = 0.001, max.e = 0.001, qval.method = "fdr", threshold = 1e04,
threshold.type = "pvalue", verbose = 0, RC = TRUE, use.freq = 1,
shuffle.k = 2, shuffle.method = "euler", return.scan.results = FALSE,
nthreads = 1, rng.seed = sample.int(10000, 1), motif_pvalue.k = 8,
use.gaps = TRUE, allow.nonfinite = FALSE, warn.NA = TRUE,
no.overlaps = TRUE, no.overlaps.by.strand = FALSE,
no.overlaps.strat = "score", respect.strand = FALSE,
motif_pvalue.method = c("dynamic", "exhaustive"),
scan_sequences.qvals.method = c("BH", "fdr", "bonferroni"),
mode = c("total.hits", "seq.hits"), pseudocount = 1)
Arguments
motifs 
See convert_motifs() for acceptable motif formats.

sequences 
XStringSet Sequences to scan. Alphabet
should match motif.

bkg.sequences 
XStringSet Optional. If missing,
shuffle_sequences() is used to create background sequences from
the input sequences.

max.p 
numeric(1) Pvalue threshold.

max.q 
numeric(1) Adjusted Pvalue threshold. This is only useful
if multiple motifs are being enriched for.

max.e 
numeric(1) . The Evalue is calculated by multiplying the
Pvalue with the number of input motifs times two
(McLeay and Bailey 2010).

qval.method 
character(1) See stats::p.adjust() .

threshold 
numeric(1) See details.

threshold.type 
character(1) One of c('pvalue', 'qvalue', 'logodds', 'logodds.abs') . See details.

verbose 
numeric(1) 0 for no output, 4 for max verbosity.

RC 
logical(1) If TRUE , check reverse complement of the input
sequences. Only available for DNA/RNA.

use.freq 
numeric(1) The default, 1, uses the motif matrix (from
the motif['motif'] slot) to search for sequences. If a higher
number is used, then the matching klet matrix from the
motif['multifreq'] slot is used. See add_multifreq() .

shuffle.k 
numeric(1) The klet size to use when shuffling input
sequences. Only used if no background sequences are input. See
shuffle_sequences() .

shuffle.method 
character(1) One of c('euler', 'markov', 'linear') .
See shuffle_sequences() .

return.scan.results 
logical(1) Return output from
scan_sequences() . For large jobs, leaving this as
FALSE can save a small amount time by preventing construction of the complete
results data.frame from scan_sequences() .

nthreads 
numeric(1) Run scan_sequences() in parallel with nthreads
threads. nthreads = 0 uses all available threads.
Note that no speed up will occur for jobs with only a single motif and
sequence.

rng.seed 
numeric(1) Set random number generator seed. Since shuffling
can occur simultaneously in multiple threads using C++, it cannot communicate
with the regular R random number generator state and thus requires an
independent seed. Each individual sequence in an XStringSet object
will be given the following seed: rng.seed * index . See shuffle_sequences() .

motif_pvalue.k 
numeric(1) Control motif_pvalue() approximation.
See motif_pvalue() .

use.gaps 
logical(1) Set this to FALSE to ignore motif gaps, if
present.

allow.nonfinite 
logical(1) If FALSE , then apply a pseudocount if
nonfinite values are found in the PWM. Note that if the motif has a
pseudocount greater than zero and the motif is not currently of type PWM,
then this parameter has no effect as the pseudocount will be
applied automatically when the motif is converted to a PWM internally. This
value is set to FALSE by default in order to stay consistent with
preversion 1.8.0 behaviour. A message will be printed if a pseudocount
is applied. To disable this, set options(pseudocount.warning=FALSE) .

warn.NA 
logical(1) Whether to warn about the presence of nonstandard
letters in the input sequence, such as those in masked sequences.

no.overlaps 
logical(1) Remove overlapping hits from the same motifs.
Overlapping hits from different motifs are preserved. Please note that the
current implementation of this feature can add significantly to the run
time for large inputs.

no.overlaps.by.strand 
logical(1) Whether to discard overlapping hits
from the opposite strand (TRUE ), or to only discard overlapping hits on the
same strand (FALSE ).

no.overlaps.strat 
character(1) One of c("score", "order") .
The former option keeps the highest scoring overlapping hit (and the first
of these within ties), and the latter simply keeps the first overlapping hit.

respect.strand 
logical(1) If motifs are DNA/RNA,
then setting this option to TRUE will make scan_sequences() only
scan the strands of the input sequences as indicated in the motif
strand slot.

motif_pvalue.method 
character(1) One of c("dynamic", "exhaustive") .
Algorithm used for calculating Pvalues. The "exhaustive" method
involves finding all possible motif matches at or above the specified
score using a branchandbound algorithm, which can be computationally
intensive (Hartman et al., 2013). Additionally, the computation
must be repeated for each hit. The "dynamic" method calculates the
distribution of possible motif scores using a much faster dynamic
programming algorithm, and can be recycled for multiple
scores (Grant et al., 2011). The only
disadvantage is the inability to use allow.nonfinite = TRUE .
See motif_pvalue() for details.

scan_sequences.qvals.method 
character(1) One of
c("fdr", "BH", "bonferroni") . The method for calculating adjusted
Pvalues for individual motif hits. These are described in depth in the
Sequence Searches vignette.

mode 
character(1) One of c("total.hits", "seq.hits") . The former
enriches for the total count of motif hits across all sequences,
whereas the latter only counts motif hits once per sequence (useful for
cases where there are many small sequences).

pseudocount 
integer(1) Add a pseudocount to the motif hit counts
when performing the Fisher test.

Details
To find enriched motifs, scan_sequences()
is run on both
target and background sequences.
stats::fisher.test()
is run to test for enrichment.
See scan_sequences()
for more info on scanning parameters.
Value
DataFrame
Enrichment results in a DataFrame
. Function args and
(optionally) scan results are stored in the metadata
slot.
Author(s)
Benjamin JeanMarie Tremblay benjamin.tremblay@uwaterloo.ca
References
McLeay R, Bailey TL (2010). “Motif Enrichment Analysis: A unified
framework and method evaluation.” BMC Bioinformatics, 11.
See Also
scan_sequences()
, shuffle_sequences()
,
add_multifreq()
, motif_pvalue()
Examples
data(ArabidopsisPromoters)
data(ArabidopsisMotif)
if (R.Version()$arch != "i386") {
enrich_motifs(ArabidopsisMotif, ArabidopsisPromoters, threshold = 0.01)
}