runDreme: Denovo motif discovery of target regions using DREME

View source: R/generics.R

runDremeR Documentation

Denovo motif discovery of target regions using DREME

Description

DREME discovers short, ungapped, de-novo motifs that are relatively enriched relative to a control set of sequences. DREME can be run to discover motifs relative to a shuffled set of input sequences, or against a separately provided set of "control" sequences.

Usage

runDreme(input, control, outdir = "auto", meme_path = NULL, silent = TRUE, ...)

Arguments

input

regions to scan for motifs. Can be any of:

  • path to fasta file

  • DNAStringSet object (can be generated from GRanges using get_sequence())

  • List of DNAStringSet objects (generated from get_sequence())

  • NOTE: if using StringSet inputs, each entry must be named (set with names()).

  • NOTE: If you want to retain the raw dreme output files, you must use a path to fasta file as input, or specify an "outdir"

control

regions to use as background for motif search. Can be any of:

  • path to fasta file

  • DNAStringSet object (can be generated from GRanges using get_sequence)

  • A Biostrings::BStringSetList (generated using get_sequence), in which case all sequences in the list will be combined as the control set.

  • if input is a list of DNAStringSet objects, a character vector of names in input will use those sequences as background. runDreme will not scan those regions as input.

  • "shuffle" to use dreme's built-in dinucleotide shuffle feature (NOTE: if input is a list object with an entry named "shuffle", the list entry will be used instead). Optionally can also pass ⁠seed = <any number>⁠ to ... to use as the random seed during shuffling. If no seed is passed, dreme will use 1 as the random seed, so results will be reproducible if rerunning. NOTE: beware system-specific differences. As of v5, dreme will compile using the default python installation on a system (either python2.7 or python3). The random number generator changed between python2.7 and python3, so results will not be reproducible between systems using python2.7 vs 3.

outdir

path to output directory of dreme files, or "auto" to autogenerate path. Default: location of input fasta in dir named "\<input\>vs\<control\>". If input is DNAstringset, will be temporary path. This means that if you want to save the raw output files, you must use fasta files as input or use an informative (and unique) outdir name. memes will not check if it overwrites files in a directory. Directories will be recursively created if needed.

meme_path

optional, path to "meme/bin/" on your local machine. runDreme will search 3 places in order for meme if this flag is unset:

  1. the option "meme_bin" (set with options(meme_bin = "path/to/meme/bin"))

  2. the environment variable "MEME_PATH" (set in .Renviron)

  3. "~/meme/bin/" as the default location

  • If the user sets meme_path in the function call, this value will always be used

silent

whether to suppress printing dreme stdout as a message when finishing with no errors. Can be useful for troubleshooting in situations where no motifs are discovered, but command completes successfully. (default: TRUE)

...

dreme flags can be passed as R function arguments to use non-default behavior. For a full list of valid arguments, run your local install of dreme -h, or visit the dreme documentation website. See list below for aliases of common flags. To set flags with no values (ex. -dna), pass the argument as a boolean value (ex. dna = TRUE).

Details

Properly setting the control parameter is key to discovering biologically relevant motifs. Often, using control = "shuffle" will produce a suboptimal set of motifs; however, some discriminative analysis designs don't have proper "control" regions other than to shuffle.

As of MEME version 5.2.0, DREME is deprecated. Consider runStreme() instead.

In addition to allowing any valid flag of dreme to be passed to ..., we provide a few user-friendly aliases for common flags which are more readable (see list below). For example, e = 1 will use a max evalue cutoff of 1. This is equivalent to setting evalue = 1. For additional details about each DREME flag, see the DREME Manual Webpage.

List of values which can be passed to ...: NOTE: values must be referred to using their name in the "memes alias" column, not the "DREME Flag" column.

memes alias DREME Flag description default
nmotifs m max number of motifs to discover NULL
sec t max number of seconds to run NULL
evalue e max E-value cutoff 0.05
seed s random seed if using "shuffle" as control 1
ngen g nuber of REs to generalize 100
mink mink minimum motif width to search 3
maxk maxk maximum motif width to search 7
k k set mink and maxk to this value NULL
norc norc search only the input strand for sequences FALSE
dna dna use DNA alphabet TRUE
rna rna use RNA alphabet FALSE
protein protein use protein alphabet (NOT RECCOMENDED) FALSE

Value

universalmotif_df with statistics for each discovered motif. The motif column contains a universalmotif object representation in PCM format of each DREME motif. If no motifs are discovered, returns NULL. The column values are as follows:

  • rank = ranked order of discovered motif

  • name = primary name of motif

  • altname = alternative name of motif

  • seq = consensus sequence of the motif

  • length = length of discovered motif

  • nsites = number of times the motif is found in input sequences

  • positive_hits = number of sequences in input containing at least 1 of the motif

  • negative_hits = number of sequences in control containing at least 1 of the motif

  • pval = p-value

  • eval = E-value

  • unerased_eval = Unerased E-Value

  • positive_total = number of sequences in input

  • negative_total = number of sequences in control

  • pos_frac = fraction of positive sequences with a hit

  • neg_frac = fraction of negative sequences with a hit

  • motif = a universalmotif object of the discovered motif

Citation

If you use runDreme() in your analysis, please cite:

Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011. full text

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

if (meme_is_installed()) {
# Create random named sequences as input for example
seqs <- universalmotif::create_sequences(rng.seed = 123)
names(seqs) <- seq_along(seqs)

# Runs dreme with default settings, shuffles input as background
runDreme(seqs, "shuffle")

# Runs searching for max 2 motifs, e-value cutoff = 0.1, explicitly using the DNA alphabet
runDreme(seqs, "shuffle", nmotifs = 2, e = 0.1, dna = TRUE)
}

snystrom/dremeR documentation built on Oct. 13, 2024, 10:48 p.m.