An R interface to the MEME Suite family of tools, which provides several utilities for performing motif analysis on DNA, RNA, and protein sequences. memes works by detecting a local install of the MEME suite, running the commands, then importing the results directly into R.
memes is currently available on the Bioconductor devel branch:
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    # The following initializes usage of Bioc devel
    BiocManager::install(version='devel')
    BiocManager::install("memes")
You can install the development version of memes from GitHub with:
    if (!requireNamespace("remotes", quietly=TRUE))
        install.packages("remotes")
    remotes::install_github("snystrom/memes")
    # To temporarily bypass the R version 4.1 requirement, you can pull from the following branch:
    remotes::install_github("snystrom/memes", ref = "no-r-4")
    # Get development version from dockerhub
    docker pull snystrom/memes_docker:devel
    # the -v flag is used to mount an analysis directory,
    # it can be excluded for demo purposes
    docker run -e PASSWORD=<password> -p 8787:8787 -v <path>/<to>/<project>:/mnt/<project> snystrom/memes_docker:devel
memes relies on a local install of the MEME Suite. For installation instructions for the MEME suite, see the MEME Suite Installation Guide.
memes needs to know the location of the meme/bin/ directory on your local machine.
You can tell memes the location of your MEME suite install in 4 ways. memes
will always prefer the more specific definition if it is a valid path. Here they
are ranked from most- to least-specific:
1. The meme_path argument of all memes functions
2. options(meme_bin = "/path/to/meme/bin/") inside your R script
3. MEME_BIN=/path/to/meme/bin/ in your .Renviron file
4. ~/meme/bin/ (the least-specific location, tried last)
If memes fails to detect your install at the specified location, it will fall back to the next option.
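The search order above can be pictured with a small base-R sketch. The helper name resolve_meme_path() is hypothetical and is not part of memes, and the package's internal lookup may differ in detail. The sketch only illustrates trying each location in turn and keeping the first one that is an existing directory.

```r
# Hypothetical helper illustrating the search order described above.
# This is NOT the code memes uses internally; it is a sketch of the idea.
resolve_meme_path <- function(meme_path = NULL) {
  candidates <- list(
    meme_path,                            # 1. meme_path function argument
    getOption("meme_bin"),                # 2. options(meme_bin = ...)
    Sys.getenv("MEME_BIN", unset = NA),   # 3. MEME_BIN from .Renviron
    path.expand("~/meme/bin/")            # 4. ~/meme/bin/
  )
  for (path in candidates) {
    # Skip locations that were never set
    if (is.null(path) || is.na(path) || !nzchar(path)) next
    # Keep the first candidate that is an existing directory
    if (dir.exists(path)) return(path)
  }
  stop("No valid MEME install found in any of the searched locations")
}
```

With options(meme_bin = ...) set to a valid directory, calling the sketch with an invalid meme_path skips the bad argument and returns the option's value, mirroring the fallback behaviour described above.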
To verify memes can detect your MEME install, use check_meme_install(), which uses the search hierarchy above to find a valid MEME install. It will report whether any tools are missing, and print the path to MEME that it sees. This can be useful for troubleshooting issues with your install.
    library(memes)
    # Verify that memes detects your meme install
    # (returns all green checks if so)
    check_meme_install()

    # You can manually input a path to meme_path
    # If no meme/bin is detected, will return a red X
    check_meme_install(meme_path = 'bad/path')
| Function Name | Use | Sequence Input | Motif Input | Output |
|:-------------:|:----------------:|:--------------:|:-----------:|:-------------------------------------------------------|
| runStreme() | Motif Discovery (short motifs) | Yes | No | universalmotif_df |
| runDreme() | Motif Discovery (short motifs) | Yes | No | universalmotif_df |
| runAme() | Motif Enrichment | Yes | Yes | data.frame (optional: sequences column) |
| runFimo() | Motif Scanning | Yes | Yes | GRanges of motif positions |
| runTomTom() | Motif Comparison | No | Yes | universalmotif_df w/ best_match_motif and tomtom columns* |
| runMeme() | Motif Discovery (long motifs) | Yes | No | universalmotif_df |
* Note: if runTomTom() is run using a universalmotif_df, the results will be joined with the universalmotif_df results as extra columns. This allows easy comparison of de novo discovered motifs with their matches.
Sequence Inputs can be any of:

1. A path to a .fasta formatted file
2. Biostrings::XStringSet (can be generated from GRanges using the get_sequence() helper function)
3. A list of Biostrings::XStringSet objects (generated by get_sequence())

Motif Inputs can be any of:

1. A path to a .meme formatted file
2. A universalmotif object, or list of universalmotif objects
3. A runDreme() results object (this allows the results of runDreme() to pass directly to runTomTom())
4. A combination of the above passed as a list() (e.g. list("path/to/database.meme", "dreme_results" = dreme_res))

Output Types:
runDreme(), runStreme(), runMeme() and runTomTom() return universalmotif_df objects, which are data.frames with special columns. The motif column contains a universalmotif object, with 1 entry per row. The remaining columns describe the properties of each returned motif. Some of these columns are special in that their values are used when running update_motifs() and to_list() to alter the properties of the motifs stored in the motif column. Be careful about changing these values, as the changes will propagate to the motif column when calling update_motifs() or to_list().
memes is built around the universalmotif package, which provides a framework for manipulating motifs in R. universalmotif_df objects can interconvert between data.frame and universalmotif list format using the to_df() and to_list() functions, respectively. This allows use of memes results with all other Bioconductor motif packages, as universalmotif objects can convert to any other motif type using convert_motifs().
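The round trip between the two formats can be sketched without running any MEME tool. The example below is illustrative only: it builds a toy motif with create_motif() and assumes that altname is one of the editable special columns. The variable names are placeholders.

```r
library(universalmotif)

# A toy motif, so that no MEME Suite install is needed for this sketch
m <- create_motif("CMATTACN", altname = "testMotif")

# Convert to universalmotif_df format: one row per motif, with the
# universalmotif object itself stored in the motif column
motif_df <- to_df(m)

# Edit a column value (assumption: altname is one of the special,
# editable columns); the edit is applied to the stored motif when
# converting back with to_list()
motif_df$altname <- "renamedMotif"
motifs <- to_list(motif_df)
```

Because edits to these columns propagate to the motif objects, it is worth inspecting the data.frame before converting back.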
runTomTom() returns a special column, tomtom, which is a data.frame of all match data for each input motif. This can be expanded out using tidyr::unnest(tomtom_results, "tomtom"), and renested with nest_tomtom().
The best_match_ prefixed columns returned by runTomTom() indicate values for the motif which was the best match to the input motif.
    suppressPackageStartupMessages(library(magrittr))
    suppressPackageStartupMessages(library(GenomicRanges))
    # Example transcription factor peaks as GRanges
    data("example_peaks", package = "memes")
    # Genome object
    dm.genome <- BSgenome.Dmelanogaster.UCSC.dm6::BSgenome.Dmelanogaster.UCSC.dm6
The get_sequence function takes a GRanges or GRangesList as input and returns the sequences as a Biostrings::XStringSet, or list of XStringSet objects, respectively. get_sequence will name each fasta entry by the genomic coordinates each sequence is from.
    # Generate sequences from 200bp about the center of my peaks of interest
    sequences <- example_peaks %>%
      resize(200, "center") %>%
      get_sequence(dm.genome)
runDreme() accepts XStringSet or a path to a fasta file as input. You can use other sequences or shuffled input sequences as the control dataset.
    # runDreme accepts all arguments that the commandline version of dreme accepts
    # here I set e = 50 to detect motifs in the limited example peak list
    # In a real analysis, e should typically be < 1
    dreme_results <- runDreme(sequences, control = "shuffle", e = 50)
memes is built around the universalmotif package. The results are returned in universalmotif_df format, which is an R data.frame that can seamlessly interconvert between data.frame and universalmotif format using to_list() to convert to universalmotif list format, and to_df() to convert back to data.frame format. Using to_list() allows using memes results with all universalmotif functions like so:
    library(universalmotif)
    dreme_results %>%
      to_list() %>%
      view_motifs()
Discovered motifs can be matched to known TF motifs using runTomTom(), which can accept as input a path to a .meme formatted file, a universalmotif list, or the results of runDreme().
TomTom uses a database of known motifs which can be passed to the database parameter as a path to a .meme format file, or a universalmotif object.

Optionally, you can set the environment variable MEME_DB in .Renviron to a file on disk, or the meme_db value in options to a valid .meme format file and memes will use that file as the database. memes will always prefer user input to the function call over a global variable setting.
    options(meme_db = system.file("extdata/flyFactorSurvey_cleaned.meme", package = "memes"))
    m <- create_motif("CMATTACN", altname = "testMotif")
    tomtom_results <- runTomTom(m)
    tomtom_results
runTomTom() will add its results as columns to a runDreme() results data.frame.

    full_results <- dreme_results %>%
      runTomTom()
AME is used to test for enrichment of known motifs in target sequences. runAme() will use the MEME_DB entry in .Renviron or options(meme_db = "path/to/database.meme") as the motif database. Alternatively, it accepts the same database inputs as runTomTom().
    # here I set the evalue_report_threshold = 30 to detect motifs in the limited example sequences
    # In a real analysis, evalue_report_threshold should be carefully selected
    ame_results <- runAme(sequences, control = "shuffle", evalue_report_threshold = 30)
    ame_results
view_tomtom_hits allows comparing the input motifs to the top hits from TomTom. Manual inspection of these matches is important, as the top match is not always the correct assignment. Altering top_n allows you to show additional matches in descending order of their rank.

    full_results %>%
      view_tomtom_hits(top_n = 1)
It can be useful to view the results from runAme() as a heatmap. plot_ame_heatmap() can create complex visualizations for analysis of enrichment between different region types (see vignettes for details). Here is a simple example heatmap.

    ame_results %>%
      plot_ame_heatmap()
The FIMO tool is used to identify matches to known motifs. runFimo will return these hits as a GRanges object containing the genomic coordinates of the motif match.
    # Query MotifDb for a motif
    e93_motif <- MotifDb::query(MotifDb::MotifDb, "Eip93F") %>%
      universalmotif::convert_motifs()
    # Scan for the E93 motif within given sequences
    fimo_results <- runFimo(sequences, e93_motif, thresh = 1e-3)
    # Visualize the sequences matching the E93 motif
    plot_sequence_heatmap(fimo_results$matched_sequence)
memes also supports importing results generated using the MEME suite outside of R (for example, running jobs on meme-suite.org, or running on the commandline). This enables use of preexisting MEME suite results with downstream memes functions.
| MEME Tool | Function Name | File Type |
|:---------:|:-------------------:|:----------------:|
| Streme | importStremeXML() | streme.xml |
| Dreme | importDremeXML() | dreme.xml |
| TomTom | importTomTomXML() | tomtom.xml |
| AME | importAme() | ame.tsv* |
| FIMO | importFimo() | fimo.tsv |
| Meme | importMeme() | meme.txt |
* importAme() can optionally also use the "sequences.tsv" output, which is produced when AME was run with method = "fisher".
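As a rough sketch of how the import functions in the table might be called, the example below reads preexisting DREME and AME output. All file paths are hypothetical placeholders. The method and sequences argument names passed to importAme() are assumptions that should be checked against ?importAme. The file.exists() guards only keep the sketch harmless when the files are absent.

```r
# Hypothetical output locations; substitute the paths to your own results
dreme_xml <- "path/to/dreme_out/dreme.xml"
ame_tsv   <- "path/to/ame_out/ame.tsv"
ame_seqs  <- "path/to/ame_out/sequences.tsv"

# Import DREME results generated outside of R
if (file.exists(dreme_xml)) {
  dreme_results <- memes::importDremeXML(dreme_xml)
}

# Import AME results
if (file.exists(ame_tsv)) {
  ame_results <- memes::importAme(ame_tsv)

  # Optionally include the per-sequence output (method = "fisher" only).
  # Assumption: the argument names below match the importAme() signature.
  if (file.exists(ame_seqs)) {
    ame_with_sequences <- memes::importAme(ame_tsv, method = "fisher",
                                           sequences = ame_seqs)
  }
}
```

The imported objects are meant to be used with downstream memes functions in the same way as results generated by the corresponding run functions.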
The MEME Suite does not currently support Windows, although it can be installed under Cygwin or the Windows Subsystem for Linux (WSL). Please note that if MEME is installed on Cygwin or WSL, you must also run R inside Cygwin or WSL to use memes.
An alternative solution is to use Docker to run a virtual environment with the MEME Suite installed. We provide a memes docker container that ships with the MEME Suite, RStudio, and all memes dependencies pre-installed.
memes is a wrapper for a select few tools from the MEME Suite, which were developed by another group. In addition to citing memes, please cite the MEME Suite tools corresponding to the tools you use.
If you use runDreme() in your analysis, please cite:
Timothy L. Bailey, "DREME: Motif discovery in transcription factor ChIP-seq data", Bioinformatics, 27(12):1653-1659, 2011.

If you use runTomTom() in your analysis, please cite:
Shobhit Gupta, JA Stamatoyannopoulos, Timothy Bailey and William Stafford Noble, "Quantifying similarity between motifs", Genome Biology, 8(2):R24, 2007.

If you use runAme() in your analysis, please cite:
Robert McLeay and Timothy L. Bailey, "Motif Enrichment Analysis: A unified framework and method evaluation", BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165.

If you use runFimo() in your analysis, please cite:
Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011.
The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.