---
title: ggmotif
author: Xiang Li
date: '2022-06-30'
slug: ggmotif
categories:
  - Bioinformatics
tags:
  - R
subtitle: ''
summary: 'R package ggmotif'
authors: []
lastmod: '2022-06-30T22:00:06+02:00'
featured: no
image:
  caption: ''
  focal_point: ''
  preview_only: no
projects: []
links:
  - icon: github
    icon_pack: fab
    name: GitHub
    url: https://github.com/lixiang117423/ggmotif
---

The MEME Suite is one of the most widely used tools for identifying motifs in deoxyribonucleic acid (DNA) or protein sequences. However, its results are saved in .xml and .txt files that are difficult to read, visualize, or integrate with other widely used phylogenetic tree packages such as ggtree. To overcome this problem, we developed the ggmotif R package, which provides a set of easy-to-use functions for extracting and visualizing motifs from the result files generated by the MEME Suite. ggmotif can extract the location of each motif on its corresponding sequence from the .xml file and visualize it. The extracted data can also be easily integrated with the phylogenetic data generated by ggtree. In addition, ggmotif can get the sequence of each motif from the .txt file and draw its sequence logo with the ggseqlogo function from the ggseqlogo R package.
Xiang LI
College of Plant Protection, Yunnan Agricultural University
The demo data, the AP2 gene family of Arabidopsis thaliana, was downloaded from the Plant Transcription Factor Database. MEME v5.4.1, the latest version at the time of writing, was used to search for motifs in the demo data with the following command:
meme ara.fa -protein -o meme_out -mod zoops -nmotifs 10 -minw 4 -maxw 7 -objfun classic -markov_order 0
The output files (an html file, a txt file and an xml file) can be found on GitHub.
clustalo (V1.2.4) and FastTree (V2.1.10) were used to align the sequences and construct the phylogenetic tree.
clustalo -i ara.fa > ara.aligned.fa
FastTree ara.aligned.fa > ara.nwk
The output files can be found on GitHub.
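Before a tree is combined with motif locations, it is worth confirming that the tip labels in the Newick file correspond to the sequence names in the FASTA file. The following is a minimal base R sketch of such a check. The FASTA lines and the Newick string are made-up toy data, and `fastaNames` and `newickTips` are hypothetical helpers that are not part of ggmotif; the Newick helper only handles simple, unquoted labels.

```r
# Toy inputs (made up for illustration; not the real AP2 data)
fasta_text <- c(">AT1G01250.1", "MSPQ",
                ">AT1G12630.1", "MEEA",
                ">AT1G19210.1", "MAKT")
newick <- "((AT1G01250.1:0.12,AT1G12630.1:0.34)0.95:0.05,AT1G19210.1:0.40);"

# Hypothetical helper: the sequence name is the first word after ">"
fastaNames <- function(lines) {
  headers <- lines[startsWith(lines, ">")]
  sub("^>(\\S+).*$", "\\1", headers)
}

# Hypothetical helper: tip labels directly follow "(" or "," in a Newick string
# (support values follow ")" and are therefore skipped)
newickTips <- function(tree) {
  m <- gregexpr("(?<=[(,])[^(),:;]+", tree, perl = TRUE)
  regmatches(tree, m)[[1]]
}

seq_names  <- fastaNames(fasta_text)
tip_labels <- newickTips(newick)

# Names present in one file but not in the other
missing_in_tree  <- setdiff(seq_names, tip_labels)
missing_in_fasta <- setdiff(tip_labels, seq_names)
```

Both `missing_in_tree` and `missing_in_fasta` should be empty when the two files describe the same set of sequences.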
if (!require(devtools)) install.packages("devtools")
devtools::install_github("lixiang117423/ggmotif")
install.packages("ggmotif")
library(ggmotif)
The results generated by the MEME Suite include many files: a figure for each motif and three other files, an html file, a txt file and an xml file. The html file contains figures of the motifs. The txt file holds the sequence information, and the xml file holds other information, including position, length, p-value and so on.
The main purpose of ggmotif is to parse this information and plot the position of each motif on its corresponding sequence.
filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")

filepath <- system.file("examples", "meme.xml", package = "ggmotif")
motif.info.2 <- getMotifFromMEME(data = filepath, format = "xml")
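Because the `format` argument has to agree with the type of file being read, it can be convenient to derive it from the file extension. The sketch below shows one way to do that in base R. `memeFormat` is a hypothetical helper, not a ggmotif function, and it assumes the MEME output files keep their usual `.txt` and `.xml` extensions.

```r
# Hypothetical helper (not part of ggmotif): return "txt" or "xml" for a
# MEME output file, based on its extension, and fail early for anything else.
memeFormat <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (!ext %in% c("txt", "xml")) {
    stop("Expected a .txt or .xml MEME output file, got: ", basename(path))
  }
  ext
}

memeFormat("meme_out/meme.txt")
memeFormat("meme_out/meme.xml")
```

The result could then be passed on as `getMotifFromMEME(data = path, format = memeFormat(path))`, so that an html file is rejected before any parsing is attempted.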
The figures from MEME only contain the location of the motifs, and it is difficult to combine them with the corresponding phylogenetic tree. In ggmotif, the function motifLocation can visualize the location of each motif on its corresponding sequence, almost the same as in the html file. If the user has the corresponding phylogenetic tree, the function can combine the tree and the locations.
filepath <- system.file("examples", "meme.xml", package = "ggmotif")
motif_extract <- getMotifFromMEME(data = filepath, format = "xml")
motif_plot <- motifLocation(data = motif_extract)
motif_plot + ggsci::scale_fill_aaas()
ggplot2::ggsave(filename = "1.png", width = 6, height = 6, dpi = 300)
filepath <- system.file("examples", "meme.xml", package = "ggmotif")
treepath <- system.file("examples", "ara.nwk", package = "ggmotif")
motif_extract <- getMotifFromMEME(data = filepath, format = "xml")
motif_plot <- motifLocation(data = motif_extract, tree = treepath)
motif_plot + ggsci::scale_fill_aaas()
ggplot2::ggsave(filename = "2.png", width = 8, height = 6, dpi = 300)
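To make the idea behind a location plot concrete: each motif occurrence is a box that starts at the motif position and spans the motif width, drawn on a line as long as the sequence. The following is a rough sketch with base graphics, not the ggmotif implementation. The table is made-up toy data, its column names are hypothetical and do not necessarily match the columns returned by `getMotifFromMEME`, and positions are treated as 1-based and inclusive purely for illustration.

```r
# Toy motif sites (made-up values; hypothetical column names)
sites <- data.frame(
  seq_name = c("seqA", "seqA", "seqB", "seqC"),
  motif    = c("motif_1", "motif_2", "motif_1", "motif_2"),
  start    = c(5, 40, 12, 30),
  width    = c(7, 6, 7, 6),
  stringsAsFactors = FALSE
)
seq_length <- c(seqA = 80, seqB = 60, seqC = 70)

# End coordinate of each site and the plot row of its sequence
sites$end <- sites$start + sites$width - 1
sites$row <- match(sites$seq_name, names(seq_length))

# Draw to a temporary PDF so the sketch also runs non-interactively
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 3)
plot(0, 0, type = "n",
     xlim = c(0, max(seq_length)), ylim = c(0.5, length(seq_length) + 0.5),
     xlab = "Position", ylab = "", yaxt = "n")
axis(2, at = seq_along(seq_length), labels = names(seq_length), las = 1)

# One thin line per sequence, one coloured box per motif site
segments(1, seq_along(seq_length), seq_length, seq_along(seq_length))
motif_cols <- c(motif_1 = "steelblue", motif_2 = "tomato")
rect(sites$start, sites$row - 0.3, sites$end, sites$row + 0.3,
     col = motif_cols[sites$motif], border = NA)
legend("topright", legend = names(motif_cols), fill = motif_cols, bty = "n")
dev.off()
```

Keeping one row per motif occurrence is also what makes it straightforward to line the boxes up with the tips of a tree, since each sequence name maps to exactly one plot row.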
library(tidyverse)

filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")

# show one motif
motif.info %>%
  dplyr::select(2, 4) %>%
  dplyr::filter(motif.num == "Motif.2") %>%
  dplyr::select(2) %>%
  ggseqlogo::ggseqlogo() +
  theme_bw()
filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")

# show all motifs: build one logo per motif and collect them in a list
plot.list <- list()

for (i in unique(motif.info$motif.num)) {
  motif.info %>%
    dplyr::select(2, 4) %>%
    dplyr::filter(motif.num == i) %>%
    dplyr::select(2) %>%
    ggseqlogo::ggseqlogo() +
    labs(title = i) +
    theme_bw() -> plot.list[[i]]
}

# arrange the logos in a two-column grid
cowplot::plot_grid(plotlist = plot.list, ncol = 2)
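A sequence logo summarises how often each letter occurs at each position of the aligned motif instances, with letter heights commonly scaled by the information content in bits. The base R sketch below computes these quantities for a handful of made-up sequences, as a rough illustration of what a logo displays. `positionCounts` is a hypothetical helper, the sequences are not taken from the demo data, and the calculation ignores the small-sample correction that logo tools may apply.

```r
# Made-up motif instances of equal width (not taken from the demo data)
motif_seqs <- c("WAAEIRD", "WVAEIRD", "WAAEIRE", "YAAEIRD")

# Hypothetical helper: count each letter at each position
positionCounts <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)
  chars <- do.call(rbind, strsplit(seqs, ""))  # rows = sequences, cols = positions
  letters_seen <- sort(unique(as.vector(chars)))
  counts <- matrix(0L, nrow = length(letters_seen), ncol = ncol(chars),
                   dimnames = list(letters_seen, NULL))
  for (j in seq_len(ncol(chars))) {
    counts[, j] <- tabulate(factor(chars[, j], levels = letters_seen),
                            nbins = length(letters_seen))
  }
  counts
}

counts <- positionCounts(motif_seqs)
freqs  <- counts / length(motif_seqs)

# Information content per position in bits for a 20-letter protein alphabet:
# log2(20) minus the Shannon entropy of the observed frequencies
entropy   <- apply(freqs, 2, function(p) -sum(p[p > 0] * log2(p[p > 0])))
info_bits <- log2(20) - entropy
round(info_bits, 2)
```

Fully conserved positions reach the maximum of log2(20) bits, while positions with mixed residues score lower, which is why conserved columns appear as tall letters in a logo.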
The widely used R packages memes and universalmotif can process .txt files generated by MEME, but the extracted information does not include the location of the motifs.
library(tidyverse)
library(memes)

meme.res <- memes::importMeme("meme.txt", combined_sites = TRUE)

# table from memes::importMeme function
meme.res[["meme_data"]] %>%
  dplyr::select_if(~ !any(is.na(.))) %>%
  dplyr::select(-bkg, -motif)
meme.res[["combined_sites"]]
uni.res <- universalmotif::read_meme("meme.txt")
uni.res[[1]]
Moreover, neither of these functions can handle the .xml file.
memes::importMeme("meme.xml")
Error in convert_motifs(motifs) : Input is an empty list
universalmotif::read_meme("meme.xml")
sessionInfo()
We welcome any contributions!