---
title: ggmotif
author: Xiang Li
date: '2022-06-30'
slug: ggmotif
categories:
  - Bioinformatics
tags:
  - R
subtitle: ''
summary: 'R package ggmotif'
authors: []
lastmod: '2022-06-30T22:00:06+02:00'
featured: no
image:
  caption: ''
  focal_point: ''
  preview_only: no
projects: []
links:
  - icon: github
    icon_pack: fab
    name: GitHub
    url: https://github.com/lixiang117423/ggmotif
---


ggmotif: An R Package for the extraction and visualization of motifs from MEME software

![](https://img.shields.io/badge/release%20version-0.1.2-green.svg)

MEME Suite is one of the most widely used tools for identifying motifs in deoxyribonucleic acid (DNA) or protein sequences. However, MEME Suite saves its results as .xml and .txt files, which are difficult to read, visualize, or integrate with widely used phylogenetic tree packages such as ggtree. To overcome this problem, we developed ggmotif, an R package that provides a set of easy-to-use functions for extracting and visualizing motifs from the result files generated by MEME Suite. ggmotif extracts the location of each motif on its sequence from the .xml file and visualizes it. The extracted data can also be easily combined with phylogenetic trees drawn with ggtree. In addition, ggmotif extracts the sequence of each motif site from the .txt file, and these sequences can then be drawn as a sequence logo with the ggseqlogo() function from the ggseqlogo R package.

Authors

Xiang LI

College of Plant Protection, Yunnan Agricultural University

https://www.web4xiang.top/

Identification of motifs

The demo data, the AP2 gene family of Arabidopsis thaliana, was downloaded from the Plant Transcription Factor Database. MEME v5.4.1, the latest version at the time of writing, was used to search for motifs in the demo data with the following command:

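```shell
# -protein: protein alphabet; -o meme_out: output directory
# -mod zoops: zero or one site per sequence; -nmotifs 10: search for 10 motifs
# -minw 4 / -maxw 7: motif width 4 to 7 residues; -markov_order 0: 0-order background model
meme ara.fa -protein -o meme_out -mod zoops -nmotifs 10 -minw 4 -maxw 7 -objfun classic -markov_order 0
```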

The output files (the .html, .txt and .xml files) are available on GitHub.

Construction of phylogenetic tree

Clustal Omega (clustalo v1.2.4) was used to align the sequences, and FastTree (v2.1.10) was used to construct the phylogenetic tree.

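```shell
# Align the protein sequences (clustalo writes the alignment to stdout by default)
clustalo -i ara.fa > ara.aligned.fa
# Build an approximately-maximum-likelihood tree and save it in Newick format
FastTree ara.aligned.fa > ara.nwk
```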

The output files are available on GitHub.
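A tree and a set of motif locations can only be paired sensibly if the tree's tip labels match the sequence names in the FASTA file. The post does not say how motifLocation() matches them, so the check below is a hedged, base-R sketch and not part of ggmotif. The helper names `newick_tip_labels()` and `fasta_ids()` are hypothetical, and the toy inputs are made up. The regular expression only handles plain Newick, without quoted labels or bracketed comments. For the real files, pass `paste(readLines("ara.nwk"), collapse = "")` and `readLines("ara.fa")` instead.

```r
# Hypothetical helpers (not part of ggmotif): compare Newick tip labels with
# FASTA sequence IDs before plotting motifs next to a tree.

# A tip label follows "(" or "," and runs until ":", ",", ")" or ";".
# Internal node labels follow ")" and are therefore skipped.
# Quoted labels and [comments] are not handled.
newick_tip_labels <- function(newick) {
  hits <- gregexpr("(?<=[(,])[^(),:;]+", newick, perl = TRUE)
  trimws(regmatches(newick, hits)[[1]])
}

# Return the first whitespace-delimited word of each FASTA header line.
fasta_ids <- function(lines) {
  headers <- grep("^>", lines, value = TRUE)
  sub("^>(\\S+).*$", "\\1", headers, perl = TRUE)
}

# Made-up toy inputs standing in for ara.nwk and ara.fa
tree_text   <- "((seqA:0.1,seqB:0.2)0.95:0.05,seqC:0.3);"
fasta_lines <- c(">seqA toy protein", "MKVLAT", ">seqB", "MKILAT", ">seqC", "MRVLGT")

tips <- newick_tip_labels(tree_text)
ids  <- fasta_ids(fasta_lines)

setdiff(ids, tips)  # sequences missing from the tree
setdiff(tips, ids)  # tips with no matching sequence
```

For the toy input, both `setdiff()` calls return `character(0)`. Any name that appears in the output points to a header or tip label that needs fixing before the tree and motifs are combined.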

Installation and loading

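```r
# Option 1: install the development version from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("lixiang117423/ggmotif")

# Option 2: install the released version from CRAN
install.packages("ggmotif")

library(ggmotif)
```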

Parse motif information from MEME results

The results generated by MEME Suite comprise many files: a figure for each motif, plus three result files (an .html file, a .txt file and an .xml file). The .html file contains figures of the motifs. The .txt file holds the sequences of the motif sites, and the .xml file holds further details such as positions, lengths and p-values.

The main purpose of ggmotif is to parse this information and plot the position of each motif on its corresponding sequence.

Parse information

Sequences of the motif sites (.txt file)

```r
# Parse the example MEME .txt output bundled with ggmotif
filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")
```

Locations and other details of the motifs (.xml file)

```r
# Parse the example MEME .xml output bundled with ggmotif
filepath <- system.file("examples", "meme.xml", package = "ggmotif")
motif.info.2 <- getMotifFromMEME(data = filepath, format = "xml")
```

Plot location

The figures produced by MEME show only the motif locations, and they are difficult to combine with the corresponding phylogenetic tree. In ggmotif, the motifLocation() function visualizes the location of each motif on its sequence, much like the figure in the .html file. If the user supplies the corresponding phylogenetic tree, the function combines the tree with the motif locations in one figure.
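The figure described above can be illustrated with a small base-R drawing: one horizontal line per sequence and one coloured box per motif site. This is only a sketch of the idea and not how motifLocation() is implemented (ggmotif builds on ggplot2). The data frame, its columns (`seq`, `motif`, `start`, `width`), the sequence lengths and the helper `plot_motif_sites()` are all hypothetical. The sketch assumes 1-based start positions, so a site covers residues `start` to `start + width - 1`.

```r
# Hypothetical motif sites; real values would come from getMotifFromMEME(format = "xml").
sites <- data.frame(
  seq   = c("seqA", "seqA", "seqB", "seqC"),
  motif = c("Motif.1", "Motif.2", "Motif.1", "Motif.2"),
  start = c(5, 30, 12, 20),
  width = c(7, 6, 7, 6),
  stringsAsFactors = FALSE
)
seq_lengths <- c(seqA = 60, seqB = 45, seqC = 50)

# With 1-based starts, the last residue of a site is start + width - 1.
sites$end <- sites$start + sites$width - 1

plot_motif_sites <- function(sites, seq_lengths) {
  y <- setNames(seq_along(seq_lengths), names(seq_lengths))  # one row per sequence
  motifs <- unique(sites$motif)
  fill <- setNames(seq_along(motifs) + 1, motifs)             # palette colours 2, 3, ...
  plot(NA, xlim = c(0, max(seq_lengths)), ylim = c(0.5, length(y) + 0.5),
       yaxt = "n", xlab = "Position", ylab = "")
  axis(2, at = y, labels = names(y), las = 1)
  segments(1, y, seq_lengths, y)                              # sequence backbones
  rect(sites$start, y[sites$seq] - 0.2, sites$end, y[sites$seq] + 0.2,
       col = fill[sites$motif], border = NA)                  # one box per site
  legend("topright", legend = motifs, fill = fill, bty = "n")
}

plot_motif_sites(sites, seq_lengths)
```

Boxes are placed by matching the `seq` column to the names of `seq_lengths`. Any sequence missing from `seq_lengths` would therefore produce boxes with no row to sit on.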

Without tree

```r
filepath <- system.file("examples", "meme.xml", package = "ggmotif")
motif_extract <- getMotifFromMEME(data = filepath, format = "xml")

# Plot motif locations and apply the AAAS colour palette from ggsci
motif_plot <- motifLocation(data = motif_extract)
motif_plot +
  ggsci::scale_fill_aaas()

# ggsave() saves the last plot by default (plot = last_plot())
ggplot2::ggsave(filename = "1.png", width = 6, height = 6, dpi = 300)
```

With tree

```r
filepath <- system.file("examples", "meme.xml", package = "ggmotif")
treepath <- system.file("examples", "ara.nwk", package = "ggmotif")
motif_extract <- getMotifFromMEME(data = filepath, format = "xml")
motif_plot <- motifLocation(data = motif_extract, tree = treepath)
motif_plot +
  ggsci::scale_fill_aaas()

ggplot2::ggsave(filename = "2.png", width = 8, height = 6, dpi = 300)
```

Show motif(s)

```r
library(tidyverse)

filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")

# Show one motif: keep the motif ID and site-sequence columns,
# filter to a single motif, then draw its sequence logo
motif.info %>%
  dplyr::select(2, 4) %>%
  dplyr::filter(motif.num == "Motif.2") %>%
  dplyr::select(2) %>%
  ggseqlogo::ggseqlogo() +
  theme_bw()
```

```r
filepath <- system.file("examples", "meme.txt", package = "ggmotif")
motif.info <- getMotifFromMEME(data = filepath, format = "txt")

# Show all motifs: one logo per motif, collected in a named list
plot.list <- list()

for (i in unique(motif.info$motif.num)) {
  motif.info %>%
    dplyr::select(2, 4) %>%
    dplyr::filter(motif.num == i) %>%
    dplyr::select(2) %>%
    ggseqlogo::ggseqlogo() +
    labs(title = i) +
    theme_bw() -> plot.list[[i]]
}

# Arrange the logos in a two-column grid
cowplot::plot_grid(plotlist = plot.list, ncol = 2)
```
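A sequence logo summarises two things at each position of a motif: how often each residue occurs there, and how conserved the position is. The base-R sketch below computes both from equal-width site sequences. It builds a position frequency matrix and the per-position information content in bits, defined as log2(alphabet size) minus the Shannon entropy. It illustrates the idea only and is not ggseqlogo's code. The helper names are hypothetical, the toy sites are made up, and no small-sample correction is applied (logo tools may apply one).

```r
# Hypothetical helper: count each residue at each position of equal-width sites.
position_frequency_matrix <- function(sites) {
  stopifnot(length(unique(nchar(sites))) == 1)    # all sites must share one width
  chars <- do.call(rbind, strsplit(sites, ""))     # rows = sites, columns = positions
  residues <- sort(unique(as.vector(chars)))
  pfm <- matrix(0, nrow = length(residues), ncol = ncol(chars),
                dimnames = list(residues, as.character(seq_len(ncol(chars)))))
  for (j in seq_len(ncol(chars))) {
    counts <- table(chars[, j])
    pfm[names(counts), j] <- as.vector(counts)
  }
  pfm
}

# Hypothetical helper: information content (bits) = log2(alphabet size) - entropy.
information_content <- function(pfm, alphabet_size = 20) {  # 20 amino acids; use 4 for DNA
  probs <- sweep(pfm, 2, colSums(pfm), "/")                  # column-wise proportions
  entropy <- -colSums(ifelse(probs > 0, probs * log2(probs), 0))
  log2(alphabet_size) - entropy
}

toy_sites <- c("ACDE", "ACDF", "ACEF")   # made-up protein motif sites
pfm <- position_frequency_matrix(toy_sites)
pfm
information_content(pfm)                 # about 4.32, 4.32, 3.40, 3.40 bits
```

The logo code above keeps the fourth column of `motif.info` as the site sequences. Assuming that column holds plain sequence strings, `position_frequency_matrix(as.character(motif.info[[4]][motif.info$motif.num == "Motif.2"]))` would apply the same calculation to the demo data.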

Compare with other tools

The widely used R packages memes and universalmotif can import the .txt file generated by MEME, but the information they extract does not include the locations of the motifs.

```r
library(tidyverse)
library(memes)

# Import the MEME .txt output with memes
meme.res <- memes::importMeme("meme.txt", combined_sites = TRUE)

# table from memes::importMeme function
meme.res[["meme_data"]] %>%
  dplyr::select_if(~ !any(is.na(.))) %>%
  dplyr::select(-bkg, -motif)
meme.res[["combined_sites"]]

# Import the same file with universalmotif
uni.res <- universalmotif::read_meme("meme.txt")
uni.res[[1]]
```

Neither of these functions can import the .xml file:

```r
memes::importMeme("meme.xml")
#> Error in convert_motifs(motifs) : Input is an empty list
universalmotif::read_meme("meme.xml")
```

Session Info

```r
sessionInfo()
```

Contributing

We welcome any contributions!



lixiang117423/ggmotif documentation built on Aug. 14, 2022, 5:32 a.m.