knitr::opts_chunk$set(message = FALSE, warning = FALSE, fig.align = 'center')
Transcription factors and microRNAs are important regulators of gene expression in normal physiology and in pathological conditions. Many bioinformatic tools have been built to predict and identify the targets of transcription factors and microRNAs and their roles in the development of diseases, including cancers. The availability of publicly accessible high-throughput data has allowed for data-driven discovery and validation of these predictions. Here, we build on these tools and integrative analyses to provide a tool to access, manage and visualize data from open source databases. cRegulome provides programmatic access to the regulome (microRNA and transcription factor) correlations with target genes in cancer. The package obtains a local instance of the Cistrome Cancer and miRCancerdb databases and provides objects and methods to interact with and visualize the correlation data.
To get started with cRegulome, we show a very quick example. We first download a small test database file, then make a simple query and convert the output to a cRegulome object that can be printed and visualized.
# load required libraries
library(cRegulome)
library(RSQLite)
library(ggplot2)
# download the db file when using it for the first time
destfile = paste(tempdir(), 'cRegulome.db.gz', sep = '/')
if(!file.exists(destfile)) {
    get_db(test = TRUE)
}

# connect to the db file
db_file = paste(tempdir(), 'cRegulome.db', sep = '/')
conn <- dbConnect(SQLite(), db_file)
```{bash eval=FALSE}
wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/9537385/cRegulome.db.gz
gunzip cRegulome.db.gz
```

```r
# locate the testset file and connect
fl <- system.file('extdata', 'cRegulome.db', package = 'cRegulome')
conn <- dbConnect(SQLite(), fl)
```
# enter a custom query with different arguments
dat <- get_mir(conn,
               mir = 'hsa-let-7g',
               study = 'STES',
               min_abs_cor = .3,
               max_num = 5)

# make a cmicroRNA object
ob <- cmicroRNA(dat)
# print object
ob
# plot object
cor_plot(ob)
The two main sources of data used by this package are the Cistrome Cancer and miRCancerdb databases. Cistrome Cancer is based on an integrative analysis of The Cancer Genome Atlas (TCGA) and public ChIP-seq data. It provides calculated correlations between transcription factors (n = 320) and their target genes in cancer studies (n = 29). In addition, Cistrome Cancer provides the regulatory potential of the transcription factors on target and non-target genes. miRCancerdb uses TCGA data and TargetScan annotations to correlate known microRNAs (n = 750) with target and non-target genes in cancer studies (n = 25).
cRegulome obtains a pre-built SQLite database file of the Cistrome Cancer and miRCancerdb databases. The details of this build are provided at cRegulomedb, along with the scripts used to pull, format and deposit the data in an online repository. Briefly, the SQLite database consists of 4 tables: `cor_mir` and `cor_tf` for correlation values, and `targets_mir` and `targets_tf` for mapping microRNA miRBase IDs and transcription factor symbols to genes. Two indices were created to facilitate searching the database by miRBase ID and transcription factor symbol. The database file can be downloaded using the function `get_db`.
To show the details of the database file, the following code connects to the database and shows the names of the tables and the fields in each of them.
# table names
tabs <- dbListTables(conn)
print(tabs)

# fields/columns in the tables
for(i in seq_along(tabs)) {
    print(dbListFields(conn, tabs[i]))
}
To query the database, cRegulome provides two main functions, `get_mir` and `get_tf`, for querying microRNA and transcription factor correlations respectively. Users need to provide the proper microRNA IDs, transcription factor symbols and/or TCGA study identifiers. microRNAs are referred to by their official miRBase IDs, transcription factors by the official symbols of their corresponding genes, and TCGA studies by their common identifiers. In either case, the output of these functions is a tidy data frame of 4 columns: `mirna_base`/`tf`, `feature`, `cor` and `study`. These correspond to the miRBase ID or transcription factor symbol, the gene symbol, the correlation value and the TCGA study identifier.
Here we show an example of such a query. Then, we illustrate how this query is executed on the database using basic `RSQLite` and `dbplyr`, which is what the `get_*` functions do internally.
# query the db for two microRNAs
dat_mir <- get_mir(conn,
                   mir = c('hsa-let-7g', 'hsa-let-7i'),
                   study = 'STES')

# query the db for two transcription factors
dat_tf <- get_tf(conn,
                 tf = c('LEF1', 'MYB'),
                 study = 'STES')

# show the first 6 lines of each of the data.frames
head(dat_mir); head(dat_tf)
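To make the shape of the query output concrete, the following is a minimal base R sketch rather than the package's own SQL or `dbplyr` code. It builds a toy tidy data frame with the four columns described above and applies the kind of filtering that arguments such as `min_abs_cor` and `max_num` are assumed to perform (a minimum absolute correlation, then the strongest hits per microRNA and study). The helper `filter_cor`, the gene names and the correlation values are all hypothetical.

```r
# Toy tidy data frame with the four columns described above (made-up values)
toy <- data.frame(
  mirna_base = rep(c('hsa-let-7g', 'hsa-let-7i'), each = 4),
  feature    = rep(c('GENE_A', 'GENE_B', 'GENE_C', 'GENE_D'), times = 2),
  cor        = c(-0.45, 0.10, 0.35, -0.20, 0.50, -0.32, 0.05, 0.41),
  study      = 'STES',
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of cRegulome: keep rows whose absolute
# correlation is at least min_abs_cor, then keep at most max_num of the
# strongest rows for each microRNA within each study
filter_cor <- function(dat, min_abs_cor = 0, max_num = Inf) {
  dat <- dat[abs(dat$cor) >= min_abs_cor, , drop = FALSE]
  groups <- split(dat, list(dat$mirna_base, dat$study), drop = TRUE)
  top <- lapply(groups, function(g) {
    g <- g[order(abs(g$cor), decreasing = TRUE), , drop = FALSE]
    g[seq_len(min(max_num, nrow(g))), , drop = FALSE]
  })
  res <- do.call(rbind, top)
  rownames(res) <- NULL
  res
}

toy_filtered <- filter_cor(toy, min_abs_cor = 0.3, max_num = 2)
toy_filtered
```

The real functions run the equivalent selection against the database; the sketch only shows the assumed logic on an in-memory data frame.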
cRegulome provides two S3 objects to store the correlation data and dispatch methods on it: cmicroRNA and cTF, for microRNAs and transcription factors respectively. The structure of these objects is very similar. Each is a list of 4 items: microRNA or TF for the regulome element, features for the gene hits, studies for the TCGA studies, and finally corr, which is either a `data.frame` when the object has data from a single TCGA study or a named list of data.frames when it has data from multiple studies. Each of these data.frames has the regulome elements (microRNAs or transcription factors) in columns and the features/genes in rows. To construct these objects, users call the constructor function of the corresponding name on the data.frame output from `get_*`. The reverse is possible by calling the function `cor_tidy` on the object to get back the tidy data.frame.
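The relationship between the tidy output and the wide layout stored in corr can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are made up, only a single study is shown, and the wide form is kept as a matrix for brevity whereas the text above describes a data.frame.

```r
# Toy tidy data (made-up values), same four columns as the get_* output
tidy <- data.frame(
  mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i'),
  feature    = c('GENE_A', 'GENE_B', 'GENE_A'),
  cor        = c(-0.45, 0.35, 0.50),
  study      = 'STES',
  stringsAsFactors = FALSE
)

# Wide form: features in rows, microRNAs in columns.
# Pairs absent from the tidy data become NA.
wide <- tapply(tidy$cor,
               list(tidy$feature, tidy$mirna_base),
               function(x) x[1])

# Back to tidy form: one row per non-missing cell of the wide matrix
idx <- which(!is.na(wide), arr.ind = TRUE)
back <- data.frame(
  mirna_base = colnames(wide)[idx[, 'col']],
  feature    = rownames(wide)[idx[, 'row']],
  cor        = wide[idx],
  study      = 'STES',
  stringsAsFactors = FALSE
)

wide
back
```

For data from several studies, the same reshaping would be applied once per study, which matches the named list of data.frames described above.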
# explore the cmicroRNA object
ob_mir <- cmicroRNA(dat_mir)
class(ob_mir)
str(ob_mir)

# explore the cTF object
ob_tf <- cTF(dat_tf)
class(ob_tf)
str(ob_tf)
cRegulome provides S3 methods to interact with and visualize the correlation data in the cmicroRNA and cTF objects. Table 1 provides an overview of these functions. These methods dispatch directly on the objects and can be customized and manipulated in the same way as their generics.
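For readers less familiar with S3, the dispatch mechanism can be sketched with a toy class. Everything here is hypothetical and only mirrors the pattern: `toy_cmicroRNA`, `n_hits` and the data are invented for illustration and are not part of cRegulome.

```r
# Made-up tidy data with the four columns of the get_* output
toy_dat <- data.frame(
  mirna_base = c('hsa-let-7g', 'hsa-let-7g', 'hsa-let-7i'),
  feature    = c('GENE_A', 'GENE_B', 'GENE_A'),
  cor        = c(-0.45, 0.35, 0.50),
  study      = 'STES',
  stringsAsFactors = FALSE
)

# Hypothetical constructor: a list of 4 items with a class attribute
toy_cmicroRNA <- function(dat) {
  structure(
    list(microRNA = unique(dat$mirna_base),
         features = unique(dat$feature),
         studies  = unique(dat$study),
         corr     = dat),
    class = 'toy_cmicroRNA'
  )
}

# Hypothetical generic: UseMethod() picks the method from the object's class
n_hits <- function(ob, ...) UseMethod('n_hits')

# Method for the toy class: number of gene hits per microRNA
n_hits.toy_cmicroRNA <- function(ob, ...) {
  table(ob$corr$mirna_base)
}

toy_ob <- toy_cmicroRNA(toy_dat)
class(toy_ob)
n_hits(toy_ob)

# List the methods registered for the toy class
methods(class = 'toy_cmicroRNA')
```

Calling `n_hits(toy_ob)` reaches `n_hits.toy_cmicroRNA` because of the class attribute, which is the same mechanism by which the `cor_*` generics find their cmicroRNA and cTF methods.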
# cmicroRNA object methods
methods(class = 'cmicroRNA')

# cTF object methods
methods(class = 'cTF')

# tidy method
head(cor_tidy(ob_mir))

# cor_hist method
cor_hist(ob_mir,
         breaks = 100,
         main = '', xlab = 'Correlation')
dev.off()

# cor_joy method
cor_joy(ob_mir) +
    labs(x = 'Correlation', y = '')
dev.off()

# cor_venn_diagram method
cor_venn_diagram(ob_mir, cat.default.pos = 'text')
dev.off()

# cor_upset method
cor_upset(ob_mir)
dev.off()
Comments, issues and contributions are welcome at: https://github.com/MahShaaban/cRegulome
Please cite:
citation('cRegulome')
dbDisconnect(conn)
unlink('./Venn*')