Using cRegulome

knitr::opts_chunk$set(message = FALSE, warning = FALSE, fig.align = 'center')

Overview

Transcription factors and microRNAs are important for regulating the gene expression in normal physiology and pathological conditions. Many bioinformatic tools were built to predict and identify transcription factors and microRNA targets and their role in development of diseases including cancers. The availability of public access high-throughput data allowed for data-driven discoveries and validations of these predictions. Here, we build on that kind of tools and integrative analyses to provide a tool to access, manage and visualize data from open source databases. cRegulome provides a programmatic access to the regulome (microRNA and transcription factor) correlations with target genes in cancer. The package obtains a local instance of Cistrome Cancer and miRCancerdb databases and provides objects and methods to interact with and visualize the correlation data.

Getting started

To get started with cRegulome, we show a very quick example. We first start by downloading a small test database file, make a simple query and convert the output to a cRegulome object to print and visualize.

# load required libraries
library(cRegulome)
library(RSQLite)
library(ggplot2)
# download the db file when using it for the first time
destfile = paste(tempdir(), 'cRegulome.db.gz', sep = '/')
if(!file.exists(destfile)) {
    get_db(test = TRUE)
}

# connect to the db file
db_file = paste(tempdir(), 'cRegulome.db', sep = '/')
conn <- dbConnect(SQLite(), db_file)

```{bash eval=FALSE}

alternative to downloading the database file

wget https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/9537385/cRegulome.db.gz gunzip cRegulome.db.gz

```r
# locate the testset file and connect
fl <- system.file('extdata', 'cRegulome.db', package = 'cRegulome')
conn <- dbConnect(SQLite(), fl)
# enter a custom query with different arguments
dat <- get_mir(conn,
               mir = 'hsa-let-7g',
               study = 'STES',
               min_abs_cor = .3,
               max_num = 5)

# make a cmicroRNA object   
ob <- cmicroRNA(dat)
# print object
ob
# plot object
cor_plot(ob)

Package Description

Data sources

The two main sources of data used by this package are Cistrome Cancer and miRCancerdb databases. Cistrome Cancer is based on an integrative analysis of The Cancer Genome Atlas (TCGA) and public ChIP-seq data. It provides calculated correlations of (n = 320) transcription factors and their target genes in (n = 29) cancer study. In addition, Cistrome Cancer provides the transcription factors regulatory potential to target and non-target genes. miRCancerdb uses TCGA data and TargetScan annotations to correlate known microRNAs (n = 750) and target and non-target genes in (n = 25) cancer studies.

Database file

cRegulome obtains a pre-build SQLite database file of the Cistrome Cancer and miRCancerdb databases. The details of this build is provided at cRegulomedb in addition to the scripts used to pull, format and deposit the data at an on-line repository. Briefly, the SQLite database consist of 4 tables cor_mir and cor_tf for correlation values and targets_mir and targets_tf for microRNA miRBase ID and transcription factors symbols to genes mappings. Two indices were created to facilitate the database search using the miRBase IDs and transcription factors symbols. The database file can be downloaded using the function get_db.

To show the details of the database file, the following code connects to the database and show the names of tables and fields in each of them.

# table names
tabs <- dbListTables(conn)
print(tabs)

# fields/columns in the tables
for(i in seq_along(tabs)) {
  print(dbListFields(conn, tabs[i]))
}

Database query

To query the database using cRegulome, we provide two main functions; get_mir and get_tf for querying microRNA and transcription factors correlations respectively. Users need to provide the proper IDs for microRNA, transcription factor symbols and/or TCGA study identifiers. microRNAs are referred to by the official miRBase IDs, transcription factors by their corresponding official gene symbols that contains them and TCGA studies with their common identifiers. In either cases, the output of calling the these functions is a tidy data frame of 4 columns; mirna_base/ tf, feature, cor and study These correspond to the miRBase IDs or transcription factors symbol, gene symbol, correlation value and the TCGA study identifier.

Here we show an example of such a query. Then, we illustrate how this query is executed on the database using basic RSQLite and dbplyr which is what the get_* functions are doing.

# query the db for two microRNAs
dat_mir <- get_mir(conn,
                   mir = c('hsa-let-7g', 'hsa-let-7i'),
                   study = 'STES')

# query the db for two transcription factors
dat_tf <- get_tf(conn,
                 tf = c('LEF1', 'MYB'),
                 study = 'STES')

# show first 6 line of each of the data.frames
head(dat_mir); head(dat_tf)

Objects

Two S3 objects are provided by cRegulome to store and dispatch methods on the correlation data. cmicroRNA and cTF for microRNA and transcription factors respectively. The structure of these objects is very similar. Basically, as all S3 objects, it’s a list of 4 items; microRNA or TF for the regulome element, features for the gene hits, studies for the TCGA studies and finally corr is either a data.frame when the object has data.from a single TCGA study or a named list of data.frames when it has data from multiple studies. Each of these data.frames has the regulome element (microRNAs or transcription factors) in columns and features/genes in rows.
To construct these objects, users need to call a constructor function with the corresponding names on the data.frame output form get_*. The reverse is possible by calling the function cor_tidy on the object to get back the tidy data.frame.

# explore the cmicroRNA object
ob_mir <- cmicroRNA(dat_mir)
class(ob_mir)
str(ob_mir)
# explore the cTF object
ob_tf <- cTF(dat_tf)
class(ob_tf)
str(ob_tf)

Methods

cRegulome provides S3 methods to interact a visualize the correlations data in the cmicroRNA and cTF objects. Table 1 provides an over view of these functions. These methods dispatch directly on the objects and could be customized and manipulated in the same way as their generics.

# cmicroRNA object methods
methods(class = 'cmicroRNA')
# cTF object methods
methods(class = 'cTF')
# tidy method
head(cor_tidy(ob_mir))
# cor_hist method
cor_hist(ob_mir,
     breaks = 100,
     main = '', xlab = 'Correlation')
dev.off()
# cor_joy method
cor_joy(ob_mir) +
    labs(x = 'Correlation', y = '')
dev.off()
# cor_venn_diagram method
cor_venn_diagram(ob_mir, cat.default.pos = 'text')
dev.off()
# cor_upset method
cor_upset(ob_mir)
dev.off()

Contributions

Comments, issues and contributions are welcomed at: https://github.com/MahShaaban/cRegulome

Citations

Please cite:

citation('cRegulome')
dbDisconnect(conn)
unlink('./Venn*')


Try the cRegulome package in your browser

Any scripts or data that you put into this service are public.

cRegulome documentation built on July 1, 2020, 10:26 p.m.