The multiMiR user's guide

# date: "`r doc_date()`"
# "`r pkg_ver('BiocStyle')`"
# <style>
#     pre {
#     white-space: pre !important;
#     overflow-y: scroll !important;
#     height: 50vh !important;
#     }
# </style>

Introduction

microRNAs (miRNAs) regulate expression by promoting degradation or repressing translation of target transcripts. miRNA target sites have been cataloged in databases based on experimental validation and computational prediction using a variety of algorithms. Several online resources provide collections of multiple databases but need to be imported into other software, such as R, for processing, tabulation, graphing and computation. Currently available miRNA target site packages in R are limited in the number of databases, types of databases and flexibility.

The R package multiMiR, with web server at http://multimir.org, is a comprehensive collection of predicted and validated miRNA-target interactions and their associations with diseases and drugs. multiMiR includes several novel features not available in existing R packages:

  1. Compilation of 14 different databases, more than any other collection
  2. Expansion of databases to those based on disease annotation and drug response, in addition to many experimental and computational databases
  3. User-defined cutoffs for predicted binding strength to provide the most confident selection.

The multiMiR package enables retrieval of miRNA-target interactions from 14 external databases in R without the need to visit all these databases. Advanced users can also submit SQL queries to the web server to retrieve results. See the publication on PubMed for additional detail on the database and its creation. The database is now versioned so it is possible to use previous versions of databases from the current R package, however the package defaults to the most recent version.

Warning There are issues with merging target IDs from older unmaintained databases. Databases that have been updated more recently (1-2 years) use current versions of annotated IDs. In each update these old target IDs are carried over due to a lack of a reliable method to disambiguate the original ID with current IDs. Please keep this in mind with results from older databases that have not been updated. We continue to look at methods to resolve these ambiguities and improve target agreement between databases. You can use the unique() R function to identify and then remove multiple target genes if needed.

Getting to know the multiMiR database

library(knitr)
options(width=100)
opts_chunk$set(echo       = TRUE,
               message    = TRUE,
               warning    = TRUE,
               eval       = TRUE)

The multiMiR web server http://multimir.org hosts a database containing miRNA-target interactions from external databases. The package multiMiR provides functions to communicate with the multiMiR web server and its database. The multiMiR database is now versioned. By default multiMiR will use the most recent version each time multiMiR is loaded. However it is now possible to switch between database versions and get information about the multiMiR database versions. multimir_dbInfoVersions() returns a dataframe with the available versions.

library(multiMiR)
db.ver = multimir_dbInfoVersions()
db.ver

To switch between versions we can use multimir_switchDBVersion().

vers_table <- multimir_dbInfoVersions()
vers_table

multimir_switchDBVersion(db_version = "2.0.0")

curr_vers  <- vers_table[1, "VERSION"]  # current version
multimir_switchDBVersion(db_version = curr_vers)

The remaining functions will query the selected version until the package is reloaded or until we switch to another version.

Information from each external database is stored in a table in the multiMiR database. To see a list of the tables, we can use the multimir_dbTables() function.

db.tables = multimir_dbTables()
db.tables

To display the database schema, we can use the multimir_dbSchema() function. Following is only a portion of the full output.

```{sql, eval=FALSE}

--

-- Table structure for table mirna

--

DROP TABLE IF EXISTS mirna;

CREATE TABLE mirna (

mature_mirna_uid INTEGER UNSIGNED AUTO_INCREMENT, -- mature miRNA unique ID

org VARCHAR(4) NOT NULL, -- organism abbreviation

mature_mirna_acc VARCHAR(20) default NULL, -- mature miRNA accession

mature_mirna_id VARCHAR(20) default NULL, -- mature miRNA ID/name

PRIMARY KEY (mature_mirna_uid),

KEY org (org),

KEY mature_mirna_acc (mature_mirna_acc),

KEY mature_mirna_id (mature_mirna_id)

);

--

-- Table structure for table target

--

DROP TABLE IF EXISTS target;

CREATE TABLE target (

target_uid INTEGER UNSIGNED AUTO_INCREMENT, -- target gene unique ID

org VARCHAR(4) NOT NULL, -- organism abbreviation

target_symbol VARCHAR(80) default NULL, -- target gene symbol

target_entrez VARCHAR(10) default NULL, -- target gene Entrez gene ID

target_ensembl VARCHAR(20) default NULL, -- target gene Ensembl gene ID

PRIMARY KEY (target_uid),

KEY org (org),

KEY target_symbol (target_symbol),

KEY target_entrez (target_entrez),

KEY target_ensembl (target_ensembl)

);

--

-- Table structure for table mirecords

--

DROP TABLE IF EXISTS mirecords;

CREATE TABLE mirecords (

mature_mirna_uid INTEGER UNSIGNED NOT NULL, -- mature miRNA unique ID

target_uid INTEGER UNSIGNED NOT NULL, -- target gene unique ID

target_site_number INT(10) default NULL, -- target site number

target_site_position INT(10) default NULL, -- target site position

experiment VARCHAR(160) default NULL, -- supporting experiment

support_type VARCHAR(40) default NULL, -- type of supporting experiment

pubmed_id VARCHAR(10) default NULL, -- PubMed ID

FOREIGN KEY (mature_mirna_uid)

REFERENCES mirna(mature_mirna_uid)

ON UPDATE CASCADE ON DELETE RESTRICT,

FOREIGN KEY (target_uid)

REFERENCES target(target_uid)

ON UPDATE CASCADE ON DELETE RESTRICT

);

......

(Please note that only three of the 19 tables are shown here for demonstration

purpose.)

The function `multimir_dbInfo()` will display information about the external
miRNA and miRNA-target databases in *multiMiR*, including version, release date,
link to download the data, and the corresponding table in *multiMiR*.

```r
db.info = multimir_dbInfo()
db.info

Among the 14 external databases, eight contain predicted miRNA-target interactions (DIANA-microT-CDS, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA, and TargetScan), three have experimentally validated miRNA-target interactions (miRecords, miRTarBase, and TarBase) and the remaining three contain miRNA-drug/disease associations (miR2Disease, Pharmaco-miR, and PhenomiR). To check these categories and databases from within R, we have a set of four helper functions:

predicted_tables()
validated_tables()
diseasedrug_tables()
reverse_table_lookup("targetscan")

To see how many records are in these 14 external databases we refer to the multimir_dbCount function.

db.count = multimir_dbCount()
db.count
apply(db.count[,-1], 2, sum)

The current version of multiMiR contains nearly 50 million records.

Changes to package:multiMiR - S3 and S4 classes

With the addition of multiMiR to Bioconductor, the return object has changed from an S3 (mmquery) to an S4 class (mmquery_bioc). The new S4 object has a similar structure to the prior version, except all returned datasets (validated, predicted, disease.drug) are now combined into a single dataset. To get only one type, filter on the type variable using the AnnotationDbi method discussed later or the base R approach subset(myqry@data, myqry@data$type == "validated")). For backwards compatibility, get_multimir() will return the old S3 object if the argument legacy.out = TRUE.

Features are now accessible using the S4 accessor operator @. Additionally, the AnnotationDbi accessor methods column, keys, keytypes, and select all work for mmquery_bioc objects. See Section \@ref(annodbi).

List miRNAs, genes, drugs and diseases in the multiMiR database

In addition to functions displaying database and table information, the multiMiR package also provides the list_multimir() function to list all the unique miRNAs, target genes, drugs, and diseases in the multiMiR database. An option for limiting the number of returned records has been added to help with testing and exploration.

miRNAs   = list_multimir("mirna", limit = 10)
genes    = list_multimir("gene", limit = 10)
drugs    = list_multimir("drug", limit = 10)
diseases = list_multimir("disease", limit = 10)
# executes 2 separate queries, giving 20 results
head(miRNAs)
head(genes)
head(drugs)
head(diseases)

The current version of multiMiR has 5830 miRNAs and 97186 target genes from human, mouse, and rat, as well as 64 drugs and 223 disease terms. Depending on the speed of your Internet connection, it may take a few minutes to retrieve the large number of target genes.

Use get_multimir() to query the multiMiR database

get_multimir() is the main function in the package to retrieve predicted and validated miRNA-target interactions and their disease and drug associations from the multiMiR database.

To get familiar with the parameters in get_multimir(), you can type ?get_multimir or help(get_multimir) in R. In the next section, many examples illustrate the use of the parameters.

Example of multiMiR in a Bioconductor workflow

This example shows the use of multiMiR alongside the edgeR Bioconductor package. Here we take microRNA data from ISS and ILS mouse strains and conduct a differential expression analysis. The top differentially expresssed microRNA's are then used to search the multiMiR database for validated target genes.

library(edgeR)
library(multiMiR)

# Load data
counts_file  <- system.file("extdata", "counts_table.Rds", package = "multiMiR")
strains_file <- system.file("extdata", "strains_factor.Rds", package = "multiMiR")
counts_table   <- readRDS(counts_file)
strains_factor <- readRDS(strains_file)

# Standard edgeR differential expression analysis
design <- model.matrix(~ strains_factor)

# Using trended dispersions
dge <- DGEList(counts = counts_table)
dge <- calcNormFactors(dge)
dge$samples$strains <- strains_factor
dge <- estimateGLMCommonDisp(dge, design)
dge <- estimateGLMTrendedDisp(dge, design)
dge <- estimateGLMTagwiseDisp(dge, design)

# Fit GLM model for strain effect
fit <- glmFit(dge, design)
lrt <- glmLRT(fit)

# Table of unadjusted p-values (PValue) and FDR values
p_val_DE_edgeR <- topTags(lrt, adjust.method = 'BH', n = Inf)

# Getting top differentially expressed miRNA's
top_miRNAs <- rownames(p_val_DE_edgeR$table)[1:10]

# Plug miRNA's into multiMiR and getting validated targets
multimir_results <- get_multimir(org     = 'mmu',
                                 mirna   = top_miRNAs,
                                 table   = 'validated',
                                 summary = TRUE)
head(multimir_results@data)

Examples of multiMiR queries

In this section a variety of examples are described on how to query the multiMiR database.

Example 1: Retrieve all validated target genes of a given miRNA

In the first example, we ask what genes are validated targets of hsa-miR-18a-3p.

# The default is to search validated interactions in human
example1 <- get_multimir(mirna = 'hsa-miR-18a-3p', summary = TRUE)
names(example1)
# Check which types of associations were returned
table(example1@data$type)
# Detailed information of the validated miRNA-target interaction
head(example1@data)
# Which interactions are supported by Luciferase assay?
example1@data[grep("Luciferase", example1@data[, "experiment"]), ]
example1@summary[example1@summary[,"target_symbol"] == "KRAS",]

It turns out that KRAS is the only target validated by Luciferase assay. The interaction was recorded in miRecords and miRTarBase and supported by the same literature, whose PubMed ID is in column pubmed_id. The summary (by setting summary = TRUE when calling get_multimir()) shows the number of records in each of the external databases and the total number of databases supporting the interaction.

Example 2: Retrieve miRNA-target interactions associated with a given drug or disease

In this example we would like to know which miRNAs and their target genes are associated with Cisplatin, a chemotherapy drug used in several cancers.

example2 <- get_multimir(disease.drug = 'cisplatin', table = 'disease.drug')
names(example2)
nrow(example2@data)
table(example2@data$type)
head(example2@data)

get_multimir() returns 53 miRNA-target pairs. For more information, we can always refer to the published papers with PubMed IDs in column paper_pubmedID.

Example 3: Select miRNAs predicted to target a gene

get_multimir() also takes target gene(s) as input. In this example we retrieve miRNAs predicted to target Gnb1 in mouse. For predicted interactions, the default is to query the top 20\% predictions within each external database, which is equivalent to setting parameters predicted.cutoff = 20 and predicted.cutoff.type = 'p' (for percentage cutoff). Here we search the top 35% among all conserved and nonconserved target sites.

example3 <- get_multimir(org     = "mmu",
                         target  = "Gnb1",
                         table   = "predicted",
                         summary = TRUE,
                         predicted.cutoff      = 35,
                         predicted.cutoff.type = "p",
                         predicted.site        = "all")
names(example3)
table(example3@data$type)
head(example3@data)
head(example3@summary)

The records in example3@predicted are ordered by scores from best to worst within each external database. Once again, the summary option allows us to examine the number of target sites predicted by each external database and the total number of databases predicting the interaction.

Finally we examine how many predictions each of the databases has.

apply(example3@summary[, 6:13], 2, function(x) sum(x > 0))

Example 4: Select miRNA(s) predicted to target most, if not all, of the genes of interest

You may have a list of genes involved in a common biological process. It is interesting to check whether some, or all, of these genes are targeted by the same miRNA(s). Here we have four genes involved in chronic obstructive pulmonary disease (COPD) in human and want to know what miRNAs target these genes by searching the top 500,000 predictions in each external database.

example4 <- get_multimir(org     = 'hsa',
                         target  = c('AKT2', 'CERS6', 'S1PR3', 'SULF2'),
                         table   = 'predicted',
                         summary = TRUE,
                         predicted.cutoff.type = 'n',
                         predicted.cutoff      = 500000)

Then we count the number of target genes for each miRNA.

example4.counts <- addmargins(table(example4@summary[, 2:3]))
example4.counts <- example4.counts[-nrow(example4.counts), ]
example4.counts <- example4.counts[order(example4.counts[, 5], decreasing = TRUE), ]
head(example4.counts)

Example 5: Retrieve interactions between a set of miRNAs and a set of genes

In this example, we profiled miRNA and mRNA expression in poorly metastatic bladder cancer cell lines T24 and Luc, and their metastatic derivatives FL4 and Lul2, respectively. We identified differentially expressed miRNAs and genes between the metastatic and poorly metastatic cells. Let's load the data.

load(url("http://multimir.org/bladder.rda"))

Variable DE.miRNA.up contains 9 up-regulated miRNAs and variable DE.entrez.dn has 47 down-regulated genes in the two metastatic cell lines. The hypothesis is that interactions between these miRNAs and genes whose expression changed at opposite directions may play a role in cancer metastasis. So we use multiMiR to check whether any of the nine miRNAs could target any of the 47 genes.

# search all tables & top 10% predictions
example5 <- get_multimir(org     = "hsa",
                         mirna   = DE.miRNA.up,
                         target  = DE.entrez.dn,
                         table   = "all",
                         summary = TRUE,
                         predicted.cutoff.type = "p",
                         predicted.cutoff      = 10,
                         use.tibble = TRUE)
## Searching diana_microt ...
## Searching elmmo ...
## Searching microcosm ...
## Searching miranda ...
## Searching mirdb ...
## Searching pictar ...
## Searching pita ...
## Searching targetscan ...
## Searching mirecords ...
## Searching mirtarbase ...
## Searching tarbase ...
## Searching mir2disease ...
## Searching pharmaco_mir ...
## Searching phenomir ...
table(example5@data$type)
result <- select(example5, keytype = "type", keys = "validated", columns = columns(example5))
unique_pairs <- 
    result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ]

result

In the result, there are r nrow(unique_pairs) unique miRNA-gene pairs that have been validated.

##     database mature_mirna_acc mature_mirna_id target_symbol target_entrez
## 1 mirtarbase     MIMAT0000087  hsa-miR-30a-5p          FDX1          2230
## 2 mirtarbase     MIMAT0000087  hsa-miR-30a-5p        LIMCH1         22998
## 3    tarbase     MIMAT0000087  hsa-miR-30a-5p          FDX1          2230
## 4    tarbase     MIMAT0000424     hsa-miR-128          NEK2          4751
## 5    tarbase     MIMAT0000087  hsa-miR-30a-5p        LIMCH1         22998
##    target_ensembl               experiment          support_type pubmed_id
## 1 ENSG00000137714               Proteomics Functional MTI (Weak)  18668040
## 2 ENSG00000064042 pSILAC//Proteomics;Other Functional MTI (Weak)  18668040
## 3 ENSG00000137714               Proteomics              positive          
## 4 ENSG00000117650               Microarray              positive          
## 5 ENSG00000064042               Proteomics              positive          
mykeytype <- "disease_drug"

mykeys <- keys(example5, keytype = mykeytype)
mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)]

result <- select(example5, keytype = "disease_drug", keys = mykeys,
                 columns = columns(example5))
result
unique_pairs <- 
    result[!duplicated(apply(result[, c("mature_mirna_id", "disease_drug")], 2,
                             tolower)), ]

r nrow(unique_pairs) miRNAs are associated with bladder cancer in miR2Disease and PhenomiR.

##        database mature_mirna_acc mature_mirna_id target_symbol target_entrez
## 18  mir2disease     MIMAT0000418  hsa-miR-23b-3p            NA            NA
## 711    phenomir     MIMAT0000418  hsa-miR-23b-3p            NA            NA
## 311    phenomir     MIMAT0000449 hsa-miR-146a-5p            NA            NA
##     target_ensembl   disease_drug
## 18              NA bladder cancer
## 711             NA Bladder cancer
## 311             NA Bladder cancer
##                                               paper_pubmedID
## 18  2007. Micro-RNA profiling in kidney and bladder cancers.
## 711                                                 17826655
## 311                                                 19127597

The predicted databases predict 65 miRNA-gene pairs between the 9 miRNAs and 28 of the 47 genes.

predicted <- select(example5, keytype = "type", keys = "predicted", 
                    columns = columns(example5))
length(unique(predicted$mature_mirna_id))
length(unique(predicted$target_entrez))
unique.pairs <- 
    unique(data.frame(miRNA.ID = as.character(predicted$mature_mirna_id),
                      target.Entrez = as.character(predicted$target_entrez)))
nrow(unique.pairs)
head(unique.pairs)

Results from each of the predicted databases are already ordered by their scores from best to worst.

example5.split <- split(predicted, predicted$database)

Use of AnnotationDbi accessor methods {#annodbi}

AnnotationDbi accessor methods can be used to select and filter the data returned by get_multimir().

# On example4's result
columns(example4)
head(keys(example4))
keytypes(example4)
mykeys <- keys(example4)[1:4]
head(select(example4, keys = mykeys, 
            columns = c("database", "target_entrez")))

# On example3's result
columns(example3)
head(keys(example3))
keytypes(example3)
mykeys <- keys(example3)[1:4]
head(select(example3, keys = mykeys, 
            columns = c("database", "target_entrez", "score")))

# Search by gene on example4's result
columns(example4)
keytypes(example4)
head(keys(example4, keytype = "target_entrez"))
mykeys <- keys(example4, keytype = "target_entrez")[1]
head(select(example4, keys = mykeys, keytype = "target_entrez",
            columns = c("database", "target_entrez", "score")))

Direct query to the database on the multiMiR web server

As shown previously, get_multimir is the main function to retrieve information from the multiMiR database, which is hosted at http://multimir.org. The function builds one SQL query for every external database that the user is going to search, submits the query to the web server, and parses, combines, and summarizes results from the web server. For advanced users, there are a couple ways to query the multiMiR database without using the multiMiR package; but they have to be familiar with SQL queries. In general, users are still advised to use the get_multimir() function when querying multiple external databases in multiMiR.

Direct query on the web server

The multiMiR package communicates with the multiMiR database via the script http://multimir.org/cgi-bin/multimir_univ.pl on the web server. Once again, data from each of the external databases is stored in a table in multiMiR. There are also tables for miRNAs (table mirna) and target genes (table target).

NOTE: While it is possible to complete short queries from a browser, the limits of submitting a query through typing in the address bar of a browser are quickly reached (8192 characters total). If you are a developer you should use your preferred method to submit a HTTP POST which will allow for longer queries. The fields to include are query and dbName. query is the SQL query to submit. dbName is the DBNAME column from a call to multimir_dbInfoVersions(), however if this is excluded the current version is the default.

To learn about the structure of a table (e.g. DIANA-microT data in table diana_microt), users can use URL

http://multimir.org/cgi-bin/multimir_univ.pl?query=describe diana_microt

Similar with Example 1, the following URL searches for validated target genes of hsa-miR-18a-3p in miRecords.

http://multimir.org/cgi-bin/multimir.pl?query=SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol, t.target_entrez, t.target_ensembl, i.experiment, i.support_type, i.pubmed_id FROM mirna AS m INNER JOIN mirecords AS i INNER JOIN target AS t ON (m.mature_mirna_uid=i.mature_mirna_uid and i.target_uid=t.target_uid) WHERE m.mature_mirna_id='hsa-miR-18a-3p'

As you can see, the query is long and searches just one of the three validated tables in multiMiR. While in Example 1, one line of R command using the get_multimir() function searches, combines and summarizes results from all three validated external databases (miRecords, miRTarBase and TarBase).

Direct query in R

The same direct queries we did above on the web server can be done in R as well. This is the preferred method if you are unfamiliar with HTTP POST. Be sure to set the correct database version, if you wish to change versions, before calling search_multimir() it uses the currently set version.

To show the structure of table diana_microt:

direct2 <- search_multimir(query = "describe diana_microt")
direct2

To search for validated target genes of hsa-miR-18a-3p in miRecords:

qry <- "SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol,
               t.target_entrez, t.target_ensembl, i.experiment, i.support_type,
               i.pubmed_id 
        FROM mirna AS m INNER JOIN mirecords AS i INNER JOIN target AS t 
        ON (m.mature_mirna_uid=i.mature_mirna_uid and 
            i.target_uid=t.target_uid) 
        WHERE m.mature_mirna_id='hsa-miR-18a-3p'"
direct3 <- search_multimir(query = qry)
direct3

Session Info

sessionInfo()
warnings()


Try the multiMiR package in your browser

Any scripts or data that you put into this service are public.

multiMiR documentation built on Nov. 8, 2020, 5:46 p.m.