# date: "`r doc_date()`" # "`r pkg_ver('BiocStyle')`" # <style> # pre { # white-space: pre !important; # overflow-y: scroll !important; # height: 50vh !important; # } # </style>
microRNAs (miRNAs) regulate expression by promoting degradation or repressing translation of target transcripts. miRNA target sites have been cataloged in databases based on experimental validation and computational prediction using a variety of algorithms. Several online resources provide collections of multiple databases but need to be imported into other software, such as R, for processing, tabulation, graphing and computation. Currently available miRNA target site packages in R are limited in the number of databases, types of databases and flexibility.
The R package multiMiR, with web server at http://multimir.org, is a comprehensive collection of predicted and validated miRNA-target interactions and their associations with diseases and drugs. multiMiR includes several novel features not available in existing R packages:
The multiMiR package enables retrieval of miRNA-target interactions from 14 external databases in R without the need to visit all these databases. Advanced users can also submit SQL queries to the web server to retrieve results. See the publication on PubMed for additional detail on the database and its creation. The database is now versioned so it is possible to use previous versions of databases from the current R package, however the package defaults to the most recent version.
Warning There are issues with merging target IDs from older unmaintained databases. Databases that have been updated more recently (1-2 years) use current versions of annotated IDs. In each update these old target IDs are carried over due to a lack of a reliable method to disambiguate the original ID with current IDs. Please keep this in mind with results from older databases that have not been updated. We continue to look at methods to resolve these ambiguities and improve target agreement between databases. You can use the unique() R function to identify and then remove multiple target genes if needed.
library(knitr) options(width=100) opts_chunk$set(echo = TRUE, message = TRUE, warning = TRUE, eval = TRUE)
The multiMiR web server
http://multimir.org hosts a database
containing miRNA-target interactions from external databases. The package
multiMiR provides functions to communicate with the multiMiR web server and
its database. The multiMiR database is now versioned. By default multiMiR
will use the most recent version each time multiMiR is loaded. However it is
now possible to switch between database versions and get information about the
multiMiR database versions. multimir_dbInfoVersions()
returns a dataframe with
the available versions.
library(multiMiR) db.ver = multimir_dbInfoVersions() db.ver
To switch between versions we can use multimir_switchDBVersion()
.
vers_table <- multimir_dbInfoVersions() vers_table multimir_switchDBVersion(db_version = "2.0.0") curr_vers <- vers_table[1, "VERSION"] # current version multimir_switchDBVersion(db_version = curr_vers)
The remaining functions will query the selected version until the package is reloaded or until we switch to another version.
Information from each external database is stored in a table in the multiMiR database.
To see a list of the tables, we can use the multimir_dbTables()
function.
db.tables = multimir_dbTables() db.tables
To display the database schema, we can use the multimir_dbSchema()
function. Following is only a portion of the full output.
```{sql, eval=FALSE}
mirna
mirna
;mirna
(target
target
;target
(mirecords
mirecords
;mirecords
(The function `multimir_dbInfo()` will display information about the external miRNA and miRNA-target databases in *multiMiR*, including version, release date, link to download the data, and the corresponding table in *multiMiR*. ```r db.info = multimir_dbInfo() db.info
Among the 14 external databases, eight contain predicted miRNA-target interactions (DIANA-microT-CDS, ElMMo, MicroCosm, miRanda, miRDB, PicTar, PITA, and TargetScan), three have experimentally validated miRNA-target interactions (miRecords, miRTarBase, and TarBase) and the remaining three contain miRNA-drug/disease associations (miR2Disease, Pharmaco-miR, and PhenomiR). To check these categories and databases from within R, we have a set of four helper functions:
predicted_tables() validated_tables() diseasedrug_tables() reverse_table_lookup("targetscan")
To see how many records are in these 14 external databases we refer to the
multimir_dbCount
function.
db.count = multimir_dbCount() db.count apply(db.count[,-1], 2, sum)
The current version of multiMiR contains nearly 50 million records.
package:multiMiR
- S3 and S4 classesWith the addition of multiMiR to Bioconductor, the return object has changed
from an S3 (mmquery
) to an S4 class (mmquery_bioc
). The new S4 object has a
similar structure to the prior version, except all returned datasets (validated,
predicted, disease.drug) are now combined into a single dataset. To get only
one type, filter on the type
variable using the AnnotationDbi method discussed
later or the base R approach subset(myqry@data, myqry@data$type == "validated")
).
For backwards compatibility, get_multimir()
will return the old S3 object if
the argument legacy.out = TRUE
.
Features are now accessible using the S4 accessor operator @
. Additionally,
the AnnotationDbi
accessor methods column
, keys
, keytypes
, and select
all work for mmquery_bioc
objects. See Section \@ref(annodbi).
In addition to functions displaying database and table information, the
multiMiR package also provides the list_multimir()
function to list all the
unique miRNAs, target genes, drugs, and diseases in the multiMiR database. An
option for limiting the number of returned records has been added to help with
testing and exploration.
miRNAs = list_multimir("mirna", limit = 10) genes = list_multimir("gene", limit = 10) drugs = list_multimir("drug", limit = 10) diseases = list_multimir("disease", limit = 10) # executes 2 separate queries, giving 20 results head(miRNAs) head(genes) head(drugs) head(diseases)
The current version of multiMiR has 5830 miRNAs and 97186 target genes from human, mouse, and rat, as well as 64 drugs and 223 disease terms. Depending on the speed of your Internet connection, it may take a few minutes to retrieve the large number of target genes.
get_multimir()
to query the multiMiR databaseget_multimir()
is the main function in the package to retrieve predicted and
validated miRNA-target interactions and their disease and drug associations from
the multiMiR database.
To get familiar with the parameters in get_multimir()
, you can type
?get_multimir
or help(get_multimir)
in R. In the next section, many examples
illustrate the use of the parameters.
This example shows the use of multiMiR
alongside the edgeR
Bioconductor
package. Here we take microRNA data from ISS and ILS mouse strains and conduct
a differential expression analysis. The top differentially expresssed
microRNA's are then used to search the multiMiR database for validated
target genes.
library(edgeR) library(multiMiR) # Load data counts_file <- system.file("extdata", "counts_table.Rds", package = "multiMiR") strains_file <- system.file("extdata", "strains_factor.Rds", package = "multiMiR") counts_table <- readRDS(counts_file) strains_factor <- readRDS(strains_file) # Standard edgeR differential expression analysis design <- model.matrix(~ strains_factor) # Using trended dispersions dge <- DGEList(counts = counts_table) dge <- calcNormFactors(dge) dge$samples$strains <- strains_factor dge <- estimateGLMCommonDisp(dge, design) dge <- estimateGLMTrendedDisp(dge, design) dge <- estimateGLMTagwiseDisp(dge, design) # Fit GLM model for strain effect fit <- glmFit(dge, design) lrt <- glmLRT(fit) # Table of unadjusted p-values (PValue) and FDR values p_val_DE_edgeR <- topTags(lrt, adjust.method = 'BH', n = Inf) # Getting top differentially expressed miRNA's top_miRNAs <- rownames(p_val_DE_edgeR$table)[1:10] # Plug miRNA's into multiMiR and getting validated targets multimir_results <- get_multimir(org = 'mmu', mirna = top_miRNAs, table = 'validated', summary = TRUE) head(multimir_results@data)
In this section a variety of examples are described on how to query the multiMiR database.
In the first example, we ask what genes are validated targets of hsa-miR-18a-3p.
# The default is to search validated interactions in human example1 <- get_multimir(mirna = 'hsa-miR-18a-3p', summary = TRUE) names(example1) # Check which types of associations were returned table(example1@data$type) # Detailed information of the validated miRNA-target interaction head(example1@data) # Which interactions are supported by Luciferase assay? example1@data[grep("Luciferase", example1@data[, "experiment"]), ] example1@summary[example1@summary[,"target_symbol"] == "KRAS",]
It turns out that KRAS is the only target validated by Luciferase assay. The
interaction was recorded in miRecords and miRTarBase and supported by the same
literature, whose PubMed ID is in column pubmed_id
. The summary (by setting
summary = TRUE
when calling get_multimir()
) shows the number of records in
each of the external databases and the total number of databases supporting the
interaction.
In this example we would like to know which miRNAs and their target genes are associated with Cisplatin, a chemotherapy drug used in several cancers.
example2 <- get_multimir(disease.drug = 'cisplatin', table = 'disease.drug') names(example2) nrow(example2@data) table(example2@data$type) head(example2@data)
get_multimir()
returns 53 miRNA-target pairs. For more information, we can
always refer to the published papers with PubMed IDs in column paper_pubmedID
.
get_multimir()
also takes target gene(s) as input. In this example we retrieve
miRNAs predicted to target Gnb1 in mouse. For predicted interactions, the
default is to query the top 20\% predictions within each external database,
which is equivalent to setting parameters predicted.cutoff = 20
and
predicted.cutoff.type = 'p'
(for percentage cutoff). Here we search the top
35% among all conserved and nonconserved target sites.
example3 <- get_multimir(org = "mmu", target = "Gnb1", table = "predicted", summary = TRUE, predicted.cutoff = 35, predicted.cutoff.type = "p", predicted.site = "all") names(example3) table(example3@data$type) head(example3@data) head(example3@summary)
The records in example3@predicted
are ordered by scores from best to
worst within each external database. Once again, the summary option allows us to
examine the number of target sites predicted by each external database and the
total number of databases predicting the interaction.
Finally we examine how many predictions each of the databases has.
apply(example3@summary[, 6:13], 2, function(x) sum(x > 0))
You may have a list of genes involved in a common biological process. It is interesting to check whether some, or all, of these genes are targeted by the same miRNA(s). Here we have four genes involved in chronic obstructive pulmonary disease (COPD) in human and want to know what miRNAs target these genes by searching the top 500,000 predictions in each external database.
example4 <- get_multimir(org = 'hsa', target = c('AKT2', 'CERS6', 'S1PR3', 'SULF2'), table = 'predicted', summary = TRUE, predicted.cutoff.type = 'n', predicted.cutoff = 500000)
Then we count the number of target genes for each miRNA.
example4.counts <- addmargins(table(example4@summary[, 2:3])) example4.counts <- example4.counts[-nrow(example4.counts), ] example4.counts <- example4.counts[order(example4.counts[, 5], decreasing = TRUE), ] head(example4.counts)
In this example, we profiled miRNA and mRNA expression in poorly metastatic bladder cancer cell lines T24 and Luc, and their metastatic derivatives FL4 and Lul2, respectively. We identified differentially expressed miRNAs and genes between the metastatic and poorly metastatic cells. Let's load the data.
load(url("http://multimir.org/bladder.rda"))
Variable DE.miRNA.up
contains 9 up-regulated miRNAs and variable
DE.entrez.dn
has 47 down-regulated genes in the two metastatic cell
lines. The hypothesis is that interactions between these miRNAs and genes whose
expression changed at opposite directions may play a role in cancer metastasis.
So we use multiMiR
to check whether any of the nine miRNAs could
target any of the 47 genes.
# search all tables & top 10% predictions example5 <- get_multimir(org = "hsa", mirna = DE.miRNA.up, target = DE.entrez.dn, table = "all", summary = TRUE, predicted.cutoff.type = "p", predicted.cutoff = 10, use.tibble = TRUE)
## Searching diana_microt ... ## Searching elmmo ... ## Searching microcosm ... ## Searching miranda ... ## Searching mirdb ... ## Searching pictar ... ## Searching pita ... ## Searching targetscan ... ## Searching mirecords ... ## Searching mirtarbase ... ## Searching tarbase ... ## Searching mir2disease ... ## Searching pharmaco_mir ... ## Searching phenomir ...
table(example5@data$type) result <- select(example5, keytype = "type", keys = "validated", columns = columns(example5)) unique_pairs <- result[!duplicated(result[, c("mature_mirna_id", "target_entrez")]), ] result
In the result, there are r nrow(unique_pairs)
unique miRNA-gene pairs that have been validated.
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez ## 1 mirtarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230 ## 2 mirtarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998 ## 3 tarbase MIMAT0000087 hsa-miR-30a-5p FDX1 2230 ## 4 tarbase MIMAT0000424 hsa-miR-128 NEK2 4751 ## 5 tarbase MIMAT0000087 hsa-miR-30a-5p LIMCH1 22998 ## target_ensembl experiment support_type pubmed_id ## 1 ENSG00000137714 Proteomics Functional MTI (Weak) 18668040 ## 2 ENSG00000064042 pSILAC//Proteomics;Other Functional MTI (Weak) 18668040 ## 3 ENSG00000137714 Proteomics positive ## 4 ENSG00000117650 Microarray positive ## 5 ENSG00000064042 Proteomics positive
mykeytype <- "disease_drug" mykeys <- keys(example5, keytype = mykeytype) mykeys <- mykeys[grep("bladder", mykeys, ignore.case = TRUE)] result <- select(example5, keytype = "disease_drug", keys = mykeys, columns = columns(example5)) result
unique_pairs <- result[!duplicated(apply(result[, c("mature_mirna_id", "disease_drug")], 2, tolower)), ]
r nrow(unique_pairs)
miRNAs are associated with bladder cancer in miR2Disease and PhenomiR.
## database mature_mirna_acc mature_mirna_id target_symbol target_entrez ## 18 mir2disease MIMAT0000418 hsa-miR-23b-3p NA NA ## 711 phenomir MIMAT0000418 hsa-miR-23b-3p NA NA ## 311 phenomir MIMAT0000449 hsa-miR-146a-5p NA NA ## target_ensembl disease_drug ## 18 NA bladder cancer ## 711 NA Bladder cancer ## 311 NA Bladder cancer ## paper_pubmedID ## 18 2007. Micro-RNA profiling in kidney and bladder cancers. ## 711 17826655 ## 311 19127597
The predicted databases predict 65 miRNA-gene pairs between the 9 miRNAs and 28 of the 47 genes.
predicted <- select(example5, keytype = "type", keys = "predicted", columns = columns(example5)) length(unique(predicted$mature_mirna_id)) length(unique(predicted$target_entrez)) unique.pairs <- unique(data.frame(miRNA.ID = as.character(predicted$mature_mirna_id), target.Entrez = as.character(predicted$target_entrez))) nrow(unique.pairs)
head(unique.pairs)
Results from each of the predicted databases are already ordered by their scores from best to worst.
example5.split <- split(predicted, predicted$database)
AnnotationDbi accessor methods can be used to select and filter the data
returned by get_multimir()
.
# On example4's result columns(example4) head(keys(example4)) keytypes(example4) mykeys <- keys(example4)[1:4] head(select(example4, keys = mykeys, columns = c("database", "target_entrez"))) # On example3's result columns(example3) head(keys(example3)) keytypes(example3) mykeys <- keys(example3)[1:4] head(select(example3, keys = mykeys, columns = c("database", "target_entrez", "score"))) # Search by gene on example4's result columns(example4) keytypes(example4) head(keys(example4, keytype = "target_entrez")) mykeys <- keys(example4, keytype = "target_entrez")[1] head(select(example4, keys = mykeys, keytype = "target_entrez", columns = c("database", "target_entrez", "score")))
As shown previously, get_multimir is the main function to retrieve
information from the multiMiR database, which is hosted at
http://multimir.org. The function builds one SQL query for every
external database that the user is going to search, submits the query to the web
server, and parses, combines, and summarizes results from the web server. For
advanced users, there are a couple ways to query the multiMiR
database without using the multiMiR package; but they have to be
familiar with SQL queries. In general, users are still advised to use the
get_multimir()
function when querying multiple external databases in
multiMiR.
The multiMiR package communicates with the multiMiR database via the script http://multimir.org/cgi-bin/multimir_univ.pl on the web server. Once again, data from each of the external databases is stored in a table in multiMiR. There are also tables for miRNAs (table mirna) and target genes (table target).
NOTE: While it is possible to complete short queries from a browser, the limits of
submitting a query through typing in the address bar of a browser are quickly reached
(8192 characters total). If you are a developer you should use your preferred method
to submit a HTTP POST which will allow for longer queries. The fields to include are
query
and dbName
. query
is the SQL query to submit. dbName
is the
DBNAME column from a call to multimir_dbInfoVersions()
, however if this is
excluded the current version is the default.
To learn about the structure of a table (e.g. DIANA-microT data in table diana_microt), users can use URL
http://multimir.org/cgi-bin/multimir_univ.pl?query=describe diana_microt
Similar with Example 1, the following URL searches for validated target genes of hsa-miR-18a-3p in miRecords.
As you can see, the query is long and searches just one of the three validated
tables in multiMiR. While in Example 1, one line of R command using the
get_multimir()
function searches, combines and summarizes results from all
three validated external databases (miRecords, miRTarBase and TarBase).
The same direct queries we did above on the web server can be done in R as well.
This is the preferred method if you are unfamiliar with HTTP POST. Be sure to
set the correct database version, if you wish to change versions, before calling
search_multimir()
it uses the currently set version.
To show the structure of table diana_microt:
direct2 <- search_multimir(query = "describe diana_microt") direct2
To search for validated target genes of hsa-miR-18a-3p in miRecords:
qry <- "SELECT m.mature_mirna_acc, m.mature_mirna_id, t.target_symbol, t.target_entrez, t.target_ensembl, i.experiment, i.support_type, i.pubmed_id FROM mirna AS m INNER JOIN mirecords AS i INNER JOIN target AS t ON (m.mature_mirna_uid=i.mature_mirna_uid and i.target_uid=t.target_uid) WHERE m.mature_mirna_id='hsa-miR-18a-3p'" direct3 <- search_multimir(query = qry) direct3
sessionInfo() warnings()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.