In ywhsiao/ceRNAR: ceRNAR: An R Package for Identification and Analysis of ceRNAs-miRNA Triplets

knitr::opts_chunk$set(tidy = FALSE,
                      warning = FALSE,
                      message = FALSE)
options(rmarkdown.html_vignette.check_title = FALSE)

Introduction

The gene expression regulating mechanisms in humans are complex as many regulators are highly connected and are compensatory to each other. Many non-coding RNAs have critical roles in post-transcriptional regulation of protein-coding genes, including microRNAs [1]. microRNAs are short, non-coding, single-stranded RNAs with ~22 oligonucleotide. Typically, microRNAs modulate gene expression via partial complementary with many miRNA response elements (MREs) on protein-coding genes. Recently, competitive endogenous RNA (ceRNA), including canonical protein-coding messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), circular RNAs (circRNAs), and pseudogenes, represents a novel mechanism of gene regulation that controls several biological and pathological processes [2]. Their mechanism is to compete with miRNAs for binding their regulatory sequences, as shown in Fig 1. The regulation of ceRNAs have been identified in multiple cancers [3]. Therefore, it is of importance to speed up our knowledge of the regulatory mechanisms and functions of ceRNA-miRNA triplets in the parthenogenesis of cancers or other diseases.

ceRNA principle

Fig 1. Illustration of ceRNA mediated gene regulation.

However, it is relatively time-consuming and labor-intensive to identify such ceRNA events through biological experiment. Although an increasing number of computational methods applying into high throughput miRNA and mRNA data have been developed for identifying ceRNAs, some of the prerequest (i.e. predefined threshold for genes for further analysis), limitation of the nature of data (i.e. continuous data are sensitive to outliers or the sample size) and huge computational cost existed in the current related tools may not effectively and accurately identify ceRNA events.

Therefore, we present a novel rank-based algorithm considering the contribution of miRNA expression in a ceRNA binding event and extending the pairwise correlation approach to identify ceRNA-miRNA triplets. In this package, we also included several downstream analyses to further interpret the biological meaning of identified ceRNA events for its users.

General Workflow

The main pipeline of ceRNAR contains three components, including: (1) data preprocessing; (2) identification of ceRNA-miRNA triplets; (3) downstream analyses, as illustrated in Fig 2. In order to reduce the computational time and cost, this package only processed on the interactions between mRNA and miRNA that have been predicted/experimentally validated from night well-established databases for identifing ceRNA events.

ceRNAR architecture

Fig 2. A overview of the entire of ceRNAR package.

Specifically, there are three main steps for identification of ceRNA-miRNA triplets: (1) the ceRNApairFiltering method is to adopt a sliding window approach and a running sum statistics approach to identify the correlation patterns of the two target genes within a specific range of expression values for the miRNA [4]; (2) the SegmentClustering method is to group the samples showing high correlation between the expression levels of the two target genes into one single cluster for detecting the possible signals of ceRNA-miRNA triplets [5]; (3) the PeakMerging method is designed to prevent smashed segments resulting from noise for identfing the most important ceRNA events in a given dataset (as shown in Fig 3). To characterize the biological functions and pathways of identified ceRNA pairs, six possible downstream analyses have be implemented in this package.

ceRNAR algorithm

Fig 3. Details of the algorithms used in ceRNAR package.

Data preprocessing

To illustrate functionality of ceRNAR, a small toy dataset (including gene and miRNA expression and clinical data) extracted from TCGA pan-cancer altas is used. Larger expression datasets can be obtained via either ceRNAcustomize() to load user's own data or ceRNAcustomize() to load data from TCGA. Once importing the package, the toy dataset (78 miRNA features and 185 mRNA features of 45 samples with corresponding survival information) to can be accessed via data():

library(ceRNAR)

Gene expression:

data(gene_exp)

knitr::kable(gene_exp[1:5,1:5])

miRNA expression:

data(mirna_exp)

knitr::kable(mirna_exp[1:5,1:5])

clinical data:

data(surv_data)

knitr::kable(surv_data[1:5,1:2])

Then, ceRNACustomize() is used to further preprocess and check the data

ceRNACustomize(project_name = 'demo', disease_name = 'DLBC', gene_exp = gene_exp,  mirna_exp = mirna_exp, surv_data = surv_data)

Alternatively, ceRNATCGA() can be used to retrieve TCGA data

ceRNATCGA(project_name = 'TCGA', disease_name = 'DLBC')

Putative miRNA-gene interaction

Here, ceRNAR comes with 2 experimentally validated miRNA-target databases (miRTarBase and miRecords[6]) and 7 computationally predicted miRNA-target databases DIANA-micro T-CDS [7], EIMMO [8], miRDB, miRanda, PITA, RNA22 and TargetScan [9] for only considering genes with shared miRNAs. Three options for filtering putative miRNA-gene pairs:

strict: experimentally validated by 2 databases and predicted by 7 databases
moderate: experimentally validated by 2 databases or predicted by 7 databases
less: experimentally validated by 2 databases or predicted by more than 6 databases

ceRNAputativePairs(project_name = 'demo', disease_name = 'DLBC', filtering = 'less')

ceRNA interaction

ceRNApairFilering(project_name = 'demo', disease_name = "DLBC", window_size = 10)
cernar_result <- SegmentClusteringPlusPeakMerging(project_name = 'demo', disease_name = "DLBC", window_size = 10)

Alternatively, ceRNAMethod() which encapsulated above two functions can be used:

cernar_result <- ceRNAMethod(project_name = 'demo', disease_name = 'DLBC', window_size = 10)

The final output stores the most important ceRNA events identified by the package:

knitr::kable(cernar_result[1:5,])

Downstream analyses

Functional analysis

This function allows users to explore the underlying biological pathways that the most important ceRNAs in a specific dataset/cohort involved. The entire analysis is based on two databases: Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO).

ceFunction_results <- ceRNAFunction(project_name = 'demo', disease_name = 'DLBC', pairs_cutoff = 1)

ceFunction_results[[1]]

ceFunction_results[[2]]

Location plot

This function helps users to visualize the expression level of a specific miRNA when modulated by a specific ceRNA.

ceRNALocation(project_name = 'demo', disease_name = 'DLBC', mirna = 'hsa-miR-101-3p', window_size = 10)

Survival analysis

This function allows users to indicate whether the ceRNAs in the discovered interaction module are associated with the survival of patients. This depends on whether the survival information is provided in the beginning.

survplot_results <- ceRNASurvival(project_name = 'demo', disease_name = 'DLBC', mirnas = 'hsa-miR-101-3p')

survplot_results[[1]]

Network analysis

This function helps users to visualize the ceRNA modules that identified by the package from their interaction networks.

network_results <- ceRNAModule(project_name = 'demo', disease_name = 'DLBC', pairs_cutoff = 5, column_sum = 1)

network_results[[1]]

External validation

This function helps users to further check whether the findings have been validated based on the miRSponge database.

external_val_result <- ceRNAValidate(project_name = 'demo', disease_name = 'DLBC')

knitr::kable(external_val_result[1:5,1:8], row.names = FALSE)

Data integration

This function allows users to further combine results obtained from ceRNAR with other state-of-the-art tools in ceRNA biological field, including SPONGE [10] and GDCRNATools [11]. (Note: Unfortunately, we discarded JAMI R package for this part in final version because this package is not presented on either CRAN or Bioconductor. But we kept our source code on github.)

library(SPONGE)
integrated_result <- ceRNAIntegrate(project_name = 'demo', disease_name = 'DLBC')

knitr::kable(integrated_result[1:5,], row.names = FALSE)

ceRNAR is available at:

R-package: https://github.com/ywhsiao/ceRNAR

Citation

To cite your use of ceRNAR in your publication, please reference:

Hsiao, Y. W., Wang, L., & Lu, T. P. (2022). ceRNAR: An R package for identification and analysis of ceRNA-miRNA triplets. PLOS Computational Biology, 18(9), e1010497. doi:10.1371/journal.pcbi.1010497

References

[1] Bartel DP. MicroRNAs: target recognition and regulatory functions. cell. 2009;136(2):215-33.

[2] Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP. A ceRNA hypothesis: the Rosetta Stone of a hidden RNA language? Cell. 2011;146(3):353-8.

[3] Tay Y, Kats L, Salmena L, Weiss D, Tan SM, Ala U, et al. Coding-independent regulation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell. 2011;147(2):344-57.

[4] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545-50.

[5] Olshen AB, Venkatraman E, Lucito R, Wigler M. Circular binary segmentation for the analysis of array‐based DNA copy number data. Biostatistics. 2004;5(4):557-72.

[6] Xiao F, Zuo Z, Cai G, Kang S, Gao X, Li T. miRecords: an integrated resource for microRNA–target interactions. Nucleic acids research. 2009;37(suppl_1):D105-D10.

[7] Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, et al. DIANA-microT web server v5. 0: service integration into miRNA functional analysis workflows. Nucleic acids research. 2013;41(W1):W169-W73.

[8] Gaidatzis D, van Nimwegen E, Hausser J, Zavolan M. Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC bioinformatics. 2007;8(1):1-22.

[9] Agarwal V, Bell GW, Nam J-W, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. elife. 2015;4:e05005.

[10] List M, Dehghani Amirabad A, Kostka D, Schulz MH. Large-scale inference of competing endogenous RNA networks with sparse partial correlation. Bioinformatics. 2019;35(14):i596-i604.

[11] Li R, Qu H, Wang S, Wei J, Zhang L, Ma R, et al. GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC. Bioinformatics. 2018;34(14):2515-7.