DNA methylation can be used to identify functional changes at transcriptional enhancers and other cis-regulatory modules (CRMs) in tumors and other primary disease tissues. Our R/Bioconductor package ELMER (Enhancer Linking by Methylation/Expression Relationships) provides a systematic approach that reconstructs gene regulatory networks (GRNs) by combining methylation and gene expression data derived from the same set of samples.
ELMER uses methylation changes at CRMs as the central hub of these networks, using correlation analysis to associate them with both upstream master regulator (MR) transcription factors and downstream target genes.
This package can be easily applied to publicly available TCGA cancer data sets as well as to custom DNA methylation and gene expression data sets.
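The association idea described above can be illustrated with a small simulation. This is a minimal sketch, not ELMER's implementation: it uses simulated values and a plain Spearman correlation (an assumption made purely for illustration) to show how methylation at one CRM probe could be linked to the expression of a candidate target gene measured in the same samples. The names `meth` and `expr` are hypothetical.

```r
set.seed(42)
n_samples <- 50

# Simulated beta values (0-1) for one hypothetical enhancer probe.
meth <- runif(n_samples, min = 0.1, max = 0.9)

# Simulated expression of a hypothetical target gene:
# expression is higher in samples where the probe is less methylated.
expr <- 10 - 6 * meth + rnorm(n_samples, sd = 0.5)

# Rank-based correlation across samples (illustrative stand-in statistic).
assoc <- cor.test(meth, expr, method = "spearman")
rho <- unname(assoc$estimate)
p_value <- assoc$p.value

cat("Spearman rho:", round(rho, 2), "\n")
```

A strongly negative `rho` in such a setting would be consistent with loss of methylation at the CRM accompanying activation of the gene, which is the kind of probe-gene relationship the networks are built from.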
ELMER analyses have 5 main steps:
The package workflow is shown in the figure below:
|                                | ELMER Version 1                                | ELMER Version 2                                                  |
|--------------------------------|:-----------------------------------------------|:-----------------------------------------------------------------|
| Primary data structure         | mee object (custom data structure)             | MAE object (Bioconductor data structure)                         |
| Auxiliary data                 | Manually created                               | Programmatically created                                         |
| Number of human TFs            | 1,982                                          | 1,639 (curated list from Lambert, Samuel A., et al.)             |
| Number of TF motifs            | 91                                             | 771 (HOCOMOCO v11 database)                                      |
| TF classification              | 78 families                                    | 82 families and 331 subfamilies (TFClass database, HOCOMOCO)     |
| Analysis performed             | Normal vs tumor samples                        | Group 1 vs group 2                                               |
| Statistical grouping           | Unsupervised only                              | Unsupervised or supervised using labeled groups                  |
| TCGA data source               | The Cancer Genome Atlas (TCGA) (not available) | The NCI's Genomic Data Commons (GDC)                             |
| Genome of reference            | GRCh37 (hg19)                                  | GRCh37 (hg19)/GRCh38 (hg38)                                      |
| DNA methylation platforms      | HM450                                          | EPIC and HM450                                                   |
| Graphical User Interface (GUI) | None                                           | TCGAbiolinksGUI                                                  |
| Automatic report               | None                                           | HTML summarizing results                                         |
| Annotations                    | None                                           | StateHub                                                         |
In ELMER v2 we introduce a new concept, the algorithm mode, which can be either supervised or unsupervised.
In the unsupervised mode (described in ELMER v1), it is assumed that one of the two groups is a heterogeneous mix of different (sometimes unknown) molecular phenotypes. For instance, in the example of breast cancer, normal breast tissues (Group A) are relatively homogeneous, whereas breast tumors fall into multiple molecular subtypes.
The assumption of the Unsupervised mode is that methylation changes may be restricted to a subset of one or more molecular subtypes, and thus only be present in a fraction of the samples in the test group. For instance, methylation changes related to estrogen signaling may only be present in LuminalA or LuminalB subtypes.
When this structure is unknown, the Unsupervised mode is the appropriate model, since it only requires changes in a subset of samples (by default, 20%). In contrast, in the Supervised mode, it is assumed that each group represents a more homogeneous molecular phenotype, and thus we compare all samples in Group A vs. all samples in Group B. This can be used for direct comparison of tumor subtypes (e.g. Luminal vs. Basal-like tumors), but can also be used in numerous other situations, including sorted cells of different types, or treated vs. untreated samples in perturbation experiments.
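The difference between the two modes can be sketched with simulated data for a single probe. This is an illustration under stated assumptions, not the package's code: the helper `lowest_fraction()` is hypothetical, and selecting the 20% least-methylated samples of each group is one assumed way of restricting the comparison to a subset of samples.

```r
set.seed(1)

# Simulated beta values for one probe.
# Group A: homogeneous (e.g. normal tissue), uniformly highly methylated.
group_a <- rnorm(40, mean = 0.80, sd = 0.03)
# Group B: heterogeneous; only 10 of 40 samples (25%) lose methylation.
group_b <- c(rnorm(10, mean = 0.30, sd = 0.03),
             rnorm(30, mean = 0.80, sd = 0.03))

# Hypothetical helper: keep the fraction of samples with the lowest methylation.
lowest_fraction <- function(x, frac = 0.2) {
  k <- max(1, ceiling(length(x) * frac))
  sort(x)[seq_len(k)]
}

# Supervised-style comparison: all samples in Group B vs. all samples in Group A.
diff_supervised <- mean(group_b) - mean(group_a)

# Unsupervised-style comparison: only the lowest 20% of each group.
diff_unsupervised <- mean(lowest_fraction(group_b)) - mean(lowest_fraction(group_a))

cat("All samples:       ", round(diff_supervised, 2), "\n")
cat("Lowest 20% subsets:", round(diff_unsupervised, 2), "\n")
```

In this simulation the change carried by a minority of Group B samples is diluted when all samples are averaged, but stands out clearly when only the extreme subset of each group is compared. That is the situation in which the Unsupervised mode is the better fit.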
To install the development version of this package from GitHub, start R and enter:
devtools::install_github(repo = "tiagochst/ELMER.data")
devtools::install_github(repo = "tiagochst/ELMER")
To install this package from Bioconductor start R and enter:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ELMER")
Then, to load ELMER enter:
library(ELMER, quietly = TRUE)
If you use the ELMER package or its results, please cite:
If you get TCGA data using the getTCGA function, please cite the TCGAbiolinks package:
Silva, TC, A Colaprico, C Olsen, F D’Angelo, G Bontempi, M Ceccarelli, and H Noushmehr. 2016. “TCGA Workflow: Analyze Cancer Genomics and Epigenomics Data Using Bioconductor Packages [Version 2; Referees: 1 Approved, 1 Approved with Reservations].” F1000Research 5 (1542). doi:10.12688/f1000research.8923.2.
Grossman, Robert L., et al. "Toward a shared vision for cancer genomic data." New England Journal of Medicine 375.12 (2016): 1109-1112.
If you use the graphical user interface, please cite
If you have questions or want to report a bug, please use our GitHub repository: http://www.github.com/tiagochst/ELMER
TCGA-BRCA reports (paper supplemental material) can be found at https://tiagochst.github.io/ELMER_supplemental/