EpiMethEx Analysis

DNA methylation is an epigenetic mechanism of genomic regulation involved in the maintenance of homeostatic balance. Dysregulation of DNA methylation status is one of the driver alterations occurring in neoplastic transformation and cancer progression. The identification of methylation hotspot associated to gene dysregulation may contribute to discover new prognostic and diagnostic biomarker, as well as, new therapeutic target.

We present EpiMethEx (Epigenetic Methylation and Expression), a R package to perform a large-scale integrated analysis by cyclic correlation analyses between methylation and gene expression data. For each gene, samples are segmented according to the expression levels to select genes that are differentially expressed. This stratification allows to identify CG methylation probesets modulated among gene-stratified samples. Subsequently, the methylation probesets are clustered by their relative position in gene sequence to identify wide genomic methylation events statically related to genetic modulation. The simulation study showed that the global methylation analysis was in agreement with scientific literature. In particular, this analysis revealed a negative association between promoter hypomethylation and overexpression in a wide number of genes. Less frequently, this overexpression was sustained by intragenic hypermethylation events.

Use Case

Before to execute the analysis is necessary to create three dataframes (gene expression data, methylation data, annotation data). The example below is a extract of the datasets TCGA-SKCM:

## Expressions data Frame
    Expressions <- data.frame('sample' = c("ARHGEF10L", "HIF3A", "RNF10"),
    'TCGA-YD-A89C-06' = c(-0.746592469762, -0.753826336325, 0.4953280),
    'TCGA-Z2-AA3V-06' = c(0.578807530238, -2.30662633632, 0.1023280),
    'TCGA-EB-A3Y6-01' = c(-0.363492469762, -2.67922633632, -0.6147720),
    'TCGA-EE-A3JA-06' = c(-2.97279246976, -3.61932633632, 0.02932801),
    'TCGA-D9-A4Z2-01' = c(-0.128492469762, 0.679073663675, 0.4017280),
    'TCGA-D3-A51G-06' = c(-0.4299925, -4.0626263, -1.0136720),
    stringsAsFactors=FALSE)

## Methylation data Frame
Methylation <- data.frame(
    'sample' = c("cg11663302", "cg01552731", "cg09081385"),
    'TCGA-YD-A89C-06' = c(0.9856, 0.7681, 0.0407),
    'TCGA-Z2-AA3V-06' = c(0.9863, 0.8551, 0.0244),
    'TCGA-EB-A3Y6-01' = c(0.9876, 0.6473, 0.028),
    'TCGA-EE-A3JA-06' = c(0.9826, 0.4587, 0.0343),
    'TCGA-D9-A4Z2-01' = c(0.9881, 0.8509, 0.0215),
    'TCGA-D3-A51G-06' = c(0.9774, 0.813, 0.0332),
    stringsAsFactors=FALSE)

##Annotations data Frame
Annotations <- data.frame(
    ID = c("cg11663302","cg01552731", "cg09081385"),
    Relation_to_UCSC_CpG_Island = c("Island","N_Shore","N_Shore"),
    UCSC_CpG_Islands_Name=c("chr1:18023481-18023792","chr19:46806998-46807617",
        "chr12:120972167-120972447"),
    UCSC_RefGene_Accession = c("NM_001011722","NM_152794","NM_014868"),
    Chromosome_36 = c("1","19","12"),
    Coordinate_36 = c("17896255","51498747","119456453"),
    UCSC_RefGene_Name = c("ARHGEF10L","HIF3A","RNF10"),
    UCSC_RefGene_Group =c("Body","1stExon","TSS200"),
    stringsAsFactors=FALSE)

library(EpiMethEx)
epimethex.analysis(Expressions,Annotations,Methylation,1,3,2,TRUE,TRUE,FALSE)

or use the "curatedTCGAData" package, click here to see how to install and use the package.

It's most important to remember that curatedTCGAData doesn't allow to download dataset of Annotations, therefore it must be loaded manually through csv file or created ad hoc.

Brief description of the parameters

  1. First parameter is a genes expression of data.

  2. second parameter is Annotations of data.

  3. third parameter is Methylation of data.

  4. fourth parameter and fifth parameter are the range of genes to consider.

  5. sixth parameter, is the number of cores that you can use to analysis.

  6. seventh parameter determines if genes expression of data are linear.

  7. eighth parameter determines the test to apply on expression dataset. If TRUE will apply t-student test, otherwise will apply Kolmogorov-Smirnov test.

  8. ninth parameter, determines the test to apply on methylation dataset. If TRUE will apply t-student test, otherwise will apply Kolmogorov-Smirnov test.

Final Result

The correlation analysis performed with EpiMethEx allows to obtain 4 different data matrices (.csv format) named “CG by Individually”, “CG by position”, “CG by Island” and “CG by genes” containing beta difference, mean, r correlation coefficient, p value and fold change values obtained by several statistical tests. Below are detailed the EpiMethEx data matrices:

  1. CG by individually: list of single CG probesets analyzed Individually comparing Up, Medium Down gene expression groups.

  2. CG by position: list of CG probesets grouped according to gene positions (TSS1500, TSS200, 3’UTR, 1stExon, Body and 5’UTR).

  3. CG by island: list of CG probesets grouped according to methylation Island positions and adjacent Shore and Shelf regions of each gene.

  4. CG by genes: list of methylation groups containing all CG probesets that refer to a specific gene.



giupardeb/EpiMethEx documentation built on May 28, 2019, 5:39 a.m.