This project contains analysis scripts used in the manuscript "An Evaluation of Supervised Methods for Identifying Differentially Methylated Regions in Illumina Methylation Arrays" by Mallik et al. (2018). The reference documentation for all functions in this project is in /docs/DMRcompare.pdf.
/inst/1_downloader2.5.R
- betaVals_mat, which is a beta value matrix for selected methylation samples. This file has rows = CpG IDs, columns = sample IDs. An example file is at /data/betaVals_mat.csv.

The A-clustering algorithm described in Sofer et al. (2011) (PMID: 23990415) was used to identify clusters of adjacent CpGs.
/inst/1_Aclust_data_import.R
- betaVals_mat: a beta value matrix of all CpGs on the array
- cpgLocation_df: an annotation file that indicates locations of CpGs. This file has rows = CpG IDs, columns = chromosome, location. An example file is at /data/cpgLocation_df.csv.
- startEndCpG_df, which is a beta value matrix for clusters of CpGs. This file has rows = CpG IDs, columns = cluster number, chr, start of cluster, end of cluster, sample IDs. An example file is at /data/startEndCpG_df.csv.

There are three main steps in the simulation study. See /docs/DMRcompare.pdf for details of each function.
a) Simulate differentially methylated clusters of CpGs.
- File: SimulateData() in script file R/2_simulatedata.R
- Main input: betaVals_mat (beta values for all probes on the array), startEndCpG_df (file that indicates clusters of CpGs), and the treatment effects to be added to the clusters (e.g. delta = c(0.025, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4) in our study)
- Main output: simulated beta value matrix for all probes on the array, where treatment effects were added to 500 randomly selected clusters of CpGs.
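The idea of step a) can be illustrated with a small base-R sketch. This is not the package's SimulateData(); the function name simulate_clusters_sketch(), its arguments, the toy data, and the choice to shift only a "treated" subset of samples are all assumptions made for illustration. See /docs/DMRcompare.pdf for the actual interface.

```r
# Hypothetical helper (NOT the package's SimulateData()): adds a treatment
# effect `delta` to every CpG in a few randomly selected clusters.
# Assumption: only the samples named in `treated` receive the shift.
simulate_clusters_sketch <- function(betas, clusters, treated, delta,
                                     n_clusters, seed = 1) {
  set.seed(seed)
  # Pick the clusters that will carry a true effect.
  # (Needs more than one cluster: sample() treats a single number n as 1:n.)
  chosen <- sample(unique(clusters$cluster), n_clusters)
  hit_cpgs <- clusters$cpg[clusters$cluster %in% chosen]

  # Shift the beta values, then clamp them to the valid range [0, 1]
  shifted <- betas[hit_cpgs, treated, drop = FALSE] + delta
  betas[hit_cpgs, treated] <- pmin(pmax(shifted, 0), 1)

  list(betas = betas, true_clusters = sort(chosen))
}

# Toy data: 6 CpGs (rows) x 4 samples (columns), three clusters of two CpGs
set.seed(42)
toy_betas <- matrix(
  runif(24, min = 0.2, max = 0.6), nrow = 6,
  dimnames = list(paste0("cg", 1:6), paste0("s", 1:4))
)
toy_clusters <- data.frame(
  cpg = paste0("cg", 1:6),
  cluster = rep(1:3, each = 2),
  stringsAsFactors = FALSE
)

sim <- simulate_clusters_sketch(
  betas = toy_betas, clusters = toy_clusters,
  treated = c("s3", "s4"), delta = 0.1, n_clusters = 1
)
sim$true_clusters  # the cluster that now carries a true effect
```

Returning the indices of the modified clusters alongside the matrix keeps the ground truth available for the later evaluation step.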
b) Apply DMR finding methods to the simulated datasets:
- Files:
  - RunBumphunter() in script file R/3_RunBumphunter.R
  - RunDMRcate() in script file R/3_RunDMRcate.R
  - RunProbeLasso() in script file R/3_RunProbeLasso.R
  - The Comb-p method was implemented in Python. The corresponding shell script is exec/run_combp_working1.sh
- Main output: significant DMRs identified by each of the methods. The three R functions above are called by three wrapper functions:
  - WriteBumphunterResults() in script file R/4_simulate_and_save_Bumphunter_results.R
  - WriteDMRcateResults() in script file R/4_simulate_and_save_DMRcate_results.R
  - WriteProbeLassoResults() in script file R/4_simulate_and_save_ProbeLasso_results.R
c) Summarize results of DMR finding methods:
- Files:
  - ProcessBumphunterResults() in script file R/5_read_and_summarize_Bumphunter_results.R
  - ProcessDMRcateResults() in script file R/5_read_and_summarize_DMRcate_results.R
  - ProcessProbeLassoResults() in script file R/5_read_and_summarize_ProbeLasso_results.R
  - ProcessCombpResults() in script file R/5_standardize_and_summarize_Comb-p_results.R
- Main output: These functions compare the significant DMRs identified by each method, evaluate whether they overlap with the true positive clusters where treatment effects were added, and then compute summary statistics including TP, FP, TN, FN, power, precision, and the median number of CpGs in significant DMRs.
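The overlap check and the summary statistics in step c) can be sketched in base R as follows. The helpers overlaps_any() and summarize_calls() are hypothetical names, and counting TP, FP, TN, and FN per cluster is an assumption of this sketch; the definitions actually used by the Process*Results() functions are in /docs/DMRcompare.pdf.

```r
# Hypothetical helper: TRUE for each region in `query` that overlaps at least
# one region in `subject` (same chromosome, closed intervals).
overlaps_any <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    any(subject$chr == query$chr[i] &
          subject$start <= query$end[i] &
          subject$end >= query$start[i])
  }, logical(1))
}

# Hypothetical helper: cluster-level confusion counts and derived metrics.
# Assumption: a cluster counts as "called" if any significant DMR overlaps it.
summarize_calls <- function(all_clusters, true_idx, dmrs) {
  called <- overlaps_any(all_clusters, dmrs)
  truth <- seq_len(nrow(all_clusters)) %in% true_idx

  # as.numeric() avoids integer overflow in the MCC products below
  tp <- as.numeric(sum(called & truth))
  fp <- as.numeric(sum(called & !truth))
  fn <- as.numeric(sum(!called & truth))
  tn <- as.numeric(sum(!called & !truth))

  power <- tp / (tp + fn)          # also called recall or sensitivity
  precision <- tp / (tp + fp)      # NaN when nothing is called
  f1 <- 2 * precision * power / (precision + power)
  mcc <- (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

  c(TP = tp, FP = fp, TN = tn, FN = fn,
    power = power, precision = precision, F1 = f1, MCC = mcc)
}

# Toy example: four clusters on chr1, the first two carry a true effect
clusters <- data.frame(
  chr = "chr1",
  start = c(100, 300, 500, 700),
  end = c(200, 400, 600, 800),
  stringsAsFactors = FALSE
)
dmrs <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(150, 550, 300),
  end = c(180, 650, 400),
  stringsAsFactors = FALSE
)

res <- summarize_calls(clusters, true_idx = c(1, 2), dmrs = dmrs)
print(res)
```

In this toy example one true cluster is found, one is missed, and one null cluster is called, so TP = FP = FN = TN = 1, power and precision are both 0.5, and MCC is 0.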
True Positives (TP), False Positives (FP), False Negatives (FN), Power, Precision, Area under Precision-Recall curve (AuPR), Matthews' correlation coefficient (MCC), F1 Scores (F1) and Elapsed Time (in seconds) for the different DMR detection tools based on simulation datasets:
docs/Method_compare_report_20180705.Rmd
docs/Method_compare_graphs_20180709.Rmd
BuildPRcurve() in script file R/6_Build_Precision-Recall_Curve_List.R
PlotPRCurve() in script file R/6_Plot_Precision-Recall_Curves.R
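For readers unfamiliar with precision-recall curves and AuPR, here is a minimal base-R sketch. It does not reproduce BuildPRcurve() or PlotPRCurve(); the helper names, the use of per-region p-values as the ranking score, and the trapezoidal rule for the area are assumptions chosen for illustration.

```r
# Hypothetical helper: precision and recall at every cutoff along a ranking.
# Assumption: smaller p-values are more significant; ties are not handled.
pr_curve_sketch <- function(p_values, is_true) {
  ord <- order(p_values)             # most significant first
  hits <- cumsum(is_true[ord])       # true positives found so far
  data.frame(
    cutoff = p_values[ord],
    recall = hits / sum(is_true),
    precision = hits / seq_along(ord)
  )
}

# Hypothetical helper: area under the PR curve by the trapezoidal rule.
# The curve is extended to recall = 0 using the first observed precision.
aupr_trapezoid <- function(pr) {
  r <- c(0, pr$recall)
  p <- c(pr$precision[1], pr$precision)
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

# Toy example: four candidate regions, two of which are truly differential
pr <- pr_curve_sketch(
  p_values = c(0.001, 0.01, 0.02, 0.2),
  is_true = c(TRUE, FALSE, TRUE, FALSE)
)
aupr <- aupr_trapezoid(pr)

# Quick look at the curve
plot(pr$recall, pr$precision, type = "b",
     xlab = "Recall", ylab = "Precision", ylim = c(0, 1))
```

Other interpolation schemes for the area under a PR curve exist and can give slightly different values; the trapezoidal rule is used here only because it is short.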
docs/Method_compare_graphs_20180709.Rmd
PlotOverlaps() in script file R/6_Plot_DMR-Overlaps_Venn.R