This project contains the analysis scripts used in the manuscript "An Evaluation of Supervised Methods for Identifying Differentially Methylated Regions in Illumina Methylation Arrays" by Mallik et al. The reference documentation for all functions in this project is in /docs/DMRcompare.pdf.
Script: /inst/1_downloader2.5.R
Output: betaVals_mat, a beta value matrix for the selected methylation samples. This file has rows = CpG ids, columns = sample ids. An example file is at /data/betaVals_mat.csv.
The A-clustering algorithm described in Sofer et al. (2011) was used to identify clusters of adjacent CpGs.
Script: /inst/1_Aclust_data_import.R
Inputs:
betaVals_mat: a beta value matrix of all CpGs on the array
cpgLocation_df: an annotation file that indicates the locations of CpGs. This file has rows = CpG ids, columns = chromosome, location. An example file is at /data/cpgLocation_df.csv.
Output: startEndCpG_df, a beta value matrix for clusters of CpGs. This file has rows = CpG ids, columns = cluster number, chr, start of cluster, end of cluster, sample ids. An example file is at /data/startEndCpG_df.csv.
There are three main steps in the simulation study. See /docs/DMRcompare.pdf for details of each function.
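Before running the simulation steps, it can help to confirm that the input tables have the layout described above. The following is a minimal sketch under stated assumptions, not code from this project: check_inputs() is a hypothetical helper, the CpG ids, sample ids, and values are made up, and the column names chr and pos are assumed names for the chromosome and location columns.

```r
# Toy inputs; all ids and values are made up for illustration.
# Rows = CpG ids, columns = sample ids, as described for betaVals_mat.
betaVals_mat <- matrix(
  c(0.10, 0.12, 0.80,
    0.55, 0.50, 0.52,
    0.91, 0.93, 0.20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cg01", "cg02", "cg03"), c("s1", "s2", "s3"))
)

# Rows = CpG ids, columns = chromosome and location.
# The column names "chr" and "pos" are assumptions.
cpgLocation_df <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  pos = c(1000L, 1150L, 5000L),
  row.names = c("cg01", "cg02", "cg03")
)

# Hypothetical helper (not part of DMRcompare): stops with an error if the
# beta values fall outside [0, 1] or if any CpG id lacks a location entry.
check_inputs <- function(betas, locations) {
  stopifnot(
    is.matrix(betas), is.numeric(betas),
    all(betas >= 0 & betas <= 1, na.rm = TRUE),
    all(rownames(betas) %in% rownames(locations))
  )
  invisible(TRUE)
}

check_inputs(betaVals_mat, cpgLocation_df)
```

Matching on row names keeps the check independent of row order, which is convenient when the annotation file covers more CpGs than the beta value matrix.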
SimulateData() in script file R/2_simulatedata.R
Input: betaVals_mat, startEndCpG_df (file that indicates clusters of CpGs), and the treatment effects to be added to the clusters
RunBumphunter() in script file R/3_RunBumphunter.R
RunDMRcate() in script file R/3_RunDMRcate.R
RunProbeLasso() in script file R/3_RunProbeLasso.R
The Comb-p method was implemented in Python. The corresponding shell script is exec/run_combp_working1.sh.
WriteBumphunterResults() in script file R/4_simulate_and_save_Bumphunter_results.R
WriteDMRcateResults() in script file R/4_simulate_and_save_DMRcate_results.R
WriteProbeLassoResults() in script file R/4_simulate_and_save_ProbeLasso_results.R
ProcessBumphunterResults() in script file R/5_read_and_summarize_Bumphunter_results.R
ProcessDMRcateResults() in script file R/5_read_and_summarize_DMRcate_results.R
ProcessProbeLassoResults() in script file R/5_read_and_summarize_ProbeLasso_results.R
ProcessCombpResults() in script file R/5_standardize_and_summarize_Comb-p_results.R
True Positives (TP), False Positives (FP), False Negatives (FN), Power, Precision, Area under Precision-Recall curve (AuPR), Matthews' correlation coefficient (MCC), F1 Scores (F1) and Elapsed Time (in seconds) for the different DMR detection tools based on simulation datasets:
docs/Method_compare_report_20180705.Rmd
docs/Method_compare_graphs_20180709.Rmd
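As a rough illustration of how several of the listed metrics relate to the raw counts, the sketch below uses the standard textbook formulas. It is not taken from the report scripts, and the manuscript's exact definitions may differ. summarize_counts() is a hypothetical helper, the counts are made up, and the true negative count tn is an assumption: MCC requires it, but it is not among the quantities listed above.

```r
# Hypothetical helper (not part of DMRcompare): summary metrics from
# confusion counts, using standard textbook definitions.
# tn (true negatives) is an assumed extra input needed for MCC.
summarize_counts <- function(tp, fp, fn, tn) {
  power     <- tp / (tp + fn)                       # also called recall
  precision <- tp / (tp + fp)
  f1        <- 2 * precision * power / (precision + power)
  mcc <- (tp * tn - fp * fn) /
    sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(power = power, precision = precision, f1 = f1, mcc = mcc)
}

# Made-up counts for illustration only.
res <- summarize_counts(tp = 40, fp = 10, fn = 10, tn = 940)
print(round(res, 3))
```

Power and precision ignore true negatives, whereas MCC uses all four counts, so the two can diverge when unaffected clusters greatly outnumber affected ones.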
BuildPRcurve() in script file R/6_Build_Precision-Recall_Curve_List.R
PlotPRCurve() in script file R/6_Plot_Precision-Recall_Curves.R
docs/Method_compare_graphs_20180709.Rmd
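For readers unfamiliar with precision-recall curves, the sketch below shows one generic way to build such a curve and approximate its area in base R. It does not reproduce BuildPRcurve() or PlotPRCurve(), whose implementation is documented in /docs/DMRcompare.pdf. pr_points() and aupr_trapezoid() are hypothetical helpers, the scores and labels are made up, tied scores are not handled specially, and the trapezoidal rule is only one of several conventions for AuPR.

```r
# Hypothetical helper: precision and recall after each ranked prediction.
# scores: higher means more confident; labels: 1 = true DMR, 0 = not.
pr_points <- function(scores, labels) {
  ord    <- order(scores, decreasing = TRUE)
  labels <- labels[ord]
  tp <- cumsum(labels == 1)
  fp <- cumsum(labels == 0)
  data.frame(recall = tp / sum(labels == 1), precision = tp / (tp + fp))
}

# Hypothetical helper: trapezoidal area under the precision-recall points,
# extended to recall = 0 at the precision of the top-ranked prediction.
aupr_trapezoid <- function(pr) {
  r <- c(0, pr$recall)
  p <- c(pr$precision[1], pr$precision)
  sum(diff(r) * (head(p, -1) + tail(p, -1)) / 2)
}

# Made-up scores and labels for illustration only.
pr   <- pr_points(scores = c(0.9, 0.8, 0.7, 0.6), labels = c(1, 0, 1, 0))
aupr <- aupr_trapezoid(pr)
```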
PlotOverlaps() in script file R/6_Plot_DMR-Overlaps_Venn.R