library(ELMER.data) library(ELMER) library(DT) library(dplyr) library(BiocStyle)
The first step is the identification of differentially methylated CpGs (DMCs) carried out by function get.diff.meth
.
In the Supervised
mode, we compare the DNA methylation level of each distal CpG for all samples in Group 1 compared to all samples Group 2, using an unpaired one-tailed t-test. In the Unsupervised
mode, the samples of each group (Group 1 and Group 2) are ranked by their DNA methylation beta values for the given probe, and those samples in the lower quintile (20\% samples with the lowest methylation levels) of each group are used to identify if the probe is hypomethylated in Group 1 compared to Group 2. The reverse applies for the identification of hypermethylated probes. It is important to highlight that in the Unsupervised
mode, each probe selected may be based on a different subset the samples, and thus probe sets from multiple molecular subtypes may be represented. In the Supervised
mode, all tests are based on the same set of samples.
The 20\% is a parameter to the diff.meth
function called minSubgroupFrac
. For the unsupervised analysis, this is set to 20\% as in Yao et al. [@yao2015inferring], because we wanted to be able to detect a specific molecular subtype among samples; these subtypes often make up only a minority of samples, and 20\% was chosen as a lower bound for the purposes of statistical power (high enough sample numbers to yield t-test p-values that could overcome multiple hypotheses corrections, yet low enough to be able to capture changes in individual molecular subtypes occurring in 20\% or more of the cases.) This number can be set as an input to the diff.meth
function and should be tuned based on sample sizes in individual studies.
In the Supervised
mode, where the comparison groups are implicit in the sample set and labeled, the minSubgroupFrac
parameter is set to 100\%. An example would be a cell culture experiment with 5 replicates of the untreated cell line, and another 5 replicates that include an experimental treatment.
To identify hypomethylated DMCs, a one-tailed t-test is used to rule out the null hypothesis: $\mu_{group1} \geq \mu_{group2}$, where $\mu_{group1}$ is the mean methylation within the lowest group 1 quintile (or another percentile as specified by the minSubgroupFrac
parameter) and $\mu_{group2}$ is the mean within the lowest group 2 quintile. Raw p-values are adjusted for multiple hypothesis testing using the Benjamini-Hochberg method, and probes are selected when they had adjusted p-value less than $0.01$ (which can be configured using the pvalue
parameter). For additional stringency, probes are only selected if the methylation difference: $\Delta = \mu_{group1} - \mu_{group2}$ was greater than $0.3$. The same method is used to identify hypermethylated DMCs, except we use the upper quintile, and the opposite tail in the t-test is chosen.
[@yao2015inferring,@yao2015demystifying]
mae <- get(load("mae.rda")) sig.diff <- get.diff.meth(data = mae, group.col = "definition", group1 = "Primary solid Tumor", group2 = "Solid Tissue Normal", minSubgroupFrac = 0.2, # if supervised mode set to 1 sig.dif = 0.3, diff.dir = "hypo", # Search for hypomethylated probes in group 1 cores = 1, dir.out ="result", pvalue = 0.01)
head(sig.diff) %>% datatable(options = list(scrollX = TRUE)) # get.diff.meth automatically save output files. # - getMethdiff.hypo.probes.csv contains statistics for all the probes. # - getMethdiff.hypo.probes.significant.csv contains only the significant probes which # is the same with sig.diff # - a volcano plot with the diff mean and significance levels dir(path = "result", pattern = "getMethdiff")
group1 <- "Primary solid Tumor" group2 <- "Solid Tissue Normal" out <- readr::read_csv(dir(path = "result", pattern = "getMethdiff.hypo.probes.csv",full.names = TRUE)) TCGAbiolinks:::TCGAVisualize_volcano(x = as.data.frame(out)[,grep("Minus",colnames(out),value = T)], y = out$adjust.p, title = paste0("Volcano plot - Probes ", "hypomethylated in ", group1, " vs ", group2,"\n"), filename = NULL, label = c("Not Significant", paste0("Hypermethylated in ",group1), paste0("Hypomethylated in ",group1)), ylab = expression(paste(-Log[10], " (FDR corrected P-values) [one tailed test]")), xlab = expression(paste( "DNA Methylation difference (",beta,"-values)") ), x.cut = 0.3, y.cut = 0.01)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.