knitr::opts_chunk$set(echo = TRUE)
Based on DNA methylation values, the vignette shows how to estimate cell-type composition and cell type-specific methylation profiles accounting for confounder variables such as age and sex.
We provide access to a simulated matrix D
of a complex tissue DNA methylation values.
The data frame exp_grp
contains the corresponding experimental data for each patient. Here the data frame contains 22 different variables that include sex and age.
D = readRDS(url("https://zenodo.org/record/3247635/files/D.rds")) dim(D) head(D)[1:5, 1:5] exp_grp = readRDS(url("https://zenodo.org/record/3247635/files/exp_grp.rds")) dim(exp_grp) head(exp_grp)[1:5, 1:5]
In addition to cell-type composition, DNA methylation can vary because of other confounder variables, such as age, sex, and batch effects. We use the function CF_detection
to remove probes correlated with confounding factors. By default, the function assumes that all variables contained in the argument exp_grp
are potential confounders. Using a linear regression for each of the variables, the function removes probes significantly associated with confounders (false discovery threshold defined by the threshold
argument).
D_CF = medepir::CF_detection(D, exp_grp, threshold = 0.15, ncores = 2) dim(D_CF) print(paste0("Number of correlated probes removed : ", nrow(D) - nrow(D_CF)))
To choose the number $K$ of cell types, we used the function plot_k
, which performs a Principal Component Analysis and plot the eigenvalues of the probes matrix in descending order.
medepir::plot_k(D_CF)
To select the number of principal components, we use Cattell rule that recommends to keep principal components that correspond to eigenvalues to the left of the straight line. Here, Cattell rule suggests to keep 4 principal components which corresponds to 5 cell types (the number of cell types is equal to the number of principal components plus one).
This step is optional. The function feature_selection
select probes with the largest variance (5000 probes by default). By removing probes that do not vary, deconvolution routines can run much faster.
D_FS = medepir::feature_selection(D_CF) dim(D_FS)
We propose three methods of deconvolution, of three packages:
RFE
to run RefFreeEWAS::RefFreeCellMix
, MDC to run MeDeCom::runMeDeCom
and Edec to run EDec::run_edec_stage_1
.
Here we show the results obtained with RefFreeEWAS
.
results_RFE = medepir::RFE(D_FS, nbcell = 5)
Results obtained with EDec and MeDeCom can be obtained by uncommenting the following code.
#results_MDC = medepir::MDC(D_FS, nbcell = 5, lambdas = c(0, 10^(-5:-1))) #infloci = read.table(url("https://zenodo.org/record/3247635/files/inf_loci.txt"),header = FALSE, sep = "\t") #infloci = as.vector(infloci$V1) #results_Edec = medepir::Edec(D, nbcell = 5, infloci = infloci)
We can vizualize the matrix of cell-type proportions using a stacked bar plot.
proportions<-(results_RFE$A) colors <- c("#A7A7A7","#2E9FDF", "#00AFBB", "#E7B800", "#FC4E07") barplot(proportions,names.arg=1:20,col=colors)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.