coMethy is an R package for grouping genomic loci with similar methylation pattern (or called co-methylated loci). According to the methylation profile of genomic loci in target methylomes, co-methylation analysis was performed in combination with k-means clustering and WGCNA analysis. For each co-methylation module, PCA analysis was performed to select a subset of loci as the eigen loci representing the methylation trend of the module.
In R console,
source("http://bioconductor.org/biocLite.R")
biocLite("WGCNA")
library("devtools")
devtools::install_github("Gavin-Yinld/coMethy")
coMethy
takes the methylation profile of input genomic loci in target methylomes. A numeric matrix is needed as input with each row denoting a genomic locus, each column denoting a sample, and the corresponding entry containing the methylation level.
library("coMethy")
# get the demo dataset
file=paste(system.file(package="coMethy"),"extdata/co_methy.test.data.txt",sep='/')
meth_data <- read.table(file,sep='\t',header=T,row.names=1)
# a typical input data looks like this:
head(meth_data)
sample1 sample2 sample3 sample4 sample5 sample6
chr7_82243987_82244107 0.39 0.77 0.21 0.73 0.81 0.89
chr2_166158817_166158941 0.64 0.90 0.59 0.92 0.90 0.46
chr5_30670219_30670424 0.69 0.84 0.64 0.88 0.85 0.22
chr7_66421727_66421857 0.66 0.76 0.68 0.80 0.74 0.80
chr4_131771412_131771552 0.47 0.91 0.33 0.93 0.94 0.93
chr3_135529338_135529388 0.92 1.00 1.00 0.97 0.76 0.73
sample7 sample8 sample9
chr7_82243987_82244107 0.65 0.72 0.73
chr2_166158817_166158941 0.62 0.81 0.59
chr5_30670219_30670424 0.90 0.93 0.83
chr7_66421727_66421857 0.84 0.64 0.71
chr4_131771412_131771552 0.10 0.36 0.30
chr3_135529338_135529388 0.90 0.91 0.92
Firstly, K-means clustering analysis is adopted to divide pCSM loci into hypo/mid/hyper-methylated groups. In addition, the function pickSoftThreshold
in WGCNA
package is called to show the topological properties of the network inferred from each k-means group.
kmeans_cluster <- co_methylation_step1(meth_data)
# A file named "parameter.pdf" will be generated to show the topological properties.
# The two figures in each row represent the topological properties of the network in one k-means group.
# The X-axis represents the soft-thresholding power.
# For each k-means group, a proper soft-thresholding power is need to be choosen, to ensure the scale-free topology property of the network inferred from each k-means group and the connectivity of the genomic loci in the network.
Network construction is performed by using the blockwiseModules
function in the WGCNA
package. For each k-means group, a pair-wise correlation matrix is computed, which is converted to an adjacency matrix by raising the correlations to a power. The proper power parameter needs to be chosen by using the scale-free topology criterion in step 1. For example, the power of 16, 18, and 20 are chosen for the networks built for each k-means group, to balance the scale-free topology property and the connectivity of the genomic loci in the network.
module <- co_methylation_step2(data=meth_data,
kmeans_result=kmeans_cluster,
softPower_list=c(16,18,20),plot=T)
# By setting "plot=T", a file named "wgcna.module.pdf" will be generated to show the methylation level of the loci in each co-methylation module.
# Each grey line represents the methylation level of one genomic loci.
# The red line in each co-methylation module represents the average methylation level of this module.
PCA analysis is performed to pick a set of pCSM loci with the largest loading in PC1 as eigen-loci in each co-methylation module to represent the methylation trend in the module.
eigen_loci <- extract_eigen(methy_data=module$profile,
all_label=module$module_id,
number_of_eig=100,plot=T)
#By setting "plot=T", a file named "eigen_loci.pdf" will be generated to show the methylation level of the eigen-loci in each co-methylation module.
#Red lines in each co-methylation module represent the eigen-loci picked by PCA analysis.
coMethy is a major step in our pipeline to perform virtual methylome dissection, and the final step to dissect the methylomes is described below.
NMF analysis is used to explore the composition of the target methylomes. The methylation matrix of eigen-loci in all samples will be decomposed into a product of two matrices: one for the methylation profiles of the estimated cell types and the other for the cell-type proportions across all samples. 'MeDeCom' package is adopted to perform the NMF analysis. For details, please check (https://github.com/lutsik/MeDeCom/blob/master/vignettes/MeDeCom.md).
library(MeDeCom)
#MeDeCom requires intensive computations. The processing of this simple data matrix may take several minutes.
medecom.result<-runMeDeCom(as.matrix(eigen_loci$methy_prof), 2:5, 10^(-5:-1), NINIT=10, NFOLDS=10, ITERMAX=300, NCORES=9)
#The methylation profile of the estimated cell types and their proportions across all samples can be obtained:
profile<-getLMCs(medecom.result, K=5, lambda=0.01)
proportion<-getProportions(medecom.result, K=5, lambda=0.01)
#The number of cell compositions K could be determined by prior knowledge and regularizer shifts parameter λ could be selected via cross-validation provided in the MeDeCom package. In case that no prior knowledge of cell composition is available for the input methylomes, both k and λ may be selected via cross-validation as suggested in the MeDeCom package.
plotParameters(medecom.result)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.