EpiCluster: Performs dimensional reduction of a DNA methylation data...


View source: R/EpiCluster.R


The EpiCluster function first runs the BGNMF algorithm to reduce the dimension of a DNA methylation data matrix, inferring the basis and excitation matrices. It then uses RPBMM to perform unsupervised clustering of the inferred basis matrix, which allows samples to be clustered. Users can specify the number of iterations (nIter), the number of latent variables (K) and a parameter (EE) that controls the sparsity of the excitation matrix. NAs are NOT allowed and must be imputed beforehand. The output of the EpiCluster function is the input for the functions EpiDraw() and EpiAnalysis().


EpiCluster(X, nIter = 60, K = NULL, EE = 1e-04)



X: DNA methylation data matrix with rows labeling features (CpGs/genes) and columns labeling samples. NAs are not allowed and must be imputed beforehand. Although the dimensional reduction algorithm can handle Illumina 450K DNAm data, using all 450K features may result in lengthy running times (3-4 days for 60 iterations), so some prior feature selection is recommended. An ExpressionSet object is also supported: if one is detected, EpiCluster automatically extracts the data matrix it contains.
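Because NAs must be removed before the call, a pre-processing step along the following lines may be useful. This is only a minimal base-R sketch: the toy matrix, the helper name `impute_row_means` and the choice of row-mean imputation are illustrative assumptions, not something EpiCluster prescribes.

```r
# Toy beta-value matrix: 5 CpGs (rows) x 4 samples (columns), with two missing values.
set.seed(1)
X <- matrix(runif(20), nrow = 5,
            dimnames = list(paste0("cg", 1:5), paste0("S", 1:4)))
X[2, 3] <- NA
X[4, 1] <- NA

# Row-mean imputation (an illustrative choice, not prescribed by EpiCluster):
# replace each NA with the mean of the observed values for that CpG.
impute_row_means <- function(m) {
  row_means <- rowMeans(m, na.rm = TRUE)
  missing <- which(is.na(m), arr.ind = TRUE)
  m[missing] <- row_means[missing[, "row"]]
  m
}

X_imputed <- impute_row_means(X)

# The input condition stated above: no NAs may remain.
stopifnot(!anyNA(X_imputed))
```

A dedicated imputation method may well be preferable for real data; the point here is only that the matrix passed as X contains no missing values.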


nIter: Number of iterations used in the optimisation. According to our analyses, 60 iterations are sufficient in most situations, so the default is 60.


K: Number of latent variables used in the inference. This parameter need not be specified, in which case it is estimated using Random Matrix Theory (the EstDimRMT function from the ISVA package).


EE: Controls the sparsity of the excitation matrix. According to our analyses, optimal EE values range from 0.0001 to 0.0005, although this strongly depends on the characteristics of the data in question. Sparsity decreases as EE increases. The default EE value is 0.0001.


We have tested this function with up to 470,000 CpGs and it works well. In practice, however, users would normally select a subset of CpGs first, e.g. according to a maximal variability criterion.
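One possible reading of a maximal variability criterion is to keep the CpGs with the largest variance across samples. The sketch below does this in base R; the simulated matrix, the helper name `select_most_variable` and the cutoff of 100 features are assumptions made for illustration only.

```r
# Toy matrix: 1000 CpGs (rows) x 10 samples (columns) of simulated beta values.
set.seed(42)
X <- matrix(rbeta(1000 * 10, shape1 = 2, shape2 = 5), nrow = 1000,
            dimnames = list(paste0("cg", seq_len(1000)),
                            paste0("S", seq_len(10))))

# Keep the n_top rows with the largest variance across samples.
select_most_variable <- function(m, n_top) {
  row_var <- apply(m, 1, var)
  keep <- order(row_var, decreasing = TRUE)[seq_len(min(n_top, nrow(m)))]
  m[keep, , drop = FALSE]
}

X_sub <- select_most_variable(X, n_top = 100)
dim(X_sub)  # 100 rows, 10 columns
```

The reduced matrix keeps features in rows and samples in columns, so it has the layout expected for the X argument.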


An EpiClusterResult object is returned once all calculations are done. It contains three entries:


A blcTree S3 object containing all output generated by the RPBMM function.


Output generated by the BGNMF function, which contains five values. See the BGNMF function.


Clustering assignment of samples. Each cluster is labeled using an iterative scheme as in RPBMM: e.g. rL, rLL, rLR, rRR...

Note that EpiCluster.output is an S4 object that is used by the EpiDraw() and EpiAnalysis() functions, so please do not change its internal structure unless necessary.
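Cluster labels of the kind described above can be summarised with base R alone. The sketch below uses a hypothetical, hand-written label vector rather than real EpiCluster output, and it assumes that each L or R after the root "r" records one split in the tree; both are illustrative assumptions.

```r
# Hypothetical cluster labels for six samples (not real EpiCluster output).
labels <- c(S1 = "rLL", S2 = "rLR", S3 = "rLL",
            S4 = "rR",  S5 = "rR",  S6 = "rLR")

# Number of samples assigned to each cluster.
cluster_sizes <- table(labels)

# Number of splits below the root "r" encoded in each label
# (assumes one character per split, as stated in the lead-in).
depth <- nchar(labels) - 1

# Sample names grouped by cluster label.
members <- split(names(labels), labels)
```

Working on a copy of the labels like this leaves the original result object untouched.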


Yuan Tian, Zhanyu Ma, Andrew Teschendorff


Yuan T, Ma Z, Beck S, Teschendorff AE. (2015). A fast variational Bayes dimensional reduction and clustering algorithm for Epigenome-Wide Association Studies (EWAS). Under Review.


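    library(EpiCluster)
    # Simulate a small data set, then cluster it with a reduced iteration count.
    Data <- GenSimData(Ncpg = 1000, Nsig = 100)
    EpiCluster.Result <- EpiCluster(Data$beta, nIter = 20, EE = 0.005)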

JoshuaTian/EpiCluster documentation built on May 20, 2019, 10:19 p.m.