beta_kr: Fit the K.R Model


beta_kr    R Documentation

Fit the K.R Model

Description

A beta mixture model for identifying differentially methylated CpG sites between R DNA sample types collected from N patients.

Usage

beta_kr(data, M = 3, N, R, parallel_process = FALSE, seed = NULL)

Arguments

data

A dataframe of dimension C × NR containing methylation values for C CpG sites from R sample types collected from N patients. Columns are grouped by sample type, with the patients listed in the same order within each group, i.e. Sample1_Patient1, Sample1_Patient2, Sample2_Patient1, Sample2_Patient2, etc.
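The column layout can be illustrated with a small sketch. The sizes and the simulated values below are hypothetical and serve only to show the expected ordering; they are not part of the package.

```r
# Hypothetical toy sizes, chosen only for illustration
N <- 2  # patients
R <- 2  # sample types
C <- 5  # CpG sites

# Column names in the order described above: all patients for sample type 1,
# then all patients for sample type 2, and so on
col_names <- paste0("Sample", rep(seq_len(R), each = N),
                    "_Patient", rep(seq_len(N), times = R))

# Simulated values in (0, 1) standing in for methylation values
set.seed(1)
toy_data <- as.data.frame(matrix(runif(C * N * R), nrow = C, ncol = N * R))
names(toy_data) <- col_names

print(col_names)
print(dim(toy_data))  # C rows, N * R columns
```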

M

Number of methylation states to be identified.

N

Number of patients in the study.

R

Number of sample types collected from each patient for study.

parallel_process

If TRUE, the models are fitted in parallel for increased computational efficiency. The default is FALSE due to package testing limitations.

seed

Seed to allow for reproducibility (default = NULL).

Details

The K.R model identifies the differentially methylated CpG sites between the R DNA sample types collected from each of N patients. As each CpG site in a DNA sample can belong to one of M methylation states, there can be K = M^R methylation state changes between the R DNA sample types. The shape parameters vary across DNA sample types but are constrained to be equal across patients. An initial clustering using k-means is performed to identify K clusters, and the resulting clustering solution is provided as starting values to the Expectation-Maximisation algorithm. A digamma approximation is used to obtain the maximised parameters in the M-step.
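The count K = M^R and the k-means starting step can be sketched as follows. This is a minimal illustration on simulated data, not the package's internal code: the object names, the toy data and the use of stats::kmeans with nstart = 5 are all assumptions made for the example.

```r
M <- 3  # methylation states per sample type
R <- 2  # sample types
N <- 2  # patients (hypothetical toy value)

# Enumerate every combination of states across the R sample types
states <- expand.grid(rep(list(seq_len(M)), R))
names(states) <- paste0("Sample", seq_len(R))
K <- nrow(states)  # equals M^R

# Toy data standing in for a C x NR matrix of methylation values
set.seed(42)
toy <- matrix(runif(200 * N * R), nrow = 200, ncol = N * R)

# k-means with K centres gives starting cluster labels
km <- kmeans(toy, centers = K, nstart = 5)

# Starting mixing proportions derived from the k-means labels
tau_init <- tabulate(km$cluster, nbins = K) / nrow(toy)

print(K)
print(tau_init)
```

With M = 3 and R = 2 the enumeration has nine rows, one for each pair of states a CpG site can occupy in the two sample types.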

Value

A list containing:

  • cluster_size - The total number of CpG sites in each of the K clusters.

  • llk - A vector containing the log-likelihood value at each step of the EM algorithm.

  • alpha - The first shape parameter for the beta mixture model.

  • delta - The second shape parameter for the beta mixture model.

  • tau - The estimated mixing proportion for each cluster.

  • z - A matrix of dimension C × K containing the posterior probability of each CpG site belonging to each of the K clusters.

  • classification - The classification corresponding to z, i.e. map(z).

  • uncertainty - The uncertainty of each CpG site's clustering.

  • DM - The AUC and WD metric for distribution similarity in each cluster.
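The relationship between z, classification and uncertainty can be illustrated with a small sketch. The matrix below is hypothetical. Taking the classification as the column with the largest posterior probability follows the map(z) description above, while computing the uncertainty as one minus that maximum is a common convention that is assumed here rather than confirmed by this page.

```r
# Hypothetical posterior probabilities for 3 CpG sites and 3 clusters
z <- matrix(c(0.70, 0.20, 0.10,
              0.10, 0.60, 0.30,
              0.25, 0.25, 0.50),
            nrow = 3, byrow = TRUE)

# Maximum a posteriori classification: most probable cluster for each site
classification <- apply(z, 1, which.max)

# Assumed convention: uncertainty = 1 - largest posterior probability
uncertainty <- 1 - apply(z, 1, max)

# Number of sites assigned to each cluster
cluster_size <- tabulate(classification, nbins = ncol(z))

print(classification)
print(uncertainty)
print(cluster_size)
```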

See Also

betaclust

Examples

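library(betaclust)

my.seed <- 190
M <- 3  # methylation states
N <- 4  # patients
R <- 2  # sample types
# Columns 2 to 9 supply N * R = 8 columns of methylation values
data_output <- beta_kr(pca.methylation.data[1:30, 2:9], M, N, R,
                       parallel_process = FALSE, seed = my.seed)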

betaclust documentation built on Sept. 30, 2024, 9:30 a.m.