cluster_profiles_mle: Cluster methylation profiles using EM

View source: R/cluster_profiles_mle.R

Description

General-purpose function for clustering latent profiles under different observation models using maximum likelihood estimation (MLE) and the EM algorithm. It first checks the input parameters and initializes the main parameters, such as the mixing proportions and basis function coefficients. It then runs the EM algorithm and finally computes model selection metrics, such as BIC and AIC.
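
The following is a minimal sketch of the generic EM bookkeeping that the description refers to. It is not the package's implementation. The helper names `em_step`, `aic` and `bic` are hypothetical, and the matrix `log_lik` of per-region, per-cluster log-likelihoods is assumed to have been computed elsewhere. The information criteria use their standard textbook definitions; how the package counts free parameters is not stated here.

```r
# Hypothetical helper: one E-step plus the mixing-proportion update.
# log_lik is an N x K matrix; entry [n, k] is the log-likelihood of
# region n under the profile of cluster k.
em_step <- function(log_lik, pi_k) {
  # Add the log mixing proportion of cluster k to column k.
  weighted <- sweep(log_lik, 2, log(pi_k), "+")
  # Log-sum-exp over clusters for numerical stability.
  row_max <- apply(weighted, 1, max)
  log_norm <- row_max + log(rowSums(exp(weighted - row_max)))
  # Posterior responsibilities; each row sums to 1.
  resp <- exp(weighted - log_norm)
  list(resp = resp, pi_k = colMeans(resp), log_lik_total = sum(log_norm))
}

# Standard definitions: n_params is the number of free parameters,
# n_obs the number of observations (here, regions).
aic <- function(log_lik_total, n_params) 2 * n_params - 2 * log_lik_total
bic <- function(log_lik_total, n_params, n_obs) {
  log(n_obs) * n_params - 2 * log_lik_total
}

# Toy input: 2 regions, 2 clusters (the matrix is filled column by column).
log_lik <- matrix(log(c(0.9, 0.2, 0.1, 0.8)), nrow = 2)
res <- em_step(log_lik, pi_k = c(0.5, 0.5))
```

In a full EM loop the basis function coefficients would also be re-estimated in each M-step, and iteration would stop once the change in `log_lik_total` falls below a threshold such as `epsilon_conv` or after `em_max_iter` iterations.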

Usage

cluster_profiles_mle(
  X,
  K = 3,
  model = NULL,
  basis = NULL,
  H = NULL,
  pi_k = NULL,
  lambda = 0.5,
  beta_dispersion = 5,
  gaussian_sigma = rep(0.2, K),
  w = NULL,
  em_max_iter = 50,
  epsilon_conv = 1e-04,
  opt_method = "CG",
  opt_itnmax = 50,
  init_opt_itnmax = 30,
  is_parallel = FALSE,
  no_cores = NULL,
  is_verbose = FALSE,
  ...
)

Arguments

X

The input data, which has to be a list of N elements, where each element is an L x C matrix and L is the total number of observations. The first column contains the input observations x (i.e. CpG locations). For the "binomial" model, C=3, and the 2nd and 3rd columns contain the total number of trials and the number of successes, respectively. For the "bernoulli" or "gaussian" model, C=2, and the 2nd column contains the output y (e.g. methylation level). For the "beta" model, C=3, where the 2nd column contains the output y and the 3rd column the dispersion parameter.

K

Integer denoting the total number of clusters K.

model

Observation model name as character string. It can be either 'bernoulli', 'binomial', 'beta' or 'gaussian'.

basis

A 'basis' object, e.g. see create_basis. If NULL, an RBF object will be created.

H

Optional, design matrix of the input data X. If NULL, H will be computed inside the function.

pi_k

Vector of length K, denoting the mixing proportions.

lambda

The complexity penalty coefficient for ridge regression.

beta_dispersion

Dispersion parameter, only used for the "beta" observation model; the same value is used for all observations.

gaussian_sigma

Initial standard deviation of the noise term, only used with the "gaussian" observation model.

w

Optional, an (M+1)xK matrix of the initial parameters, where each column consists of the basis function coefficients for each corresponding cluster k. If NULL, will be assigned with default values.

em_max_iter

Integer denoting the maximum number of EM iterations.

epsilon_conv

Numeric denoting the convergence threshold for EM.

opt_method

The optimization method to be used. See optim for possible methods. Default is "CG".

opt_itnmax

Optional argument giving the maximum number of iterations for the corresponding method. See optim for details.

init_opt_itnmax

Optimization iterations for obtaining the initial EM parameter values.

is_parallel

Logical, indicating if code should be run in parallel.

no_cores

Number of cores to be used, default is max_no_cores - 1.

is_verbose

Logical, print results during EM iterations.

...

Additional parameters.
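
As an illustration of the layout expected for the X argument with the "binomial" model, the sketch below simulates a list of L x 3 matrices in base R. The helper `make_binomial_region`, the smooth profile used to generate the data, and the scaling of locations to [-1, 1] are assumptions made for this example only.

```r
set.seed(1)

# Hypothetical helper: simulate one region with L CpGs as an L x 3 matrix
# with columns (location, total trials, successes).
make_binomial_region <- function(L) {
  x <- sort(runif(L, min = -1, max = 1))     # CpG locations (assumed scaling)
  total <- rpois(L, lambda = 20) + 1         # total reads per CpG, at least 1
  p <- pnorm(sin(2 * x))                     # arbitrary smooth methylation profile
  meth <- rbinom(L, size = total, prob = p)  # methylated reads per CpG
  cbind(x, total, meth)
}

# N = 10 regions, each with a different number of CpGs.
X <- lapply(sample(15:30, 10), make_binomial_region)
```

A list built this way could then be passed as `X` together with `model = "binomial"`.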

Value

An object of class cluster_profiles_mle_"obs_model" with the following elements:

Details

The beta regression model is based on an alternative parameterization of the beta density in terms of the mean and dispersion parameters; see https://cran.r-project.org/web/packages/betareg/. For modelling details of the Binomial/Bernoulli observation model, see the BPRMeth paper: https://academic.oup.com/bioinformatics/article/32/17/i405/2450762 .
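
The mean and dispersion parameterization can be sketched with base R's `dbeta`. This assumes the betareg convention, where a mean mu and a parameter phi (called the precision in betareg) map to the shape parameters mu * phi and (1 - mu) * phi. The wrapper name `dbeta_mean_disp` is hypothetical, and whether the package evaluates the density in exactly this way is not confirmed here.

```r
# Beta density in terms of mean mu (0 < mu < 1) and dispersion phi > 0,
# mapped onto the usual shape parameters of stats::dbeta().
dbeta_mean_disp <- function(y, mu, phi, log = FALSE) {
  dbeta(y, shape1 = mu * phi, shape2 = (1 - mu) * phi, log = log)
}

mu <- 0.3
phi <- 5  # same value as the default of beta_dispersion

# Under this parameterization the mean is mu and the variance is
# mu * (1 - mu) / (1 + phi); check the mean numerically.
est_mean <- integrate(function(y) y * dbeta_mean_disp(y, mu, phi), 0, 1)$value
```

Larger values of phi concentrate the density around mu, so for a fixed mean the variance shrinks as phi grows.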

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk

See Also

create_basis, cluster_profiles_vb, infer_profiles_vb, infer_profiles_mle, infer_profiles_gibbs, create_region_object

Examples

# Example of optimizing parameters for synthetic data using 3 RBFs

basis <- create_rbf_object(M=3)
out <- cluster_profiles_mle(X = binomial_data, model = "binomial",
  basis=basis, em_max_iter = 5, opt_itnmax = 5, init_opt_itnmax=5,
  is_parallel = FALSE)

#-------------------------------------

basis <- create_rbf_object(M=3)
out <- cluster_profiles_mle(X = gaussian_data, model = "gaussian",
  basis=basis, em_max_iter = 5, opt_itnmax = 5, init_opt_itnmax=5,
  is_parallel = FALSE)

andreaskapou/BPRMeth documentation built on June 11, 2020, 10:49 p.m.