melissa_vb: Cluster and impute single cell methylomes using VB

Description

melissa clusters and imputes single cells based on their methylome landscape on specific genomic regions, e.g. promoters, using the Variational Bayes (VB) EM-like algorithm.

Usage

melissa(
  X,
  K = 3,
  basis = NULL,
  delta_0 = NULL,
  w = NULL,
  alpha_0 = 0.5,
  beta_0 = NULL,
  vb_max_iter = 300,
  epsilon_conv = 1e-05,
  is_kmeans = TRUE,
  vb_init_nstart = 10,
  vb_init_max_iter = 20,
  is_parallel = FALSE,
  no_cores = 3,
  is_verbose = TRUE
)

Arguments

X

The input data: a list of length N, where N is the total number of cells. Each element of this list is itself a list of length M, where M is the total number of genomic regions, e.g. promoters. Each element of the inner list is an I x 2 matrix, where I is the total number of observations in that region. The first column contains the input observations x (i.e. CpG locations) and the second column contains the corresponding methylation level.

K

Integer denoting the total number of clusters K.

basis

A 'basis' object, e.g. as returned by the create_basis function from the BPRMeth package. If NULL, an RBF object with 3 basis functions will be created.

delta_0

Parameter vector of the Dirichlet prior on the mixing proportions pi.

w

Optional, an M x D x K array of initial parameters, where the first dimension indexes the M genomic regions, the second the D covariates (i.e. basis functions), and the third the K clusters. If NULL, default values will be assigned.

alpha_0

Hyperparameter: shape parameter for Gamma distribution. A Gamma distribution is used as prior for the precision parameter tau.

beta_0

Hyperparameter: rate parameter for Gamma distribution. A Gamma distribution is used as prior for the precision parameter tau.

vb_max_iter

Integer denoting the maximum number of VB iterations.

epsilon_conv

Numeric denoting the convergence threshold for VB.

is_kmeans

Logical, whether to use k-means for initialization of model parameters.

vb_init_nstart

Number of VB random starts for finding better initialization.

vb_init_max_iter

Maximum number of mini-VB iterations.

is_parallel

Logical, indicating if code should be run in parallel.

no_cores

Number of cores to be used, default is max_no_cores - 1.

is_verbose

Logical, print results during VB iterations.
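
The nested-list layout expected for X can be easier to see in code. The following is a minimal base R sketch that builds a toy input with the described shape; the helper make_region, the number of observations per region, the scaling of CpG locations to [-1, 1], and the binary methylation values are all assumptions made for illustration, not part of the package.

```r
# Toy input in the nested-list layout described for X (all values are made up).
set.seed(1)
N <- 3  # number of cells
M <- 2  # number of genomic regions per cell

# Hypothetical helper: one region as an I x 2 matrix.
# Column 1: CpG locations (assumed scaled to [-1, 1]).
# Column 2: methylation level (assumed binary here).
make_region <- function(n_obs) {
  cbind(
    sort(runif(n_obs, min = -1, max = 1)),
    rbinom(n_obs, size = 1, prob = 0.5)
  )
}

# Outer list over cells, inner list over regions.
X <- lapply(seq_len(N), function(n) {
  lapply(seq_len(M), function(m) make_region(n_obs = 5))
})

length(X)         # number of cells
length(X[[1]])    # number of regions in the first cell
dim(X[[1]][[1]])  # dimensions of one region matrix (I x 2)
```

In real data the number of observations I typically differs between regions and cells, which is why a list of matrices is used rather than a single array.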

Value

An object of class melissa with the following elements:

Details

The modelling and mathematical details for clustering profiles using mean-field variational inference are explained here: http://rpubs.com/cakapourani/ . More specifically:

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk

See Also

create_melissa_data_obj, partition_dataset, plot_melissa_profiles, impute_test_met, impute_met_files, filter_regions

Examples

# Example of running Melissa on synthetic data

# Create RBF basis object with 4 RBFs
basis_obj <- BPRMeth::create_rbf_object(M = 4)

set.seed(15)
# Run Melissa
melissa_obj <- melissa(X = melissa_synth_dt$met, K = 2, basis = basis_obj,
   vb_max_iter = 10, vb_init_nstart = 1, vb_init_max_iter = 5,
   is_parallel = FALSE, is_verbose = FALSE)

# Extract mixing proportions
print(melissa_obj$pi_k)
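
The optional arguments delta_0, w, alpha_0 and beta_0 are left at their defaults in the example. As a rough sketch of the shapes described in the Arguments section, they could be prepared in base R as follows. The region count M, the covariate count D (assumed here to be 4 RBFs plus an intercept), the flat Dirichlet prior and the value of beta_0 are hypothetical choices for illustration only, not package defaults.

```r
set.seed(15)
K <- 2    # number of clusters, matching the example
M <- 100  # number of genomic regions (hypothetical)
D <- 5    # covariates per region (assumed: 4 RBFs plus an intercept)

# Dirichlet prior on the mixing proportions: one value per cluster.
delta_0 <- rep(1, K)  # hypothetical choice: a flat prior

# Initial parameters: regions x covariates x clusters.
w <- array(rnorm(M * D * K, sd = 0.1), dim = c(M, D, K))

# Gamma(shape, rate) prior on the precision tau; its mean is shape / rate.
alpha_0 <- 0.5  # default shown in the usage signature
beta_0  <- 10   # hypothetical rate
prior_mean_tau <- alpha_0 / beta_0

dim(w)       # M x D x K
w[1, , 2]    # initial coefficients of region 1 in cluster 2
```

Objects built this way would be passed through the delta_0, w, alpha_0 and beta_0 arguments of melissa; whether D must include an intercept depends on the basis object used.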

andreaskapou/Melissa documentation built on June 12, 2020, 5:54 p.m.