sc_bpr_cluster_wrap: Cluster single cells based on methylation profiles
In andreaskapou/BPRMeth-devel: Model higher-order methylation profiles

sc_bpr_cluster_wrap is a wrapper function that clusters single cells based on their DNA methylation profiles using the EM algorithm, where the observation model is the Binomial/Bernoulli distributed Probit Regression (BPR) likelihood. It first checks the input parameters, then runs a 'mini' EM from several random starts to find good starting parameter values, then runs the full EM algorithm, and finally computes model selection metrics such as BIC and AIC.
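
The observation model and the E-step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `bpr_loglik` and `posterior_probs` are hypothetical helper names, the quadratic basis stands in for whatever 'basis' object is actually used, and the ridge penalty controlled by `lambda` is left out for brevity.

```r
# Bernoulli probit regression log-likelihood for one genomic region.
# obs: L x 2 matrix (column 1 = CpG location, column 2 = 0/1 methylation state)
# w:   coefficient vector of length 3 for the illustrative quadratic basis
bpr_loglik <- function(w, obs) {
  H <- cbind(1, obs[, 1], obs[, 1]^2)   # hypothetical basis, an assumption
  eta <- as.vector(H %*% w)             # linear predictor
  y <- obs[, 2]
  # log Phi(eta) for methylated CpGs; log(1 - Phi(eta)) = log Phi(-eta) otherwise
  sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}

# E-step for one cell: posterior probability of each cluster, given the
# per-cluster log-likelihoods (summed over regions) and mixing proportions.
posterior_probs <- function(loglik_k, pi_k) {
  log_joint <- log(pi_k) + loglik_k
  log_joint <- log_joint - max(log_joint)   # subtract the max for stability
  exp(log_joint) / sum(exp(log_joint))
}

# Toy region: methylated upstream, unmethylated downstream
obs <- cbind(c(-0.8, -0.3, 0.1, 0.6), c(1, 1, 0, 0))
w_a <- c(0, -2, 0)   # decreasing profile, matches the toy data
w_b <- c(0, 2, 0)    # increasing profile, does not match
ll <- c(bpr_loglik(w_a, obs), bpr_loglik(w_b, obs))
post <- posterior_probs(ll, pi_k = c(0.5, 0.5))
print(post)
```

In this toy case the first profile explains the data better, so the cell receives most of its posterior mass from the first cluster.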

sc_bpr_cluster_wrap(x, K = 2, pi_k = NULL, w = NULL, basis = NULL,
  lambda = 1/8, em_max_iter = 100, epsilon_conv = 1e-05,
  use_kmeans = TRUE, em_init_nstart = 10, em_init_max_iter = 10,
  opt_method = "CG", opt_itnmax = 50, init_opt_itnmax = 100,
  is_parallel = TRUE, no_cores = NULL, is_verbose = FALSE)

`x`	A list of length I, where I is the total number of cells. Each element of the list contains another list of length N, where N is the total number of genomic regions. Each element of the inner list is an L x 2 matrix of observations, where the 1st column contains the locations and the 2nd column contains the methylation level of the corresponding CpGs.
`K`	Integer denoting the number of clusters K.
`pi_k`	Vector of length K, denoting the mixing proportions.
`w`	An N x M x K array, where each column contains the basis function coefficients for the corresponding cluster.
`basis`	A 'basis' object; see, e.g., `create_rbf_object`.
`lambda`	The complexity penalty coefficient for ridge regression.
`em_max_iter`	Integer denoting the maximum number of EM iterations.
`epsilon_conv`	Numeric denoting the convergence parameter for EM.
`use_kmeans`	Logical, use k-means for initializing the cluster centres; otherwise a point is randomly picked as each cluster centre.
`em_init_nstart`	Number of EM random starts for finding optimal likelihood.
`em_init_max_iter`	Maximum number of EM iterations for the 'small' init EM.
`opt_method`	The optimization method to be used. See `optim` for possible methods. Default is "CG".
`opt_itnmax`	Optional argument giving the maximum number of iterations for the corresponding method. See `optim` for details.
`init_opt_itnmax`	Optimization iterations for obtaining the initial EM parameter values.
`is_parallel`	Logical, indicating if code should be run in parallel.
`no_cores`	Number of cores to be used, default is max_no_cores - 2.
`is_verbose`	Logical, indicating whether to print results during EM iterations.
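
The nested structure expected for `x` can be hard to picture from the description alone, so here is a small sketch that simulates it in base R. The helper `make_region` is hypothetical, and two details are assumptions rather than documented requirements: locations are drawn from [-1, 1], and methylation levels are binary (0/1), as is typical for single-cell data.

```r
set.seed(1)
n_cells <- 4     # I: number of cells
n_regions <- 3   # N: number of genomic regions

# Hypothetical helper: simulate one L x 2 observation matrix
make_region <- function() {
  n_cpgs <- sample(5:10, 1)                       # L may differ per region
  loc <- sort(runif(n_cpgs, min = -1, max = 1))   # assumed location scale
  meth <- rbinom(n_cpgs, size = 1, prob = 0.5)    # assumed binary states
  cbind(loc, meth)
}

# Outer list over cells, inner list over regions
x <- lapply(seq_len(n_cells), function(i) {
  lapply(seq_len(n_regions), function(n) make_region())
})

str(x[[1]][[1]])   # observation matrix for cell 1, region 1
```

A list built this way has the shape described for `x`: `length(x)` is I, each `x[[i]]` has length N, and each `x[[i]][[n]]` is a two-column matrix.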

A 'sc_bpr_cluster' object which, in addition to the input parameters, consists of the following variables:

pi_k: Fitted mixing proportions.
w: An N x M x K array with the fitted coefficients of the basis functions for each cluster k and region n.
NLL: The Negative Log Likelihood after the EM algorithm has finished.
post_prob: Posterior probabilities of each cell belonging to each cluster.
labels: Hard clustering assignments of each cell.
BIC: Bayesian Information Criterion metric.
AIC: Akaike Information Criterion metric.
ICL: Integrated Complete Likelihood criterion metric.
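
To show how the returned quantities relate to each other, the sketch below derives hard labels from a posterior probability matrix and computes AIC, BIC and ICL from a negative log-likelihood using their textbook definitions. The function `info_criteria` is hypothetical, and both the parameter count and the use of the number of cells as the sample size are assumptions; the package's exact formulas may differ.

```r
# Toy I x K matrix of posterior probabilities (each row sums to 1)
post_prob <- rbind(c(0.9, 0.1),
                   c(0.2, 0.8),
                   c(0.6, 0.4))

# Hard assignment: cluster with the highest posterior probability per cell
labels <- max.col(post_prob, ties.method = "first")

# Hypothetical helper using textbook definitions (lower is better)
info_criteria <- function(nll, post_prob, n_regions, n_basis) {
  n_cells <- nrow(post_prob)
  K <- ncol(post_prob)
  # Assumed count: K - 1 free mixing proportions plus all coefficients
  n_par <- (K - 1) + K * n_regions * n_basis
  aic <- 2 * nll + 2 * n_par
  bic <- 2 * nll + n_par * log(n_cells)
  # Entropy of the soft assignments; guard against log(0)
  p <- pmax(post_prob, .Machine$double.eps)
  entropy <- -sum(post_prob * log(p))
  icl <- bic + 2 * entropy
  c(AIC = aic, BIC = bic, ICL = icl)
}

crit <- info_criteria(nll = 120, post_prob = post_prob,
                      n_regions = 3, n_basis = 4)
print(labels)
print(crit)
```

Under these definitions ICL is never smaller than BIC, because the entropy term penalizes solutions in which cells are assigned to clusters with low confidence.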