scmet: Perform inference with scMET

View source: R/scmet.R

scmet    R Documentation


Description

Compute the posterior of the scMET model. This is the main function, which infers model parameters and corrects for the mean-overdispersion relationship. The most important parameters the user should focus on are X, L, use_mcmc and iter. Advanced users may want to optimise the model by changing the prior parameters. For small datasets, we recommend using the MCMC implementation of scMET since it is more stable.


Usage

scmet(
  Y,
  X = NULL,
  L = 4,
  use_mcmc = FALSE,
  use_eb = TRUE,
  iter = 5000,
  algorithm = "meanfield",
  output_samples = 2000,
  chains = 4,
  m_wmu = rep(0, NCOL(X)),
  s_wmu = 2,
  s_mu = 1.5,
  m_wgamma = rep(0, L),
  s_wgamma = 2,
  a_sgamma = 2,
  b_sgamma = 3,
  rbf_c = 1,
  init_using_eb = TRUE,
  tol_rel_obj = 1e-04,
  n_cores = 2,
  lambda = 4,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)



Arguments

Y: Observed data (methylated reads and total reads) for each feature and cell, in a long-format data.table. It should have four named columns: Feature, Cell, total_reads and met_reads.


X: Covariates that might explain variability in mean methylation. If X = NULL, no correction is applied to the mean estimates. Note that if X is provided, the rownames of X should be the unique feature names in Y. If the dimensions or the feature names do not match, an error is thrown.


L: Total number of basis functions used to fit the mean-overdispersion trend. For L = 1, this reduces to a model that does not correct for the mean-overdispersion relationship.


use_mcmc: Logical, whether to use the MCMC implementation for posterior inference. If FALSE (default), the VB implementation is run. For small datasets, we recommend using the MCMC implementation since it is more stable.


use_eb: Logical, whether to use 'Empirical Bayes' (EB) for parameter initialisation. If TRUE (default), it will initialise the m_wmu and m_wgamma parameters below.


iter: Total number of iterations, for either the MCMC or the VB algorithm. NOTE: The Stan implementation of VB relies on black-box variational inference, and with relatively small sample sizes it sometimes tends to 'search' around the local/global minima. With larger sample sizes (thousands of cells), we have seen it converge much faster, e.g. in around 2-3k iterations.


algorithm: Stan algorithm to use. For MCMC, possible values are "NUTS" and "HMC". For VB, possible values are "meanfield" and "fullrank".


output_samples: If the VB algorithm is used, the number of posterior samples to draw and save.


chains: Total number of chains.


m_wmu: Prior mean of regression coefficients for covariates X.


s_wmu: Prior standard deviation of regression coefficients for covariates X.


s_mu: Prior standard deviation for mean parameter mu.


m_wgamma: Prior mean of regression coefficients of the basis functions.


s_wgamma: Prior standard deviation of regression coefficients of the basis functions.


a_sgamma: Gamma prior (shape) for standard deviation for dispersion parameter gamma.


b_sgamma: Gamma prior (rate) for standard deviation for dispersion parameter gamma.


rbf_c: Scale parameter for empirically computing the variance of the RBFs.


init_using_eb: Logical, whether to set initial values of the parameters for Stan posterior inference. Preferably this should always be TRUE, to lower the chance of VB/MCMC initialisations being far from the posterior mass.


tol_rel_obj: If the VB algorithm is used, the convergence tolerance on the relative norm of the objective.


n_cores: Total number of cores.


lambda: The penalty term used to fit the RBF coefficients for the mean-overdispersion trend when initialising hyper-parameters with EB.


seed: The seed for random number generation.


...: Additional parameters passed to Stan fitting functions.
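
The input requirements for Y and X described above can be checked before fitting. The following is a minimal sketch in base R with made-up toy values; the covariate name cpg_density, the feature and cell labels, and the checks themselves are illustrative assumptions, not part of scMET.

```r
# Toy data in the long format described for Y; all values are made up.
set.seed(1)
features <- paste0("Feature_", 1:3)
cells <- paste0("Cell_", 1:4)
Y <- expand.grid(Feature = features, Cell = cells, stringsAsFactors = FALSE)
Y$total_reads <- rpois(nrow(Y), lambda = 10) + 1L
Y$met_reads <- rbinom(nrow(Y), size = Y$total_reads, prob = 0.4)

# Hypothetical one-column covariate matrix, one row per feature,
# with rownames set to the unique feature names in Y.
X <- cbind(cpg_density = c(0.2, 0.5, 0.8))
rownames(X) <- features

# Checks mirroring the documented requirements for Y and X.
required_cols <- c("Feature", "Cell", "total_reads", "met_reads")
stopifnot(
  all(required_cols %in% colnames(Y)),
  all(Y$met_reads <= Y$total_reads),
  nrow(X) == length(unique(Y$Feature)),
  setequal(rownames(X), unique(Y$Feature))
)

# The documentation asks for a data.table; convert only if it is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  Y <- data.table::as.data.table(Y)
}
```

Running such checks up front gives a clearer message than waiting for the error that scmet throws when dimensions or feature names do not match.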


Value

An object of class scmet_mcmc or scmet_vb with the following elements:

  • posterior: A list of matrices containing the samples from the posterior. Each matrix corresponds to a different parameter returned from scMET.

  • Y: The observed data Y.

  • feature_names: A vector of feature names.

  • theta_priors: A list with all prior parameter values, for reproducibility purposes.

  • opts: A list of all additional parameters used when running scMET, for reproducibility purposes.



See Also

scmet_differential, scmet_hvf_lvf


Examples

# Fit scMET (in practice 'iter' should be much larger)
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 300)
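
Since the MCMC implementation is recommended for small datasets, a variant of the call above may be useful. This is a sketch only: it assumes the scMET package and its scmet_dt example data are available, and the small iter and chains values are chosen to keep the run short, not as recommended settings.

```r
# Settings taken from the documented arguments; values are kept small
# only so that the sketch runs quickly.
mcmc_args <- list(
  L = 4,
  use_mcmc = TRUE,     # switch from VB (the default) to MCMC
  algorithm = "NUTS",  # documented MCMC options are "NUTS" and "HMC"
  iter = 300,
  chains = 1,
  n_cores = 1
)

if (requireNamespace("scMET", quietly = TRUE)) {
  library(scMET)
  fit <- do.call(scmet, c(list(Y = scmet_dt$Y, X = scmet_dt$X), mcmc_args))

  # Inspect the documented elements of the returned object.
  print(class(fit))             # scmet_mcmc is expected when use_mcmc = TRUE
  print(names(fit$posterior))   # one matrix of posterior samples per parameter
  print(head(fit$feature_names))
  str(fit$opts, max.level = 1)  # settings stored for reproducibility
}
```

Passing a fixed seed in addition to these arguments would make repeated runs comparable.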

andreaskapou/scMET documentation built on June 1, 2022, 11:47 p.m.