scmet: Perform inference with scMET

View source: R/scmet.R

scmet R Documentation

Perform inference with scMET

Description

Compute the posterior of the scMET model. This is the main function, which infers model parameters and corrects for the mean-overdispersion relationship. The most important parameters the user should focus on are X, L, use_mcmc and iter. Advanced users may want to optimise the model by changing the prior parameters. For small datasets, we recommend using the MCMC implementation of scMET since it is more stable.

Usage

scmet(
  Y,
  X = NULL,
  L = 4,
  use_mcmc = FALSE,
  use_eb = TRUE,
  iter = 5000,
  algorithm = "meanfield",
  output_samples = 2000,
  chains = 4,
  m_wmu = rep(0, NCOL(X)),
  s_wmu = 2,
  s_mu = 1.5,
  m_wgamma = rep(0, L),
  s_wgamma = 2,
  a_sgamma = 2,
  b_sgamma = 3,
  rbf_c = 1,
  init_using_eb = TRUE,
  tol_rel_obj = 1e-04,
  n_cores = 2,
  lambda = 4,
  seed = sample.int(.Machine$integer.max, 1),
  ...
)

Arguments

Y

Observed data (methylated reads and total reads) for each feature and cell, in a long format data.table. That is, it should have four named columns: Feature, Cell, total_reads and met_reads.
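As a rough illustration of the long format described above, the sketch below builds a toy Y with one row per (Feature, Cell) pair. The feature names, cell names and read counts are made up for illustration; only the four column names come from this page.

```r
# Toy illustration of the long format expected for Y (all values are made up).
set.seed(1)
features <- paste0("Feature_", 1:3)  # hypothetical feature names
cells <- paste0("Cell_", 1:4)        # hypothetical cell names

# One row per (Feature, Cell) pair
Y <- expand.grid(Feature = features, Cell = cells, stringsAsFactors = FALSE)
Y$total_reads <- sample(5:20, nrow(Y), replace = TRUE)
# Methylated reads are drawn so that they never exceed total reads
Y$met_reads <- rbinom(nrow(Y), size = Y$total_reads, prob = 0.5)

# Y is documented as a data.table, so convert if the package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  Y <- data.table::as.data.table(Y)
}
head(Y)
```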

X

Covariates which might explain variability in mean (methylation). If X = NULL, then we do not perform any correction on the mean estimates. NOTE that if X is provided, the rownames of X should be the unique feature names in Y. If the dimensions or the feature names do not match, an error will be thrown.
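Since a mismatch between X and Y raises an error, it can help to check the inputs before fitting. The sketch below is a user-side pre-flight check, not the package's internal validation. The toy data and the covariate names (intercept, cpg_density) are hypothetical, and reordering the rows of X to follow Y is a convenience rather than a documented requirement.

```r
# Hypothetical toy inputs: 3 features observed in 2 cells each
Y <- data.frame(
  Feature = rep(c("F1", "F2", "F3"), each = 2),
  Cell = rep(c("C1", "C2"), times = 3),
  total_reads = c(10, 8, 12, 9, 7, 11),
  met_reads = c(4, 2, 9, 6, 1, 3)
)
# Covariate matrix with one row per feature (rows deliberately out of order)
X <- cbind(intercept = 1, cpg_density = c(0.5, 0.2, 0.8))
rownames(X) <- c("F2", "F1", "F3")

# Rownames of X should be exactly the unique feature names in Y
feats <- unique(as.character(Y$Feature))
stopifnot(nrow(X) == length(feats), setequal(rownames(X), feats))

# Optional: put the rows of X in the same order as the features appear in Y
X <- X[feats, , drop = FALSE]
```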

L

Total number of basis functions to fit the mean-overdispersion trend. For L = 1, this reduces to a model that does not correct for the mean-overdispersion relationship.

use_mcmc

Logical, whether to use the MCMC implementation for posterior inference. If FALSE (default), we run the VB implementation. For small datasets, we recommend using the MCMC implementation since it is more stable.
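A minimal sketch of switching to the MCMC implementation, using only arguments listed under Usage and the scmet_dt example data used in the Examples section. The specific values (300 iterations, 2 chains, seed 123) are arbitrary choices to keep the run short, not recommendations. The call is guarded so that nothing is fitted when scMET is not installed.

```r
# Settings for a short MCMC run (values chosen only to keep the example fast)
args_mcmc <- list(
  L = 4,
  use_mcmc = TRUE,     # run MCMC instead of the default VB
  algorithm = "NUTS",  # MCMC options listed on this page: "NUTS", "HMC"
  iter = 300,          # in practice 'iter' should be much larger
  chains = 2,
  n_cores = 2,
  seed = 123
)

if (requireNamespace("scMET", quietly = TRUE)) {
  library(scMET)
  fit <- do.call(scmet, c(list(Y = scmet_dt$Y, X = scmet_dt$X), args_mcmc))
  class(fit)  # documented return classes: scmet_mcmc or scmet_vb
}
```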

use_eb

Logical, whether to use 'Empirical Bayes' (EB) for parameter initialization. If TRUE (default), it will initialise the m_wmu and m_wgamma parameters below.

iter

Total number of iterations for either the MCMC or the VB algorithm. NOTE: The Stan implementation of VB relies on black-box variational inference and, with relatively small sample sizes, sometimes tends to 'search' around the local/global minima. We have seen that with larger sample sizes (thousands of cells) it tends to converge much faster, e.g. within around 2-3k iterations.

algorithm

Algorithm to be used by Stan. For MCMC, possible values are "NUTS" and "HMC". For VB, possible values are "meanfield" and "fullrank".

output_samples

For the VB algorithm, the number of posterior samples to draw and save.

chains

Total number of chains.

m_wmu

Prior mean of regression coefficients for covariates X.

s_wmu

Prior standard deviation of regression coefficients for covariates X.

s_mu

Prior standard deviation for mean parameter mu.

m_wgamma

Prior mean of regression coefficients of the basis functions.

s_wgamma

Prior standard deviation of regression coefficients of the basis functions.

a_sgamma

Gamma prior (shape) for standard deviation for dispersion parameter gamma.

b_sgamma

Gamma prior (rate) for standard deviation for dispersion parameter gamma.

rbf_c

Scale parameter for empirically computing the variance of the RBFs.

init_using_eb

Logical, whether to set the initial values of parameters for Stan posterior inference using EB. Preferably this should always be set to TRUE, to lower the chances of VB/MCMC initialisations being far away from the posterior mass.

tol_rel_obj

For the VB algorithm, the convergence tolerance on the relative norm of the objective.

n_cores

Total number of cores.

lambda

The penalty term used to fit the RBF coefficients for the mean-overdispersion trend when initialising hyper-parameters with EB.

seed

The seed for random number generation.

...

Additional parameters passed to Stan fitting functions.

Value

An object of class scmet_mcmc or scmet_vb with the following elements:

  • posterior: A list of matrices containing the samples from the posterior. Each matrix corresponds to a different parameter returned from scMET.

  • Y: The observed data Y.

  • feature_names: A vector of feature names.

  • theta_priors: A list with all prior parameter values, for reproducibility purposes.

  • opts: A list of all additional parameters used when running scMET, for reproducibility purposes.

Author(s)

C.A.Kapourani C.A.Kapourani@ed.ac.uk

See Also

scmet_differential, scmet_hvf_lvf

Examples

# Fit scMET (in practice 'iter' should be much larger)
obj <- scmet(Y = scmet_dt$Y, X = scmet_dt$X, L = 4, iter = 300)


andreaskapou/scMET documentation built on Feb. 1, 2024, 10:46 a.m.