clone_id: Infer clonal identity of single cells
In davismcc/cardelino: Clone Identification from Single Cell Data

Description

clone_id: infer the clonal identity of single cells, using either of the two inference methods below.

clone_id_EM: assign cells to clones using an EM algorithm.

clone_id_Gibbs: assign cells to clones using a Gibbs sampling algorithm.

Usage

clone_id(
  A,
  D,
  Config = NULL,
  n_clone = NULL,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  inference = "sampling",
  n_chain = 1,
  n_proc = 1,
  verbose = TRUE,
  ...
)

clone_id_EM(
  A,
  D,
  Config,
  Psi = NULL,
  min_iter = 10,
  max_iter = 1000,
  logLik_threshold = 1e-05,
  verbose = TRUE
)

clone_id_Gibbs(
  A,
  D,
  Config,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  relax_rate_prior = c(1, 9),
  keep_base_clone = TRUE,
  prior0 = c(0.2, 99.8),
  prior1 = c(0.45, 0.55),
  min_iter = 5000,
  max_iter = 20000,
  buin_frac = 0.5,
  wise = "variant",
  relabel = FALSE,
  verbose = TRUE
)
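
As a rough guide to the defaults in the clone_id_Gibbs signature above, the sketch below computes the means of the default Beta priors (the mean of Beta(a, b) is a / (a + b)) and the number of iterations that would be discarded as burn-in if the chain stopped at min_iter. The helper beta_mean is hypothetical and not part of cardelino, and the burn-in arithmetic assumes that buin_frac is applied to the full chain length.

```r
# Hypothetical helper: mean of a Beta(a, b) distribution is a / (a + b).
beta_mean <- function(shape) shape[1] / sum(shape)

prior0 <- c(0.2, 99.8)        # default prior on the false positive rate
prior1 <- c(0.45, 0.55)       # default prior on (1 - false negative rate)
relax_rate_prior <- c(1, 9)   # default prior on the relax rate

prior_means <- c(
  theta0 = beta_mean(prior0),
  theta1 = beta_mean(prior1),
  relax_rate = beta_mean(relax_rate_prior)
)
print(prior_means)

# Iterations discarded as burn-in if the chain stops at min_iter
# (assumption: burn-in length is buin_frac times the chain length).
min_iter <- 5000
buin_frac <- 0.5
n_burnin <- floor(min_iter * buin_frac)
print(n_burnin)
```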

Arguments

`A`	variant x cell matrix of integers; number of alternative allele reads in variant i cell j
`D`	variant x cell matrix of integers; number of total reads covering variant i cell j
`Config`	variant x clone matrix of binary values. The clone-variant configuration, which encodes the phylogenetic tree structure. This is the output Z of Canopy
`n_clone`	integer(1), the number of clones to reconstruct. This is used only if Config is NULL
`Psi`	A numeric vector. The fraction of each clone, output P of Canopy
`relax_Config`	logical(1), if TRUE, relax the clone configuration so that, instead of being fixed, the input Config acts as a prior with a relax rate.
`relax_rate_fixed`	numeric(1), if the value is between 0 and 1, the relax rate is held at this fixed value while updating the clone Config. If NULL, the relax rate is learned automatically using relax_rate_prior.
`inference`	character(1), the method to use for inference, either "sampling" to use Gibbs sampling (default) or "EM" to use expectation-maximization (faster)
`n_chain`	integer(1), the number of chains to run; the output is averaged across chains
`n_proc`	integer(1), the number of processors to use. Parallel computing can greatly reduce run time when using multiple chains
`verbose`	logical(1), should the function output verbose information as it runs?
`...`	arguments passed to `clone_id_Gibbs` or `clone_id_EM` (as appropriate)
`min_iter`	An integer. The minimum number of iterations in the Gibbs sampling. The actual number of iterations may be larger if the chain has not yet converged.
`max_iter`	An integer. The maximum number of iterations in the Gibbs sampling; sampling stops here even if the convergence diagnostic has not been passed
`logLik_threshold`	A float. The threshold on the increase in log-likelihood used to detect convergence.
`relax_rate_prior`	numeric(2), the two parameters of the Beta prior distribution on the relax rate used for relaxing the clone Config. This prior is used when relax_rate_fixed is NULL.
`keep_base_clone`	logical(1), if TRUE, keep the base clone of Config at its input values when relax mode is used.
`prior0`	numeric(2), alpha and beta parameters for the Beta prior distribution on the inferred false positive rate.
`prior1`	numeric(2), alpha and beta parameters for the Beta prior distribution on the inferred (1 - false negative) rate.
`buin_frac`	numeric(1), the fraction of the chain to treat as the burn-in period
`wise`	A string, the level at which the theta1 parameters are defined: "global", "variant" or "element".
`relabel`	logical(1), if TRUE, relabel the samples of both Config and prob during the Gibbs sampling.
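
To make the expected shapes of the main inputs concrete, here is a minimal sketch that simulates toy A, D, Config and Psi objects in base R. The dimensions, read depths and the tree encoded in Config are invented for illustration; real inputs would come from sequencing data and from a tree inference tool such as Canopy.

```r
set.seed(1)
n_var <- 6    # variants (rows)
n_cell <- 4   # cells (columns of A and D)
n_clone <- 3  # clones (columns of Config)

# Config: variant x clone binary matrix (a made-up tree for illustration).
# matrix() fills column by column, so each row below is one clone.
Config <- matrix(
  c(0, 0, 0, 0, 0, 0,   # clone1 carries no variants
    1, 1, 1, 0, 0, 0,   # clone2 carries variants 1-3
    1, 1, 1, 1, 1, 1),  # clone3 carries all variants
  nrow = n_var, ncol = n_clone,
  dimnames = list(paste0("var", 1:n_var), paste0("clone", 1:n_clone))
)

# D: total reads covering each variant in each cell.
D <- matrix(rpois(n_var * n_cell, lambda = 5), nrow = n_var, ncol = n_cell)

# A: alternative allele reads, which can never exceed the total reads in D.
A <- matrix(rbinom(length(D), size = D, prob = 0.3),
            nrow = n_var, ncol = n_cell)

# Psi: one fraction per clone, summing to 1.
Psi <- rep(1 / n_clone, n_clone)

# Basic consistency checks on the inputs.
stopifnot(
  identical(dim(A), dim(D)),
  all(A <= D),
  nrow(Config) == nrow(A),
  all(Config %in% c(0, 1)),
  length(Psi) == ncol(Config)
)
```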

Details

The two Bernoulli components correspond to false positive and false negative rates. The two binomial components correspond to the read distributions with and without the mutation present.
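
The binomial version of this model can be sketched as follows: for each cell and clone, the alternative read counts are scored with a binomial rate theta1 where Config says the variant is present and theta0 where it is absent, and the scores are normalized into posterior probabilities. This is an illustration under simplifying assumptions (fixed global theta0 and theta1, no Config relaxation), not the package's implementation; the function cell_clone_posterior and the toy data are hypothetical.

```r
# Hypothetical sketch: posterior probability of each cell belonging to
# each clone under a two-component binomial model with fixed rates.
cell_clone_posterior <- function(A, D, Config, theta0, theta1, Psi) {
  n_cell <- ncol(A)
  n_clone <- ncol(Config)
  log_post <- matrix(0, nrow = n_cell, ncol = n_clone)
  for (k in seq_len(n_clone)) {
    # Per-variant rate: theta1 if clone k carries the variant, else theta0.
    rate <- ifelse(Config[, k] == 1, theta1, theta0)
    # rate (one value per variant) is recycled down each column of A.
    ll <- matrix(dbinom(A, size = D, prob = rate, log = TRUE),
                 nrow = nrow(A))
    # Sum over variants and add the log prior clone fraction.
    log_post[, k] <- colSums(ll) + log(Psi[k])
  }
  # Normalize each row on the log scale for numerical stability.
  log_post <- log_post - apply(log_post, 1, max)
  post <- exp(log_post)
  post / rowSums(post)
}

# Toy data: 3 variants, 2 cells, 2 clones.
A <- matrix(c(0L, 0L, 0L,  3L, 2L, 0L), nrow = 3)  # alternative reads
D <- matrix(c(5L, 4L, 6L,  6L, 5L, 4L), nrow = 3)  # total reads
Config <- matrix(c(0, 0, 0,  1, 1, 0), nrow = 3)   # clone2 has variants 1, 2

prob <- cell_clone_posterior(A, D, Config,
                             theta0 = 0.01, theta1 = 0.45,
                             Psi = c(0.5, 0.5))
print(round(prob, 3))
```

In this toy example the first cell has no alternative reads and so favors the clone without variants, while the second cell has alternative reads at variants 1 and 2 and favors the clone that carries them.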

Value

If the inference method is "EM", a list containing theta, a vector of two floats denoting the parameters of the two components of the base model, i.e., mean of Bernoulli or binomial model given variant exists or not, prob, the matrix of posterior probabilities of each cell belonging to each clone with fitted parameters, and logLik, the log likelihood of the final parameters.

If the inference method is "sampling", a list containing: theta0, the mean of sampled false positive parameter values; theta1, the mean of sampled (1 - false negative rate) parameter values; theta0_all, all sampled false positive parameter values; theta1_all, all sampled (1 - false negative rate) parameter values; element; logLik_all, log-likelihood for model for all sampled parameter sets; prob_all; prob, matrix with mean of sampled cell-clone assignment posterior probabilities (the key output of the model); prob_variant.

clone_id_EM returns a list containing theta, a vector of two floats denoting the binomial rates given variant exists or not, prob, the matrix of posterior probabilities of each cell belonging to each clone with fitted parameters, and logLik, the log likelihood of the final parameters.
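
Since prob is the key output, a common next step is to turn it into hard clone labels. The sketch below does this in base R, assuming prob is a cell x clone matrix with named rows and columns; the toy matrix and the 0.5 threshold are invented for illustration and are not defaults taken from the package.

```r
# Toy posterior matrix: one row per cell, one column per clone.
prob <- matrix(
  c(0.95, 0.03, 0.02,
    0.10, 0.85, 0.05,
    0.40, 0.35, 0.25),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("clone", 1:3))
)

threshold <- 0.5  # hypothetical cutoff for a confident assignment

# Index of the most probable clone for each cell.
best <- max.col(prob, ties.method = "first")
# Posterior probability of that most probable clone.
best_prob <- prob[cbind(seq_len(nrow(prob)), best)]

# Cells whose best clone does not exceed the cutoff stay unassigned.
assigned <- ifelse(best_prob > threshold, colnames(prob)[best], "unassigned")
names(assigned) <- rownames(prob)
print(assigned)
```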

Author(s)

Yuanhua Huang and Davis McCarthy

Examples

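library(cardelino)

# Load the example data set used below (A_clone, D_clone and tree)
data(example_donor)

# Gibbs sampling (the default); min_iter and max_iter are passed on
# to clone_id_Gibbs through ...
assignments <- clone_id(A_clone, D_clone,
    Config = tree$Z,
    min_iter = 800, max_iter = 1200
)
prob_heatmap(assignments$prob)

# Expectation-maximization, which is faster
assignments_EM <- clone_id(A_clone, D_clone,
    Config = tree$Z,
    inference = "EM"
)
prob_heatmap(assignments_EM$prob)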

davismcc/cardelino documentation built on Nov. 19, 2022, 2:44 a.m.