Clone ID | R Documentation |
Infer clonal identity of single cells
Assign cells to clones using an EM algorithm
Assign cells to clones using a Gibbs sampling algorithm
clone_id(
  A,
  D,
  Config = NULL,
  n_clone = NULL,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  inference = "sampling",
  n_chain = 1,
  n_proc = 1,
  verbose = TRUE,
  ...
)

clone_id_EM(
  A,
  D,
  Config,
  Psi = NULL,
  min_iter = 10,
  max_iter = 1000,
  logLik_threshold = 1e-05,
  verbose = TRUE
)

clone_id_Gibbs(
  A,
  D,
  Config,
  Psi = NULL,
  relax_Config = TRUE,
  relax_rate_fixed = NULL,
  relax_rate_prior = c(1, 9),
  keep_base_clone = TRUE,
  prior0 = c(0.2, 99.8),
  prior1 = c(0.45, 0.55),
  min_iter = 5000,
  max_iter = 20000,
  buin_frac = 0.5,
  wise = "variant",
  relabel = FALSE,
  verbose = TRUE
)
A |
variant x cell matrix of integers; number of alternative allele reads in variant i cell j |
D |
variant x cell matrix of integers; number of total reads covering variant i cell j |
Config |
variant x clone matrix of binary values. The clone-variant configuration, which encodes the phylogenetic tree structure. This is the output Z of Canopy |
n_clone |
integer(1), the number of clones to reconstruct. This is used only if Config is NULL |
Psi |
A vector of floats. The fraction of each clone; this is the output P of Canopy |
relax_Config |
logical(1), if TRUE, relax the clone Config so that it is no longer treated as fixed but acts as a prior Config, with deviations governed by a relax rate. |
relax_rate_fixed |
numeric(1), if the value is between 0 and 1, the relax rate is held fixed at this value while updating the clone Config. If NULL, the relax rate is learned automatically using relax_rate_prior. |
inference |
character(1), the method to use for inference, either "sampling" to use Gibbs sampling (default) or "EM" to use expectation-maximization (faster) |
n_chain |
integer(1), the number of chains to run; the results are averaged across chains to give the output |
n_proc |
integer(1), the number of processors to use. Running in parallel can substantially reduce run time when using multiple chains |
verbose |
logical(1), should the function output verbose information as it runs? |
... |
arguments passed to |
min_iter |
An integer. The minimum number of iterations in the Gibbs sampling. The sampler may run for more iterations until convergence is reached. |
max_iter |
An integer. The maximum number of iterations in the Gibbs sampling; sampling stops here even if the convergence diagnostic has not been passed |
logLik_threshold |
A float. The threshold of logLikelihood increase for detecting convergence. |
relax_rate_prior |
numeric(2), the two parameters of the Beta prior distribution on the relax rate used for relaxing the clone Config. This prior is used when relax_rate_fixed is NULL. |
keep_base_clone |
bool(1), if TRUE, keep the base clone of Config fixed at its input values when relax mode is used. |
prior0 |
numeric(2), alpha and beta parameters for the Beta prior distribution on the inferred false positive rate. |
prior1 |
numeric(2), alpha and beta parameters for the Beta prior distribution on the inferred (1 - false negative) rate. |
buin_frac |
numeric(1), the fraction of the chain to treat as the burn-in period |
wise |
A string, the level at which the theta1 parameters are defined: "global", "variant", or "element". |
relabel |
bool(1), if TRUE, relabel the samples of both Config and prob during the Gibbs sampling. |
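
To give a feel for the default Beta priors listed above, the following minimal sketch computes their prior means in base R. It uses only the default values shown in the usage section; the helper `beta_mean` is a hypothetical name introduced here for illustration and is not part of the package.

```r
# Mean of a Beta(alpha, beta) distribution: alpha / (alpha + beta).
# `beta_mean` is a hypothetical helper, not a package function.
beta_mean <- function(shape) shape[1] / sum(shape)

prior0 <- c(0.2, 99.8)        # default prior on the false positive rate
prior1 <- c(0.45, 0.55)       # default prior on (1 - false negative rate)
relax_rate_prior <- c(1, 9)   # default prior on the relax rate

prior_means <- c(
  theta0 = beta_mean(prior0),
  theta1 = beta_mean(prior1),
  relax_rate = beta_mean(relax_rate_prior)
)
print(prior_means)
```

Under these defaults the false positive rate is centered near 0.002, the (1 - false negative) rate near 0.45, and the relax rate near 0.1.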
The two Bernoulli components correspond to false positive and false negative rates. The two binomial components correspond to the read distributions with and without the mutation present.
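
The two-component binomial model can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy matrices, the values of `theta0` and `theta1`, and the uniform prior over clones are all assumptions made for illustration. Each cell's alternative read counts are scored against every clone's configuration, and the log-likelihoods are normalized into assignment probabilities.

```r
# Toy data (hypothetical): 3 variants x 2 cells, filled column by column
A <- matrix(c(0, 3, 0,    # cell 1: alternative reads per variant
              2, 0, 4),   # cell 2
            nrow = 3)
D <- matrix(c(5, 6, 4,    # cell 1: total reads per variant
              5, 7, 8),   # cell 2
            nrow = 3)
# 3 variants x 2 clones; 1 means the clone carries the variant
Config <- matrix(c(0, 1, 0,
                   1, 0, 1),
                 nrow = 3)

theta0 <- 0.01   # assumed alt-read rate when the variant is absent
theta1 <- 0.45   # assumed alt-read rate when the variant is present

n_cell <- ncol(A)
n_clone <- ncol(Config)
logLik <- matrix(0, n_cell, n_clone)
for (j in seq_len(n_cell)) {
  for (k in seq_len(n_clone)) {
    # Binomial success probability depends on whether clone k has each variant
    p <- ifelse(Config[, k] == 1, theta1, theta0)
    logLik[j, k] <- sum(dbinom(A[, j], D[, j], p, log = TRUE))
  }
}

# Normalize per cell (uniform prior over clones), subtracting the row
# maximum first for numerical stability
prob <- exp(logLik - apply(logLik, 1, max))
prob <- prob / rowSums(prob)
print(round(prob, 4))
```

In this toy case, cell 1 has alternative reads only at the variant carried by clone 1, and cell 2 only at the variants carried by clone 2, so each cell receives most of its probability mass on the matching clone.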
If inference method is "EM", a list containing: theta, a vector of two floats denoting the parameters of the two components of the base model, i.e., mean of Bernoulli or binomial model given variant exists or not; prob, the matrix of posterior probabilities of each cell belonging to each clone with fitted parameters; and logLik, the log likelihood of the final parameters.
If inference method is "sampling", a list containing: theta0, the mean of sampled false positive parameter values; theta1, the mean of sampled (1 - false negative rate) parameter values; theta0_all, all sampled false positive parameter values; theta1_all, all sampled (1 - false negative rate) parameter values; element; logLik_all, log-likelihood for model for all sampled parameter sets; prob_all; prob, matrix with mean of sampled cell-clone assignment posterior probabilities (the key output of the model); prob_variant.
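
The relationship between a full chain such as theta0_all and a summary such as theta0 can be sketched as follows. This is an illustration only: the simulated chain is a stand-in for real sampler output, and whether the package summarizes exactly the post-burn-in samples in this way is an assumption, not something stated on this page.

```r
set.seed(1)

# Stand-in for a sampled chain of false positive rates (simulated,
# not produced by the package)
n_iter <- 1000
theta0_all <- rbeta(n_iter, 5.2, 599.8)

# Discard the first buin_frac of the chain as burn-in (assumed convention)
buin_frac <- 0.5
n_burn <- floor(n_iter * buin_frac)
kept <- theta0_all[(n_burn + 1):n_iter]

# Posterior mean estimate from the retained samples
theta0_hat <- mean(kept)
print(theta0_hat)
```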
A list containing: theta, a vector of two floats denoting the binomial rates given variant exists or not; prob, the matrix of posterior probabilities of each cell belonging to each clone with fitted parameters; and logLik, the log likelihood of the final parameters.
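
Because prob holds one posterior probability per cell and clone, a hard assignment can be derived from it. The sketch below uses base R only; the example matrix, the 0.5 cutoff, and the "unassigned" label are assumptions chosen for illustration rather than package defaults.

```r
# Hypothetical posterior matrix: 4 cells (rows) x 3 clones (columns)
prob <- matrix(c(0.90, 0.05, 0.05,
                 0.10, 0.80, 0.10,
                 0.40, 0.35, 0.25,
                 0.02, 0.03, 0.95),
               ncol = 3, byrow = TRUE,
               dimnames = list(paste0("cell", 1:4), paste0("clone", 1:3)))

threshold <- 0.5  # assumed cutoff for a confident assignment

# Index of the most probable clone for each cell
best <- max.col(prob, ties.method = "first")
best_prob <- prob[cbind(seq_len(nrow(prob)), best)]

# Keep the best clone only when its probability exceeds the cutoff
assigned <- ifelse(best_prob > threshold, colnames(prob)[best], "unassigned")
names(assigned) <- rownames(prob)
print(assigned)
```

Cells whose highest posterior probability does not exceed the cutoff, such as cell3 here, are left unassigned rather than forced into a clone.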
Yuanhua Huang and Davis McCarthy
Yuanhua Huang
data(example_donor)
assignments <- clone_id(A_clone, D_clone,
  Config = tree$Z,
  min_iter = 800, max_iter = 1200
)
prob_heatmap(assignments$prob)

assignments_EM <- clone_id(A_clone, D_clone,
  Config = tree$Z,
  inference = "EM"
)
prob_heatmap(assignments_EM$prob)