clonealign assigns single cells (measured with RNA-seq) to their clones of origin, where
the clones have been inferred from ultra-shallow scDNA-seq and collated into copy number profiles.
clonealign(
gene_expression_data,
copy_number_data,
max_iter = 200,
rel_tol = 1e-06,
gene_filter_threshold = 0,
learning_rate = 0.1,
x = NULL,
clone_allele = NULL,
cov = NULL,
ref = NULL,
fix_alpha = FALSE,
dtype = "float32",
saturate = TRUE,
saturation_threshold = 6,
K = NULL,
mc_samples = 1,
verbose = TRUE,
initial_shrink = 5,
clone_call_probability = 0.95,
data_init_mu = TRUE
)

gene_expression_data 
A matrix of gene counts, or a SingleCellExperiment or SummarizedExperiment with a counts assay

copy_number_data 
A matrix or data frame of copy number calls for each clone.
See Input format.
max_iter 
Maximum number of Variational Bayes iterations to perform 
rel_tol 
Relative tolerance (change in ELBO per iteration in percent) below which the inference is considered converged 
gene_filter_threshold 
Genes with total counts below or equal to this threshold will be filtered out (removes genes with no counts by default) 
learning_rate 
The learning rate to be passed to the Adam optimizer 
x 
An optional vector of covariates, e.g. corresponding to batch or patient. Can be a vector of a single covariate or a sample by covariate matrix. Note this should not contain an intercept. 
clone_allele 
A clone-by-variant matrix of copy numbers for each variant 
cov 
A cell-by-variant matrix of coverage counts 
ref 
A cell-by-variant matrix of reference allele counts 
fix_alpha 
Should the underlying priors for clone frequencies be fixed? Default FALSE (values are inferred from the data) 
dtype 
The dtype for TensorFlow usage, either "float32" or "float64" 
saturate 
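Should the CNV-expression relationship saturate above copy number = saturation_threshold? 
saturation_threshold 
If saturate is TRUE, the copy number above which the CNV-expression relationship saturates 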
K 
The dimensionality of the expression latent space. If left 
mc_samples 
The number of Monte Carlo samples to use to estimate the ELBO 
verbose 
Should warnings and EM convergence information be printed? Default TRUE 
initial_shrink 
The strength with which the variational parameters for clone assignments are
initially shrunk towards the most likely assignments. See 
clone_call_probability 
The probability above which a cell is assigned to a clone. If no clone has probability greater than this value, then the cell is labeled "unassigned". 
data_init_mu 
Should the mu parameters be initialized using the data? (This typically speeds up convergence) 
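As a rough illustration of what gene_filter_threshold and saturation_threshold describe, the sketch below applies both to toy data in base R. It is not clonealign's internal code: the objects counts and cn are made up, and capping copy numbers with pmin() is an assumption about what saturation means here.

```r
# Toy cell-by-gene count matrix (3 cells, 4 genes); all values are made up.
counts <- matrix(
  c(5, 0, 2, 9,
    3, 0, 0, 7,
    4, 0, 1, 8),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), paste0("gene", 1:4))
)

# Toy gene-by-clone copy number matrix for clones A and B.
cn <- matrix(
  c(2, 3,
    2, 2,
    1, 8,
    4, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("A", "B"))
)

gene_filter_threshold <- 0
saturation_threshold <- 6

# Drop genes whose total count is below or equal to the threshold.
keep <- colSums(counts) > gene_filter_threshold
counts_filtered <- counts[, keep, drop = FALSE]
cn_filtered <- cn[keep, , drop = FALSE]

# Assumed form of saturation: cap copy numbers at the threshold.
cn_saturated <- pmin(cn_filtered, saturation_threshold)
```

With the default threshold of 0, only gene2 (which has no counts in any cell) is removed, and the copy number of 8 is capped at 6.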
Input format
gene_expression_data must either be a SingleCellExperiment or SummarizedExperiment with a counts assay representing raw gene expression counts, or a cell by gene matrix of raw counts.
copy_number_data must either be a matrix, data.frame or DataFrame with a row for each gene in gene_expression_data and a column for each of the clones. If colnames(copy_number_data) is not NULL then these names will be used for each of the clones in the final output.
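The shapes described above can be checked before calling clonealign. The following is a minimal base R sketch with simulated data; the object names and values are hypothetical, and the check that gene names match in the same order is an extra sanity check assumed here, not something the text above requires.

```r
set.seed(1)
n_cells <- 5
n_genes <- 4
genes <- paste0("gene", seq_len(n_genes))

# Cell-by-gene matrix of raw counts (simulated).
counts <- matrix(
  rpois(n_cells * n_genes, lambda = 10),
  nrow = n_cells,
  dimnames = list(paste0("cell", seq_len(n_cells)), genes)
)

# One row per gene, one column per clone (made-up copy numbers).
copy_number <- data.frame(
  A = c(2, 2, 1, 3),
  B = c(2, 3, 2, 3),
  C = c(1, 2, 2, 4),
  row.names = genes
)

# A row of copy numbers for each gene in the expression data.
stopifnot(ncol(counts) == nrow(copy_number))
# Assumed extra check: genes appear in the same order in both inputs.
stopifnot(identical(colnames(counts), rownames(copy_number)))

# These column names would label the clones in the output.
clone_names <- colnames(copy_number)
```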
Size factors
If size_factors == "fixed", the size factors will be set to the overall library size per cell (total number of reads mapping to the cell).
If size_factors == "infer", the size factors will be treated as a model parameter and jointly optimized during inference.
Otherwise, size_factors can be a numeric vector of precomputed, custom size factors.
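To make the "fixed" option concrete, here is a small base R sketch that computes the library size per cell from a cell-by-gene count matrix. The data are made up, and the rescaling to mean 1 is just one hypothetical way to build a custom vector, not a recommendation from the package.

```r
# Toy cell-by-gene count matrix (4 cells, 3 genes); values are made up.
counts <- matrix(
  c(10, 0, 5,
     2, 8, 1,
     7, 7, 7,
     0, 1, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cell", 1:4), paste0("gene", 1:3))
)

# Library size per cell: total counts across all genes in each row.
size_factors_fixed <- rowSums(counts)

# Hypothetical custom size factors: library sizes rescaled to have mean 1.
size_factors_custom <- size_factors_fixed / mean(size_factors_fixed)
```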
Recommended parameter settings
As with any probabilistic model there are many parameters to set. Through comprehensive simulations regarding the robustness of the model to misspecification (i.e. what is the minimum proportion of genes for which the CNV-expression relationship can be true and our inferences still valid) we have come up with the following guidelines for parameter settings, reflected in the default values:
Number of ADAM iterations: if set to 1, we essentially perform gradient descent on the marginal log-likelihood, which empirically appears to have the best performance
Dispersions should be clone-specific with weak shrinkage (sigma = 1 appears best)
The generating probabilities should be fixed to be a priori equal (this corresponds to setting alpha = TRUE)
The cell size factors are best fixed in advance by multiplying the total counts of whatever genes are passed to clonealign by the edgeR (TMM) normalization factors
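The last guideline can be sketched as follows, assuming a cell-by-gene count matrix and that the edgeR package is installed. The data are simulated and the variable names are hypothetical. If edgeR is not available, the sketch falls back to normalization factors of 1, which reduces to plain library sizes and is not TMM.

```r
set.seed(42)

# Simulated cell-by-gene count matrix (6 cells, 50 genes).
counts <- matrix(
  rpois(6 * 50, lambda = 20),
  nrow = 6,
  dimnames = list(paste0("cell", 1:6), paste0("gene", 1:50))
)

# Total counts per cell over the genes that would be passed to clonealign.
lib_size <- rowSums(counts)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # edgeR expects genes in rows and samples in columns, hence the transpose.
  tmm <- edgeR::calcNormFactors(t(counts), method = "TMM")
} else {
  # Fallback so the sketch still runs without edgeR (no TMM applied).
  tmm <- rep(1, nrow(counts))
}

# Size factors: library size multiplied by the normalization factor.
size_factors <- lib_size * tmm
```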
Controlling Variational inference
Inference is performed using reparametrization-gradient variational inference. Convergence is monitored via changes to the evidence lower bound (ELBO); this is controlled using the rel_tol parameter. When the difference between the new and old ELBOs, normalized by the absolute value of the old, falls below rel_tol, the algorithm is considered converged. The maximum number of iterations to achieve this is set using the max_iter parameter.
In each step, maximization is performed using Adam, with learning rate given by learning_rate.
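The stopping rule described above can be written out as a short base R sketch. The helper has_converged and the ELBO values are hypothetical, and taking the absolute value of the difference is an assumption; this is not the package's own code.

```r
# Relative change in the ELBO, normalized by the absolute value of the old one.
has_converged <- function(elbo_old, elbo_new, rel_tol = 1e-6) {
  rel_change <- abs(elbo_new - elbo_old) / abs(elbo_old)
  rel_change < rel_tol
}

# Made-up ELBO values from successive iterations.
elbo_trace <- c(-1500, -1200, -1100, -1099.9, -1099.8999)
max_iter <- 200

converged_at <- NA_integer_
for (i in seq_along(elbo_trace)[-1]) {
  if (i > max_iter) break
  if (has_converged(elbo_trace[i - 1], elbo_trace[i])) {
    converged_at <- i
    break
  }
}
```

In this toy trace the relative change first drops below 1e-6 at the fifth value, so the loop stops there.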
An object of class clonealign_fit. The maximum likelihood estimates of the clone assignment parameters are in the clone slot. Maximum likelihood estimates of all model parameters are in the ml_params slot.
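To show how a threshold such as clone_call_probability turns per-cell clone probabilities into calls, here is a base R sketch. The probability matrix probs is made up for illustration, and the code is an assumption about the rule described for that argument, not an extract from the package.

```r
# Hypothetical cell-by-clone assignment probabilities (rows sum to 1).
probs <- matrix(
  c(0.98, 0.01, 0.01,
    0.50, 0.45, 0.05,
    0.02, 0.97, 0.01),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), c("A", "B", "C"))
)

clone_call_probability <- 0.95

# Index and probability of the most likely clone for each cell.
best <- max.col(probs, ties.method = "first")
best_prob <- probs[cbind(seq_len(nrow(probs)), best)]

# Call the clone only if its probability exceeds the threshold.
calls <- ifelse(
  best_prob > clone_call_probability,
  colnames(probs)[best],
  "unassigned"
)
```

Here the second cell is split between clones A and B, so it stays "unassigned".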
library(SingleCellExperiment)
data(example_sce)
copy_number_data <- rowData(example_sce)[,c("A", "B", "C")]
cal <- clonealign(example_sce, copy_number_data)
print(cal)
clones <- cal$clone
