View source: R/reads_simulator.R
sim_read_count | R Documentation |
The simulated read counts for variants in single cells are generated in the following steps:
1) Given the clonal genotype and the clonal prevalence, the clone (and hence the genotype) of each cell is drawn from a multinomial distribution. Note that a cell may contain variants from two clones when it is a doublet.
2) Given a distribution of read coverage, e.g., a variant-specific matrix of read coverage from real data, the total reads of each variant are generated by random sampling. Note that the missing rate is governed by this matrix.
3) The allelic frequency of each variant is drawn from a beta distribution parameterized by a mean and a variance.
4) Given the genotype of a cell, if the mutation exists in the cell, the alteration read count is drawn from a binomial distribution parameterized by the allelic frequency sampled in step 3.
5) Given the genotype of a cell, if the mutation does not exist in the cell, the alteration read count is drawn from a binomial distribution parameterized by the technical error rate.
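The five steps can be illustrated with a minimal base-R sketch. This is not the package's implementation: the toy `Config`, `D`, and `Psi` inputs, the beta shape values, and the fixed error rate are all assumptions chosen for illustration, and doublets are left out to keep the example short.

```r
set.seed(1)

# Hypothetical toy inputs (not taken from the package data).
Config <- cbind(clone1 = c(0, 0, 0, 0),
                clone2 = c(1, 1, 0, 0),
                clone3 = c(1, 1, 1, 0))            # 4 variants x 3 clones
D <- matrix(rpois(4 * 50, lambda = 8), nrow = 4)   # 4 variants x 50 cells
D[sample(length(D), 40)] <- NA                     # missing entries
Psi <- rep(1 / ncol(Config), ncol(Config))         # uniform clonal fractions
cell_num <- 20

# Step 1: assign each cell to a clone according to Psi, then look up its genotype.
clone_id <- sample(ncol(Config), cell_num, replace = TRUE, prob = Psi)
H_sim <- Config[, clone_id, drop = FALSE]          # variants x cells, 0/1

# Step 2: sample total reads per variant from the observed depths (NA counts as 0).
D0 <- D
D0[is.na(D0)] <- 0
D_sim <- t(apply(D0, 1, function(d) sample(d, cell_num, replace = TRUE)))

# Step 3: one allelic frequency per variant from a beta distribution.
# The shapes 4.5 and 5.5 are an assumption giving a mean of 0.45.
theta1 <- rbeta(nrow(Config), shape1 = 4.5, shape2 = 5.5)
theta0 <- 0.002                                    # assumed technical error rate

# Steps 4 and 5: binomial alteration counts. The success probability is
# theta1 (per variant) where the mutation is present, theta0 otherwise.
prob <- H_sim * theta1 + (1 - H_sim) * theta0
A_sim <- matrix(rbinom(length(D_sim), size = D_sim, prob = prob),
                nrow = nrow(D_sim))
```

In this sketch `A_sim` and `D_sim` are both variants x cells matrices, and an entry of `A_sim` can never exceed the matching entry of `D_sim`, because the depth is the binomial size.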
sim_read_count(
  Config,
  D,
  Psi = NULL,
  means = c(0.002, 0.45),
  vars = c(100, 1),
  wise0 = "element",
  wise1 = "variant",
  cell_num = 300,
  permute_D = FALSE,
  sample_cell = TRUE,
  doublet = 0
)
Config |
A matrix of binary values. The clone-variant configuration, which encodes the phylogenetic tree structure, and the genotype of each clone |
D |
A matrix of integers. Sequencing depth for N variants across x cells (ideally >100 cells). NA means 0 here. |
Psi |
A vector of floats. The fraction of each clone. If NULL, a uniform distribution is used. |
means |
A vector of two floats. The mean of theta0 (false positive rate) and the mean of theta1 (true positive rate). |
vars |
A vector of two floats. The variance of theta0 and theta1. |
wise0 |
A string. The specificity of the beta-binomial parameter theta0: one of "global", "variant", or "element". |
wise1 |
A string. The specificity of the beta-binomial parameter theta1: one of "global", "variant", or "element". |
cell_num |
An integer. The number of cells to generate. |
permute_D |
A Boolean value. If TRUE, permute variants in D. |
sample_cell |
A Boolean value. If TRUE and M > ncol(D), sample cells. |
doublet |
A float between 0 and 1, the rate of doublets |
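The three specificity levels accepted by wise0 and wise1 can be read as follows: "global" shares one parameter across the whole matrix, "variant" uses one parameter per variant (row), and "element" uses one per variant-cell entry. The sketch below illustrates that reading in base R. The helper `draw_theta` and the beta shape values are hypothetical and are not part of the package.

```r
set.seed(2)

# Hypothetical helper: draw a variants x cells matrix of theta values
# at the requested specificity level.
draw_theta <- function(wise, n_var, n_cell, shape1, shape2) {
  switch(wise,
    # one draw shared by every entry
    global  = matrix(rbeta(1, shape1, shape2), n_var, n_cell),
    # one draw per variant, recycled down each column so rows are constant
    variant = matrix(rbeta(n_var, shape1, shape2), n_var, n_cell),
    # an independent draw for every variant-cell entry
    element = matrix(rbeta(n_var * n_cell, shape1, shape2), n_var, n_cell),
    stop("wise must be 'global', 'variant' or 'element'")
  )
}

# Illustrative shapes (an assumption) with mean 0.45.
theta_global  <- draw_theta("global",  3, 4, 4.5, 5.5)
theta_variant <- draw_theta("variant", 3, 4, 4.5, 5.5)
theta_element <- draw_theta("element", 3, 4, 4.5, 5.5)
```

Under this reading, the defaults wise0 = "element" and wise1 = "variant" mean that the false positive rate varies per entry while the true positive rate is shared by all cells for a given variant.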
A list containing A_sim, a matrix of alteration reads; D_sim, a matrix of total reads; I_sim, a matrix of clonal labels; H_sim, a matrix of genotypes; theta0, a matrix of expected false positive rates; theta1, a matrix of expected true positive rates; theta0_binom, theta0 as a binomial parameter; theta1_binom, theta1 as a binomial parameter; and is_doublet, a vector of Boolean values indicating whether each cell is a doublet.
data(simulation_input)
D2 <- sample_seq_depth(D_input, n_cells = 500, n_sites = nrow(tree_4clone$Z))
simu <- sim_read_count(tree_4clone$Z, D2, Psi = NULL, cell_num = 500)
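When the bundled simulation_input data is not at hand, inputs of the expected shape can be built by hand. The sketch below is an illustration only: the `_toy` objects and their values are hypothetical, and the final call is left commented out because it requires the package to be installed.

```r
set.seed(3)

# Hypothetical clone-variant configuration: 5 variants (rows) x 3 clones (columns).
# Clone 1 carries no variants, clone 2 carries var1-2, clone 3 carries all five.
Config_toy <- matrix(c(0, 1, 1,
                       0, 1, 1,
                       0, 0, 1,
                       0, 0, 1,
                       0, 0, 1),
                     nrow = 5, byrow = TRUE,
                     dimnames = list(paste0("var", 1:5), paste0("clone", 1:3)))

# Hypothetical depth matrix: 5 variants x 120 cells, NA where there is no coverage.
D_toy <- matrix(rnbinom(5 * 120, size = 1, mu = 10), nrow = 5)
D_toy[D_toy == 0] <- NA

# Hypothetical clonal fractions, one per clone, summing to 1.
Psi_toy <- c(0.5, 0.3, 0.2)

# Basic shape checks before handing the inputs to the simulator.
stopifnot(nrow(Config_toy) == nrow(D_toy),
          length(Psi_toy) == ncol(Config_toy),
          abs(sum(Psi_toy) - 1) < 1e-8)

# simu_toy <- sim_read_count(Config_toy, D_toy, Psi = Psi_toy, cell_num = 100)
```

The rows of Config and D both index variants, which is why the example above passes nrow(tree_4clone$Z) as n_sites.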