sim_read_count: Synthetic reads generator for genetic variants

View source: R/reads_simulator.R

sim_read_count    R Documentation

Synthetic reads generator for genetic variants

Description

The simulated read counts for variants in single cells are generated in the following steps:

1) Given the clonal genotypes and the clonal prevalence, the clone (and hence the genotype) of each cell is drawn from a multinomial distribution. Note that a cell may contain variants from two clones when it is a doublet.
2) Given a distribution of read coverage, e.g., a variant-specific matrix of read coverage from real data, the total read count of each variant is generated by random sampling. Note that the missing rate is governed by this matrix.
3) The allelic frequency of each variant is drawn from a beta distribution parameterized by a mean and a variance.
4) If the mutation is present in a cell according to its genotype, the alteration read count is drawn from a binomial distribution parameterized by the allelic frequency sampled in step 3.
5) If the mutation is absent from a cell according to its genotype, the alteration read count is drawn from a binomial distribution parameterized by the technical error rate.
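The five steps can be sketched in base R as follows. This is only an illustration of the generative process described above, not the function's actual implementation: the toy inputs, the helper `beta_shapes`, and the method-of-moments conversion from a mean and a variance to beta shape parameters are all assumptions made for the example, and doublets are left out.

```r
set.seed(1)

# Toy inputs (illustrative values, not package data)
Config <- cbind(c(0, 0, 0), c(1, 0, 0), c(1, 1, 0))  # 3 variants x 3 clones
Psi <- c(0.5, 0.3, 0.2)                               # clonal prevalence
n_cells <- 50
D <- matrix(rpois(3 * n_cells, lambda = 20), nrow = 3) # depth: 3 variants x 50 cells
err_rate <- 0.002                                     # technical error rate

# Hypothetical helper: method-of-moments conversion from (mean, variance)
# to beta shape parameters; requires v < mu * (1 - mu)
beta_shapes <- function(mu, v) {
  k <- mu * (1 - mu) / v - 1
  c(shape1 = mu * k, shape2 = (1 - mu) * k)
}

# Step 1: assign each cell to a clone (multinomial), then look up its genotype
clone_id <- apply(rmultinom(n_cells, size = 1, prob = Psi), 2, which.max)
H <- Config[, clone_id, drop = FALSE]                 # variants x cells, 0/1

# Step 2: total reads, resampled per variant from the depth matrix
D_sim <- t(apply(D, 1, function(d) sample(d, n_cells, replace = TRUE)))

# Step 3: one allelic frequency per variant from a beta distribution
sh <- beta_shapes(mu = 0.45, v = 0.01)
af <- rbeta(nrow(Config), sh["shape1"], sh["shape2"])

# Steps 4 and 5: the binomial success probability depends on the genotype
p <- ifelse(H == 1, af[row(H)], err_rate)
A_sim <- matrix(rbinom(length(D_sim), size = D_sim, prob = p),
                nrow = nrow(D_sim))
```

Here `A_sim` and `D_sim` are variants-by-cells matrices of alteration and total read counts, and every entry of `A_sim` is bounded by the corresponding entry of `D_sim`.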

Usage

sim_read_count(
  Config,
  D,
  Psi = NULL,
  means = c(0.002, 0.45),
  vars = c(100, 1),
  wise0 = "element",
  wise1 = "variant",
  cell_num = 300,
  permute_D = FALSE,
  sample_cell = TRUE,
  doublet = 0
)

Arguments

Config

A matrix of binary values. The clone-variant configuration, which encodes the phylogenetic tree structure and the genotype of each clone.

D

A matrix of integers. Sequencing depth for N variants across x cells (ideally >100 cells). NA is treated as a depth of 0.

Psi

A vector of floats. The fraction of each clone. If NULL, a uniform distribution is used.

means

A vector of two floats. The mean of theta0 (false positive rate) and the mean of theta1 (true positive rate).

vars

A vector of two floats. The variances of theta0 and theta1.

wise0

A string. The specificity of the beta-binomial parameter theta0: one of "global", "variant", or "element".

wise1

A string. The specificity of the beta-binomial parameter theta1: one of "global", "variant", or "element".

cell_num

An integer. The number of cells to generate.

permute_D

A Boolean value. If TRUE, permute variants in D.

sample_cell

A Boolean value. If TRUE and M > ncol(D), sample cells.

doublet

A float between 0 and 1. The rate of doublets.
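As a rough illustration of what a doublet means for the genotype, the sketch below merges the variants of two clones for cells flagged as doublets. It is written under assumptions: the toy `Config`, the name `doublet_rate`, and the way the two source clones are drawn are hypothetical, and the function may handle doublets differently internally.

```r
set.seed(2)

Config <- cbind(c(0, 0, 0), c(1, 0, 0), c(1, 1, 0))  # 3 variants x 3 clones (toy values)
n_cells <- 20
doublet_rate <- 0.25                                  # illustrative value

# Flag doublets, then draw a first clone for every cell
# and a candidate second clone that is only used for doublets
is_doublet <- runif(n_cells) < doublet_rate
clone_a <- sample(ncol(Config), n_cells, replace = TRUE)
clone_b <- sample(ncol(Config), n_cells, replace = TRUE)

H <- Config[, clone_a, drop = FALSE]
# A doublet carries a variant if either of its two source clones carries it
H[, is_doublet] <- pmax(H[, is_doublet, drop = FALSE],
                        Config[, clone_b[is_doublet], drop = FALSE])
```

Singlet columns of `H` are unchanged copies of one clone's genotype, while doublet columns hold the element-wise union of two clones.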

Value

A list containing A_sim, a matrix of alteration reads; D_sim, a matrix of total reads; I_sim, a matrix of clonal labels; H_sim, a matrix of genotypes; theta0, a matrix of expected false positive rates; theta1, a matrix of expected true positive rates; theta0_binom, theta0 as a binomial parameter; theta1_binom, theta1 as a binomial parameter; and is_doublet, a Boolean vector indicating whether each cell is a doublet.

Examples

data(simulation_input)
D2 <- sample_seq_depth(D_input, n_cells = 500, n_sites = nrow(tree_4clone$Z))
simu <- sim_read_count(tree_4clone$Z, D2, Psi = NULL, cell_num = 500)

PMBio/cardelino documentation built on Nov. 21, 2022, 4:52 a.m.