sim_run_generative_model: This function runs the generative model to simulate input...

View source: R/sim_run_generative_model.R

sim_run_generative_model R Documentation

This function runs the generative model to simulate input sparse gamete data for rhapsodi

Description

This function runs the generative model to simulate input sparse gamete data for rhapsodi. In addition to returning the sparse gamete data, the function returns the fully known generated gamete data, the diploid donor phased haplotypes, and the true recombination breakpoints for each gamete. The following simulation variables can be controlled: the number of gametes, the number of SNPs, the sequencing coverage (or missing genotype rate), the average recombination rate, whether to simulate sequencing error and at what rate, whether to add de novo mutations, the parameters governing how many de novo mutations there are and how many gametes each one affects, and the random seed for reproducibility.
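The description treats sequencing coverage and missing genotype rate as two ways of specifying the same sparsity. A minimal sketch of how the two could be related follows, assuming the number of reads covering a site is Poisson distributed with mean equal to the coverage, so that a genotype is missing when zero reads are drawn. This relationship is an assumption made for illustration and is not confirmed by this page as the conversion rhapsodi uses; the helper names coverage_to_mgr and mgr_to_coverage are hypothetical.

```r
# Sketch only: assumes Poisson-distributed read depth per site.
# These helpers are hypothetical and are not part of rhapsodi.

# Probability that a site receives zero reads at a given mean coverage.
coverage_to_mgr <- function(coverage) {
  stopifnot(is.numeric(coverage), all(coverage >= 0))
  dpois(0, lambda = coverage)  # equals exp(-coverage)
}

# Inverse mapping: mean coverage implied by a missing genotype rate in (0, 1].
mgr_to_coverage <- function(mgr) {
  stopifnot(is.numeric(mgr), all(mgr > 0), all(mgr <= 1))
  -log(mgr)
}

# Compare a few coverages with the missing genotype rates they would imply.
coverages <- c(0.01, 0.1, 1)
print(data.frame(coverage = coverages,
                 missing_genotype_rate = coverage_to_mgr(coverages)))
```

Under this assumption, very low coverages correspond to missing genotype rates close to 1, which is why either quantity can serve as the sparsity input.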

Usage

sim_run_generative_model(
  num_gametes,
  num_snps,
  coverage,
  recomb_lambda,
  random_seed = 42,
  input_cov = TRUE,
  input_mgr = FALSE,
  missing_genotype_rate = NULL,
  add_seq_error = TRUE,
  seqError_add = 0.005,
  add_de_novo_mut = FALSE,
  de_novo_lambda = 5,
  de_novo_alpha = 7.5,
  de_novo_beta = 10
)
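As a hedged illustration of the signature above, the following sketch supplies the four arguments that have no default and leaves the rest at their documented defaults. The specific values (50 gametes, 5000 SNPs, coverage 0.1, recomb_lambda 1) are arbitrary choices for illustration, not recommendations from the package authors. The call is guarded so the snippet still runs when rhapsodi is not installed.

```r
# Illustrative argument values only; they are assumptions, not package guidance.
sim_args <- list(
  num_gametes   = 50,    # columns of the sparse gamete data
  num_snps      = 5000,  # rows of the sparse gamete data
  coverage      = 0.1,   # used because input_cov defaults to TRUE
  recomb_lambda = 1,     # average recombination rate
  random_seed   = 42     # documented default, stated explicitly
)

# Only run the simulation when the package is available.
if (requireNamespace("rhapsodi", quietly = TRUE)) {
  generated_data <- do.call(rhapsodi::sim_run_generative_model, sim_args)
  str(generated_data, max.level = 1)  # top-level overview of the returned list
} else {
  message("rhapsodi is not installed; skipping the simulation call.")
}
```

Building the arguments as a named list first makes it easy to record or vary the simulation settings across runs.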

Arguments

num_gametes

an integer, the number of gametes, or the number of columns for the sparse gamete data you want generated

num_snps

an integer, the number of SNPs, or the number of rows for the sparse gamete data you want generated. Note: not all of these will be heterozygous due to the coverage and therefore this number won't necessarily equal the number of SNPs following filtering at the end of the generation

coverage

a numeric, the sequencing coverage; supply it if input_cov is TRUE, suggested NULL otherwise

recomb_lambda

a numeric, the average rate of recombination expected for the simulation

random_seed

an integer, the random seed which will be set for the simulation, default=42

input_cov

a logical, TRUE if coverage (e.g. 0.01x) will be input rather than missing genotype rate, default=TRUE

input_mgr

a logical, TRUE if missing genotype rate (e.g. 80 (%) or 0.8) will be input rather than coverage, default=FALSE

missing_genotype_rate

a numeric, the missing genotype rate; supply it if input_mgr is TRUE and input_cov is FALSE, suggested NULL otherwise, default=NULL

add_seq_error

a logical, TRUE if you want to add sequencing error to the generated data, default=TRUE

seqError_add

a numeric, the sequencing error rate if adding sequencing error to the generated data, default=0.005

add_de_novo_mut

a logical, TRUE if you want to add de novo mutations to the generated data, default=FALSE

de_novo_lambda

an integer, default=5, parameterizes a Poisson distribution used to draw the total number of de novo mutations (DNM)

de_novo_alpha

a numeric, default=7.5, shape parameter for a gamma distribution to find the number of gametes affected per DNM

de_novo_beta

a numeric, default=10, scale parameter for a gamma distribution to find the number of gametes affected per DNM
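The three de novo parameters describe a two-stage draw: a Poisson count of mutations, then a gamma-distributed number of affected gametes per mutation. A minimal base R sketch of that idea follows. Rounding the gamma draw up to a whole gamete and capping it at num_gametes are assumptions made here for illustration, and sketch_dnm_counts is a hypothetical helper, not the package's internal code.

```r
# Sketch only: hypothetical helper, not rhapsodi's implementation.
sketch_dnm_counts <- function(num_gametes,
                              de_novo_lambda = 5,
                              de_novo_alpha = 7.5,
                              de_novo_beta = 10) {
  # Stage 1: total number of de novo mutations from a Poisson distribution.
  num_dnm <- rpois(1, lambda = de_novo_lambda)

  # Stage 2: gametes affected per mutation from a gamma distribution
  # with shape de_novo_alpha and scale de_novo_beta.
  raw <- rgamma(num_dnm, shape = de_novo_alpha, scale = de_novo_beta)

  # Assumption: round up to whole gametes and cap at the number simulated.
  gametes_per_dnm <- pmin(ceiling(raw), num_gametes)

  list(num_dnm = num_dnm, gametes_per_dnm = gametes_per_dnm)
}

set.seed(42)
dnm <- sketch_dnm_counts(num_gametes = 1000)
print(dnm)
```

With the documented defaults, the gamma distribution has mean shape times scale, i.e. 7.5 × 10 = 75 gametes per mutation, so the cap only matters when fewer gametes than that are simulated.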

Value

generated_data, a named list containing the generated input and full truth data: gam_na, the sparse gamete data used as rhapsodi input; gam_full, the fully known equivalent of that gamete data; recomb_spots, the true recombination spots for each gamete; and donor_haps, the diploid donor phased haplotypes
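A short sketch of how the returned list might be inspected follows. It checks for the four documented element names and reports the realized fraction of missing genotypes in gam_na. It assumes missing genotypes are encoded as NA and that gam_na can be coerced to a matrix of SNPs by gametes; both are assumptions, summarize_generated_data is a hypothetical helper, and the mock object below is a stand-in with made-up values rather than real simulator output.

```r
# Sketch only: hypothetical helper for inspecting the returned list.
summarize_generated_data <- function(generated_data) {
  expected <- c("gam_na", "gam_full", "recomb_spots", "donor_haps")
  missing_elements <- setdiff(expected, names(generated_data))
  if (length(missing_elements) > 0) {
    stop("Missing elements: ", paste(missing_elements, collapse = ", "))
  }
  # Assumption: gam_na is SNPs (rows) by gametes (columns), with NA for missing.
  gam_na <- as.matrix(generated_data$gam_na)
  list(
    n_snps = nrow(gam_na),
    n_gametes = ncol(gam_na),
    observed_missing_rate = mean(is.na(gam_na))
  )
}

# Mock stand-in with the documented names; values are invented for the demo.
mock_generated_data <- list(
  gam_na = matrix(c(0, NA, NA, 1, NA, NA), nrow = 3),
  gam_full = matrix(c(0, 1, 1, 1, 0, 0), nrow = 3),
  recomb_spots = list(),
  donor_haps = list()
)

mock_summary <- summarize_generated_data(mock_generated_data)
print(mock_summary)
```

Comparing observed_missing_rate against the requested coverage or missing genotype rate is one quick sanity check on a simulated data set.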


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.