sim_run_generative_model_with_TD: This function runs the generative model to simulate input...

View source: R/sim_run_generative_model_with_TD.R

sim_run_generative_model_with_TD    R Documentation

This function runs the generative model to simulate input sparse gamete data for rhapsodi incorporating transmission distortion through either gamete killing or gene conversion

Description

This function runs the generative model to simulate input sparse gamete data for rhapsodi, incorporating transmission distortion (TD) through either gamete killing or gene conversion. In addition to returning the sparse gamete data, the function returns the fully known generated gamete data, the diploid donor phased haplotypes, and the true recombination break points for each gamete. It also returns the identity of the SNP used for TD simulation. The following variables of the simulation can all be controlled: the number of gametes, the number of SNPs, the sequencing coverage (or missing genotype rate), the average recombination rate, whether to simulate sequencing error, the sequencing error rate to use, whether to add de novo mutations, the values parameterizing how many de novo mutations there are and how many gametes each one affects, the type of transmission distortion (gamete killing or gene conversion), the SNP and haplotype used for simulation of transmission distortion, the probability that a gamete carrying the causal SNP undergoes a gene conversion or gamete killing event, the size of gene conversion tracts, and the random seed for reproducibility.
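The effect of gamete killing on allele transmission can be illustrated with a minimal base-R sketch. This is not the rhapsodi implementation: it assumes each gamete independently inherits one of the two donor haplotypes at the causal SNP with probability 0.5, and that each carrier of the targeted haplotype is then removed with probability p_kill. The names allele_at_td_snp, is_carrier, killed, and survivors are hypothetical.

```r
# Hypothetical sketch of gamete killing; not the rhapsodi implementation.
set.seed(42)

num_gametes <- 1000     # gametes before any killing
p_kill <- 0.5           # probability that a carrier gamete is removed
killer_haplotype <- 1   # haplotype (0 or 1) subject to TD

# Assumed: without distortion, each haplotype is transmitted half the time.
allele_at_td_snp <- rbinom(num_gametes, size = 1, prob = 0.5)

# Each carrier of the targeted haplotype is removed with probability p_kill.
is_carrier <- allele_at_td_snp == killer_haplotype
killed <- is_carrier & (runif(num_gametes) < p_kill)
survivors <- allele_at_td_snp[!killed]

# Expected carrier fraction among survivors:
# 0.5 * (1 - p_kill) / (0.5 * (1 - p_kill) + 0.5) = (1 - p_kill) / (2 - p_kill)
expected_fraction <- (1 - p_kill) / (2 - p_kill)
observed_fraction <- mean(survivors == killer_haplotype)
```

Under these assumptions, p_kill = 0.5 shifts the expected transmission of the targeted haplotype from 1/2 to 1/3, and the surviving dataset contains fewer gametes than were generated.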

Usage

sim_run_generative_model_with_TD(
  num_gametes,
  num_snps,
  coverage,
  TD_type = "gk",
  p_kill = 0.5,
  p_convert = 0.5,
  converted_snp = NULL,
  converted_haplotype = NULL,
  conversion_lambda = 4,
  killer_snp = NULL,
  killer_haplotype = NULL,
  recomb_lambda = 1,
  random_seed = 42,
  input_cov = TRUE,
  input_mgr = FALSE,
  missing_genotype_rate = NULL,
  add_seq_error = TRUE,
  seqError_add = 0.005,
  add_de_novo_mut = FALSE,
  de_novo_lambda = 5,
  de_novo_alpha = 7.5,
  de_novo_beta = 10
)

Arguments

num_gametes

an integer, the number of gametes, or the number of columns for the sparse gamete data you want generated

num_snps

an integer, the number of SNPs, or the number of rows for the sparse gamete data you want generated. Note: depending on the coverage, not all of these will be heterozygous, so this number will not necessarily equal the number of SNPs remaining after the filtering at the end of generation

coverage

a numeric, the sequencing coverage; supply it if input_cov is TRUE, suggested NULL otherwise

TD_type

a string; if "gk", gamete killing is simulated; for any other string, gene conversion is simulated; default = "gk"

p_kill

a numeric, used within gamete killing simulation, the probability that a gamete containing the SNP subject to TD will be removed from the dataset through simulated gamete killing; default = 0.5

p_convert

a numeric, used within gene conversion simulation, the probability that a gamete with the TD allele will undergo a gene conversion event; default = 0.5

converted_snp

an integer, used within gene conversion simulation, but not required, indicating the specific SNP which will be subject to TD. Randomly selected if not provided by the user; default = NULL

converted_haplotype

an integer, 0 or 1, used within gene conversion simulation, but not required, indicating which haplotype will be subject to transmission distortion. Randomly selected if not provided by the user; default = NULL

conversion_lambda

a numeric, used within gene conversion simulation as lambda in the Poisson distribution to determine the length of the gene conversion event; default = 4

killer_snp

an integer, used within gamete killing simulation, but not required, indicating the specific SNP which will be subject to TD. Randomly selected if not provided by the user; default = NULL

killer_haplotype

an integer, 0 or 1, used within gamete killing simulation, but not required, indicating which haplotype will be subject to transmission distortion. Randomly selected if not provided by the user; default = NULL

recomb_lambda

a numeric, the average rate of recombination expected for the simulation; default = 1

random_seed

an integer, the random seed which will be set for the simulation, default=42

input_cov

a logical, TRUE if coverage (e.g. 0.01x) will be input rather than missing genotype rate; default = TRUE

input_mgr

a logical, TRUE if missing genotype rate (e.g. 80 (%) or 0.8) will be input rather than coverage, default = FALSE

missing_genotype_rate

a numeric, input if input_mgr is TRUE and input_cov is FALSE, suggested NULL otherwise, default=NULL

add_seq_error

a logical, TRUE if you want to add sequencing error to the generated data, default=TRUE

seqError_add

a numeric, the sequencing error rate if adding sequencing error to the generated data, default=0.005

add_de_novo_mut

a logical, TRUE if you want to add de novo mutations to the generated data, default=FALSE

de_novo_lambda

an integer, default=5, parameterizes a Poisson distribution that determines the total number of de novo mutations (DNMs)

de_novo_alpha

a numeric, default=7.5, shape parameter for a gamma distribution to find the number of gametes affected per DNM

de_novo_beta

a numeric, default=10, scale parameter for a gamma distribution to find the number of gametes affected per DNM
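Several of the arguments above parameterize random draws. The base-R sketch below shows what such draws could look like with the default values. It is an illustration only, not the rhapsodi source: treating recomb_lambda as the mean of a per-gamete Poisson crossover count is an assumption, as are the rounding and capping of the gamma draws, and the names tract_length, crossovers_per_gamete, num_dnm, raw_affected, and gametes_per_dnm are hypothetical.

```r
# Hypothetical sketch; names other than the documented arguments are
# illustrative and are not taken from the rhapsodi source.
set.seed(42)

num_gametes <- 100
conversion_lambda <- 4
recomb_lambda <- 1
de_novo_lambda <- 5
de_novo_alpha <- 7.5
de_novo_beta <- 10

# Length of one gene conversion tract, drawn from a Poisson distribution
# (the units of this length are not stated in this documentation).
tract_length <- rpois(1, lambda = conversion_lambda)

# Assumed: crossovers per gamete are Poisson with mean recomb_lambda.
crossovers_per_gamete <- rpois(num_gametes, lambda = recomb_lambda)

# Total number of de novo mutations (DNMs).
num_dnm <- rpois(1, lambda = de_novo_lambda)

# Gametes affected by each DNM: gamma with shape alpha and scale beta.
# Assumed post-processing: round up, then clamp to [1, num_gametes].
raw_affected <- rgamma(num_dnm, shape = de_novo_alpha, scale = de_novo_beta)
gametes_per_dnm <- pmin(pmax(ceiling(raw_affected), 1), num_gametes)
```

With shape 7.5 and scale 10, the gamma distribution has mean 75, so under this sketch the defaults imply that a typical DNM is shared by many gametes rather than being private to one.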

Value

a list containing: generated_data, a named list returning the generated input and full truth data, specifically gam_na for the sparse rhapsodi input, gam_full for the fully known equivalent of that input, recomb_spots for the true recombination spots for each gamete, and donor_haps for the diploid donor phased haplotypes; and TD_SNP, an integer denoting the identity of the causal SNP used in TD simulations.
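A call and a look at the returned list might resemble the following sketch. It assumes the rhapsodi package is installed and exports this function; the argument values are arbitrary illustrations, and the code is skipped entirely when the package is unavailable. Element names follow the Value description above.

```r
# Sketch only: assumes rhapsodi is installed and exports this function.
# The argument values are arbitrary and chosen for illustration.
res <- NULL
if (requireNamespace("rhapsodi", quietly = TRUE)) {
  res <- rhapsodi::sim_run_generative_model_with_TD(
    num_gametes = 50,
    num_snps = 5000,
    coverage = 0.1,
    TD_type = "gk",     # gamete killing; any other string: gene conversion
    p_kill = 0.5,
    random_seed = 42
  )

  # Generated input and truth data, as named in the Value description.
  sparse_input <- res$generated_data$gam_na        # sparse rhapsodi input
  full_truth   <- res$generated_data$gam_full      # fully known gamete data
  breakpoints  <- res$generated_data$recomb_spots  # true recombination spots
  donor_haps   <- res$generated_data$donor_haps    # phased donor haplotypes

  # Identity of the causal SNP used in the TD simulation.
  causal_snp <- res$TD_SNP
}
```

Leaving killer_snp and killer_haplotype at NULL lets the function choose them at random, so TD_SNP is the way to recover which SNP was used.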


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.