View source: R/sim_run_generative_model_with_TD.R
sim_run_generative_model_with_TD | R Documentation |
This function runs the generative model to simulate sparse gamete input data for rhapsodi, incorporating transmission distortion (TD) through either gamete killing or gene conversion. In addition to returning the sparse gamete data, the function returns the fully known generated gamete data, the diploid donor phased haplotypes, the true recombination break points for each gamete, and the identity of the SNP used for TD simulation. The following variables of the simulation can all be controlled: the number of gametes, the number of SNPs, the sequencing coverage (or missing genotype rate), the average recombination rate, whether to simulate sequencing error, the sequencing error rate to use, whether to add de novo mutations, values parameterizing how many de novo mutations there are and how many gametes are affected by them, the type of transmission distortion (gamete killing or gene conversion), the SNP and haplotype used for simulation of transmission distortion, the probability that a gamete carrying the causal SNP undergoes a gene conversion or gamete killing event, the size of gene conversion tracts, and the random seed for reproducibility.
sim_run_generative_model_with_TD(
  num_gametes,
  num_snps,
  coverage,
  TD_type = "gk",
  p_kill = 0.5,
  p_convert = 0.5,
  converted_snp = NULL,
  converted_haplotype = NULL,
  conversion_lambda = 4,
  killer_snp = NULL,
  killer_haplotype = NULL,
  recomb_lambda = 1,
  random_seed = 42,
  input_cov = TRUE,
  input_mgr = FALSE,
  missing_genotype_rate = NULL,
  add_seq_error = TRUE,
  seqError_add = 0.005,
  add_de_novo_mut = FALSE,
  de_novo_lambda = 5,
  de_novo_alpha = 7.5,
  de_novo_beta = 10
)
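A minimal call might look like the sketch below. It assumes the rhapsodi package is installed and exports this function; the argument values are arbitrary illustrations, not recommendations from the package authors. The call is guarded so the snippet does nothing when rhapsodi is not available.

```r
# Illustrative argument values only (assumptions, not package recommendations).
sim_args <- list(
  num_gametes = 100,
  num_snps = 5000,
  coverage = 0.1,
  TD_type = "gk",   # documented default
  p_kill = 0.5,     # documented default
  random_seed = 42  # documented default
)

# Run only when rhapsodi is available, so the snippet is safe to source anywhere.
if (requireNamespace("rhapsodi", quietly = TRUE)) {
  sim <- do.call(rhapsodi::sim_run_generative_model_with_TD, sim_args)
  str(sim, max.level = 2)
}
```

Collecting the arguments in a list first makes it easy to rerun the simulation with a single value changed, for example a different `random_seed`.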
num_gametes |
an integer, the number of gametes, or the number of columns for the sparse gamete data you want generated |
num_snps |
an integer, the number of SNPs, or the number of rows for the sparse gamete data you want generated. Note: not all of these will be heterozygous due to the coverage and therefore this number won't necessarily equal the number of SNPs following filtering at the end of the generation |
coverage |
a numeric, input if input_cov is TRUE, suggested NULL otherwise |
TD_type |
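a string, the type of transmission distortion to simulate, either gamete killing or gene conversion; default = "gk" |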
p_kill |
a numeric, used within gamete killing simulation, the probability that a gamete containing the SNP subject to TD will be removed from the dataset through simulated gamete killing; default = 0.5 |
p_convert |
a numeric, used within gene conversion simulation, the probability that a gamete with the TD allele will undergo a gene conversion event; default = 0.5 |
converted_snp |
an integer, used within gene conversion simulation, but not required, indicating the specific SNP which will be subject to TD. Randomly selected if not provided by the user; default = NULL |
converted_haplotype |
an integer, 0 or 1, used within gene conversion simulation, but not required, indicating which haplotype will be subject to transmission distortion. Randomly selected if not provided by the user; default = NULL |
conversion_lambda |
a numeric, used within gene conversion simulation as lambda in the Poisson distribution to determine the length of the gene conversion event; default = 4 |
killer_snp |
an integer, used within gamete killing simulation, but not required, indicating the specific SNP which will be subject to TD. Randomly selected if not provided by the user; default = NULL |
killer_haplotype |
an integer, 0 or 1, used within gamete killing simulation, but not required, indicating which haplotype will be subject to transmission distortion. Randomly selected if not provided by the user; default = NULL |
recomb_lambda |
a numeric, the average rate of recombination expected for the simulation; default = 1 |
random_seed |
an integer, the random seed which will be set for the simulation, default=42 |
input_cov |
a logical, TRUE if coverage (e.g. 0.01x) will be input rather than missing genotype rate, default = TRUE |
input_mgr |
a logical, TRUE if missing genotype rate (e.g. 80 (%) or 0.8) will be input rather than coverage, default = FALSE |
missing_genotype_rate |
a numeric, input if input_mgr is TRUE and input_cov is FALSE, suggested NULL otherwise, default=NULL |
add_seq_error |
a logical, TRUE if you want to add sequencing error to the generated data, default=TRUE |
seqError_add |
a numeric, the sequencing error rate if adding sequencing error to the generated data, default=0.005 |
add_de_novo_mut |
a logical, TRUE if you want to add de novo mutations to the generated data, default=FALSE |
de_novo_lambda |
an integer, default=5, parameterizes a Poisson distribution to find the total number of de novo mutations (DNM) |
de_novo_alpha |
a numeric, default=7.5, shape parameter for a gamma distribution to find the number of gametes affected per DNM |
de_novo_beta |
a numeric, default=10, scale parameter for a gamma distribution to find the number of gametes affected per DNM |
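To make the stochastic arguments above more concrete, the following base R sketch shows one way such parameters could translate into random draws. It is not the package's implementation: the number of gametes, the carrier status vector, and the rounding of the gamma draws are all assumptions made for illustration.

```r
set.seed(42)  # mirrors the documented default for random_seed

n_gametes <- 20  # hypothetical toy size

# Hypothetical carrier status: TRUE if a gamete carries the TD allele.
carries_allele <- rbinom(n_gametes, size = 1, prob = 0.5) == 1

# Gamete killing: each carrier is removed with probability p_kill.
p_kill <- 0.5
killed <- carries_allele & (runif(n_gametes) < p_kill)
surviving <- which(!killed)

# Gene conversion: one tract length per carrier, drawn from a
# Poisson distribution with mean conversion_lambda.
conversion_lambda <- 4
tract_lengths <- rpois(sum(carries_allele), lambda = conversion_lambda)

# De novo mutations: a Poisson count of DNMs, then one gamma draw per DNM
# for the number of gametes affected (rounding up is an assumption).
de_novo_lambda <- 5
de_novo_alpha <- 7.5
de_novo_beta <- 10
n_dnm <- rpois(1, lambda = de_novo_lambda)
gametes_per_dnm <- ceiling(
  rgamma(n_dnm, shape = de_novo_alpha, scale = de_novo_beta)
)
```

Only carriers can be killed in this sketch, so non-carriers are transmitted more often than expected, which is the distortion that the p_kill argument controls.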
a list containing: generated_data, a named list returning the generated input and full truth data, specifically gam_na for the sparse rhapsodi input, gam_full for the fully known equivalent of the gamete data input, recomb_spots for the true recombination spots for each gamete, and donor_haps for the diploid donor phased haplotypes; and TD_SNP, an integer denoting the identity of the causal SNP used in TD simulations.
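The sketch below shows how the returned list could be navigated by the documented element names. The object is a hand-built mock, not real simulator output: the element types (matrices, data frames) and the use of NA for unobserved genotypes are assumptions.

```r
# Mock object using the documented element names; contents are placeholders.
sim <- list(
  generated_data = list(
    gam_na = matrix(c(0, NA, 1, NA, NA, 1), nrow = 3),  # sparse, NA = unobserved
    gam_full = matrix(c(0, 0, 1, 1, 1, 1), nrow = 3),   # fully known truth
    recomb_spots = data.frame(gamete = integer(0), snp = integer(0)),
    donor_haps = data.frame(h1 = c(0, 0, 1), h2 = c(1, 1, 0))
  ),
  TD_SNP = 2L
)

sparse <- sim$generated_data$gam_na
truth <- sim$generated_data$gam_full

# Fraction of genotypes that are unobserved in the sparse data.
observed <- !is.na(sparse)
missing_rate <- mean(!observed)

# Agreement between observed sparse calls and the truth; values below 1
# would reflect simulated sequencing error.
agreement <- mean(sparse[observed] == truth[observed])

# Index of the causal SNP used for the TD simulation.
causal_snp <- sim$TD_SNP
```

Comparing gam_na against gam_full in this way is a quick check that the realized missing genotype rate and error rate are close to the values requested.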