sim_add_de_novo_mut: This function adds de novo mutations (DNM) to the generated...

View source: R/sim_add_de_novo_mut.R

sim_add_de_novo_mut    R Documentation

This function adds de novo mutations (DNM) to the generated data

Description

This function adds de novo mutations (DNMs) to the generated data. It first picks the total number of DNMs to add (using a Poisson distribution), the donor haplotype each DNM originates from (using a uniform distribution), and the SNP indices in the phased diploid donor haplotypes after which each new DNM will be inserted (using a uniform distribution). It then determines which gametes could potentially be affected by each DNM because that SNP position originates from the affected donor haplotype, and from those, how many gametes actually will be affected by each DNM (using a gamma distribution). Using this information, it constructs each new row. For the diploid donor haplotypes, the originating haplotype receives the alternate allele and the other haplotype the reference allele. For the full gamete data, unaffected gametes receive the reference allele and affected gametes the alternate allele. For the sparse gamete data, genotypes in each new row are replaced with NAs using the missing genotype rate and a uniform distribution. Finally, SNP indices are tracked and any recombination breakpoints are adjusted as necessary.
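The sampling steps described above can be illustrated with a minimal sketch. This is not the rhapsodi implementation: the helper name `sketch_pick_dnms`, the rounding and capping of the gamma draw, and the 0 = reference / 1 = alternate encoding are all assumptions made for illustration.

```r
# Hypothetical sketch of the sampling steps; not rhapsodi internals.
sketch_pick_dnms <- function(gam_haps, lambda, alpha, beta) {
  n_snps <- nrow(gam_haps)
  n_gametes <- ncol(gam_haps)
  n_dnm <- rpois(1, lambda)                     # total number of DNMs
  lapply(seq_len(n_dnm), function(i) {
    donor_hap <- sample(1:2, 1)                 # originating donor haplotype
    after_snp <- sample(seq_len(n_snps), 1)     # new row goes after this SNP index
    # Gametes whose haplotype at that SNP comes from the affected donor haplotype
    candidates <- which(gam_haps[after_snp, ] == donor_hap)
    # Assumption: round the gamma draw up and cap it at the candidate count
    n_affected <- min(length(candidates),
                      ceiling(rgamma(1, shape = alpha, scale = beta)))
    affected <- if (n_affected > 0) {
      candidates[sample.int(length(candidates), n_affected)]
    } else {
      integer(0)
    }
    # Assumed encoding: 1 = alternate allele, 0 = reference allele
    donor_row <- c(donor1 = as.integer(donor_hap == 1),
                   donor2 = as.integer(donor_hap == 2))
    gamete_row <- integer(n_gametes)
    gamete_row[affected] <- 1L
    list(donor_hap = donor_hap, after_snp = after_snp,
         donor_row = donor_row, gamete_row = gamete_row)
  })
}

set.seed(1)
toy_gam_haps <- matrix(sample(1:2, 10 * 6, replace = TRUE), nrow = 10, ncol = 6)
toy_dnms <- sketch_pick_dnms(toy_gam_haps, lambda = 2, alpha = 2, beta = 1)
```

Each element of `toy_dnms` holds one candidate new row for the donor haplotypes and one for the full gamete matrix, so only gametes that inherited the affected haplotype at that position can carry the alternate allele.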

Usage

sim_add_de_novo_mut(
  de_novo_lambda,
  de_novo_alpha,
  de_novo_beta,
  num_snps,
  num_gametes,
  gam_haps,
  gam_mat,
  gam_mat_with_na,
  donor_haps,
  unlist_ci,
  missing_genotype_rate
)

Arguments

de_novo_lambda

an integer, the parameter of the Poisson distribution used to draw the total number of DNMs

de_novo_alpha

a numeric, shape parameter for a gamma distribution to find the maximum number of gametes affected by each DNM

de_novo_beta

a numeric, scale parameter for a gamma distribution to find the maximum number of gametes affected by each DNM

num_snps

an integer, the number of SNPs (the number of rows) the generated data had before calling this function

num_gametes

an integer, the number of gametes (the number of columns) the generated data has

gam_haps

data matrix/frame of the haplotypes from which each SNP in each gamete originates (encoded as 1's and 2's), necessary to find which gametes could potentially be affected by each DNM

gam_mat

full data matrix/frame of the genotypes by SNP for each gamete, encoded in 0's and 1's

gam_mat_with_na

sparse data matrix/frame of the genotypes by SNP for each gamete, encoded in 0's, 1's, and NAs

donor_haps

a data frame with the phased diploid donor haplotypes in two columns donor1 and donor2

unlist_ci

a named vector from unlist with the crossover break points for each gamete

missing_genotype_rate

a numeric, the missing genotype rate of the simulation
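As a rough illustration of how missing_genotype_rate and a uniform distribution could be combined to produce a sparse row, consider the sketch below. The helper `sketch_mask_row` is hypothetical and is not taken from the rhapsodi source.

```r
# Hypothetical helper: mask each genotype independently with probability
# missing_genotype_rate by comparing a uniform draw against the rate.
sketch_mask_row <- function(genotype_row, missing_genotype_rate) {
  is_missing <- runif(length(genotype_row)) < missing_genotype_rate
  genotype_row[is_missing] <- NA
  genotype_row
}

set.seed(1)
full_row <- c(0L, 1L, 0L, 0L, 1L, 0L)
sparse_row <- sketch_mask_row(full_row, missing_genotype_rate = 0.5)
```

Entries that survive masking keep their original 0/1 value, so the sparse row never disagrees with the full row where it is observed.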

Value

out, a named list with the adjusted unlist_ci, num_snps, gam_haps, gam_mat, gam_mat_with_na, and donor_haps, as well as new_rows, which tracks where the new DNMs are
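The index bookkeeping behind the adjusted values can be sketched as follows. This assumes that each new row is inserted immediately after a chosen original SNP index and that a breakpoint is stored as an original row index; the actual convention used for unlist_ci may differ. Both helper names are hypothetical.

```r
# Hypothetical helpers; the indexing convention is an assumption.

# Positions of the new rows in the enlarged matrix, given the original
# SNP indices after which each new row is inserted.
sketch_new_row_positions <- function(insert_after) {
  sort(insert_after) + seq_along(insert_after)
}

# Shift original row indices (e.g. breakpoints) by the number of new rows
# inserted above them.
sketch_shift_indices <- function(idx, insert_after) {
  idx + vapply(idx, function(i) sum(insert_after < i), integer(1))
}

# Six original SNPs, new rows inserted after SNPs 2 and 5:
# layout becomes 1, 2, new, 3, 4, 5, new, 6
new_rows <- sketch_new_row_positions(c(5, 2))
shifted <- sketch_shift_indices(c(1, 3, 6), insert_after = c(2, 5))
```

Under these assumptions the new rows land at positions 3 and 7, and original indices 1, 3, and 6 move to 1, 4, and 8.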


mccoy-lab/rhapsodi documentation built on July 27, 2022, 3:56 a.m.