simulate_Ys: Function to simulate genotype calls for a pair of haploid...

View source: R/simulate_Ys.R

simulate_YsR Documentation

Function to simulate genotype calls for a pair of haploid genotypes

Description

Given a matrix of marker allele frequencies, a vector of inter-marker distances, a relatedness parameter, and a switch rate parameter, for a pair of haploid genotypes simulate_Ys returns genotype calls simulated under the HMM described in [1].

Usage

simulate_Ys(fs, ds, k, r, epsilon = 0.001, rho = 7.4 * 10^(-7), warn_fs = TRUE)

Arguments

fs

Matrix of marker allele frequencies, i.e. the fts in [1]. Specifically, a m by Kmax matrix, where m is the marker count and Kmax is the maximum cardinality (per-marker allele count) observed over all m markers. If, for any t = 1,...,m, the maximum cardinality exceeds that of the t-th marker (i.e. if Kmax > Kt), then all fs[t,1:Kt] are in (0,1] and all fs[t,(Kt+1):Kmax] are zero. For example, if Kt = 2 and Kmax = 4 then fs[t,] might look like [0.3, 0.7, 0, 0].

ds

Vector of m inter-marker distances, i.e. the dts in [1]. The t-th element of the inter-marker distance vector, ds[t], contains the distance between marker t and t+1 such that ds[m] = Inf, where m is the marker count. (Note that this differs slightly from [1], where ds[t] contains the distance between marker t-1 and t). Distances between markers on different chromosomes are also considered infinite, i.e. if the chromosome of marker t+1 is not equal to the chromosome of the t-th marker, ds[t] = Inf.

k

Data-generating switch rate parameter, i.e. k in [1].

r

Data-generating relatedness parameter, i.e. r in [1].

epsilon

Genotyping error, i.e. ε in [1]. The genotyping error is the probability of miscalling one specific allele for another. As such, the error rate for the t-th marker, (Kt-1)ε, scales with Kt (the per-marker allele count, cardinality).

rho

Recombination rate, i.e. ρ in [1]. The recombination rate corresponds to the probability of a crossover per base pair. It is assumed constant across the genome under the HMM of [1]. Its default value corresponds to an average rate estimated for Plasmodium falciparum [2].

warn_fs

Logical indicating if the function should return warnings following allele frequency checks.

Value

Simulated genotype calls for a pair of haploid genotypes, i.e. the Yts of the i-th and j-th haploid genotypes in [1]. Specifically, a m by 2 matrix, where m is the marker count and each column contains a haploid genotype. For all t = 1,...,m markers, alleles are enumerated 0 to Kt-1, where Kt is the cardinality (per-marker allele count) of the t-th marker. For example, if Kt = 2, both Ys[t,1] and Ys[t,2] are either 0 or 1.

References

  1. Taylor, A.R., Jacob, P.E., Neafsey, D.E. and Buckee, C.O., 2019. Estimating relatedness between malaria parasites. Genetics, 212(4), pp.1337-1351.

  2. Miles, A., Iqbal, Z., Vauterin, P., Pearson, R., Campino, S., Theron, M., Gould, K., Mead, D., Drury, E., O'Brien, J. and Rubio, V.R., 2016. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome research, 26(9), pp.1288-1299.

Examples

simulate_Ys(fs = frequencies$Colombia, ds = markers$distances, k = 10, r = 0.5)


artaylor85/paneljudge documentation built on March 6, 2023, 1:50 a.m.