estimate_r_and_k: Function to estimate relatedness and switch rate parameters

View source: R/estimate_r_and_k.R

estimate_r_and_kR Documentation

Function to estimate relatedness and switch rate parameters

Description

Given a matrix of marker allele frequencies, a vector of inter-marker distances, and a matrix of genotype calls for a pair of haploid genotypes, estimate_r_and_k returns the maximum likelihood estimates of the relatedness parameter, r, and the switch rate parameter, k, under the HMM described in [1].

Usage

estimate_r_and_k(
  fs,
  ds,
  Ys,
  epsilon = 0.001,
  rho = 7.4 * 10^(-7),
  kinit = 50,
  rinit = 0.5,
  warn_fs = TRUE
)

Arguments

fs

Matrix of marker allele frequencies, i.e. the fts in [1]. Specifically, a m by Kmax matrix, where m is the marker count and Kmax is the maximum cardinality (per-marker allele count) observed over all m markers. If, for any t = 1,...,m, the maximum cardinality exceeds that of the t-th marker (i.e. if Kmax > Kt), then all fs[t,1:Kt] are in (0,1] and all fs[t,(Kt+1):Kmax] are zero. For example, if Kt = 2 and Kmax = 4 then fs[t,] might look like [0.3, 0.7, 0, 0].

ds

Vector of m inter-marker distances, i.e. the dts in [1]. The t-th element of the inter-marker distance vector, ds[t], contains the distance between marker t and t+1 such that ds[m] = Inf, where m is the marker count. (Note that this differs slightly from [1], where ds[t] contains the distance between marker t-1 and t). Distances between markers on different chromosomes are also considered infinite, i.e. if the chromosome of marker t+1 is not equal to the chromosome of the t-th marker, ds[t] = Inf.

Ys

Matrix of genotypes calls for a pair of simulated haploid genotypes, i.e. the Yts of the i-th and j-th haploid genotypes in [1]. Specifically, a m by 2 matrix, where m is the marker count and each column contains a haploid genotype. For all t = 1,...,m markers, alleles are enumerated 0 to Kt-1, where Kt is the cardinality (per-marker allele count) of the t-th marker. For example, if Kt = 2, both Ys[t,1] and Ys[t,2] are either 0 or 1.

epsilon

Genotyping error, i.e. ε in [1]. The genotyping error is the probability of miscalling one specific allele for another. As such, the error rate for the t-th marker, (Kt-1)ε, scales with Kt (the per-marker allele count, cardinality).

rho

Recombination rate, i.e. ρ in [1]. The recombination rate corresponds to the probability of a crossover per base pair. It is assumed constant across the genome under the HMM of [1]. Its default value corresponds to an average rate estimated for Plasmodium falciparum [2].

kinit

Switch rate parameter value used to initialise optimization of the negative loglikelihood.

rinit

Relatedness parameter value used to initialise optimization of the negative loglikelihood.

warn_fs

Logical indicating if the function should return warnings following allele frequency checks.

Value

Maximum likelihood estimates of the switch rate parameter, k, and relatedness parameter, r.

References

  1. Taylor, A.R., Jacob, P.E., Neafsey, D.E. and Buckee, C.O., 2019. Estimating relatedness between malaria parasites. Genetics, 212(4), pp.1337-1351.

  2. Miles, A., Iqbal, Z., Vauterin, P., Pearson, R., Campino, S., Theron, M., Gould, K., Mead, D., Drury, E., O'Brien, J. and Rubio, V.R., 2016. Indels, structural variation, and recombination drive genomic diversity in Plasmodium falciparum. Genome research, 26(9), pp.1288-1299.

Examples

# First stimulate some data
simulated_Ys <- simulate_Ys(fs = frequencies$Colombia, ds = markers$distances, k = 5, r = 0.25)

# Second estimate the switch rate parameter, k, and relatedness parameter, r
estimate_r_and_k(fs = frequencies$Colombia, ds = markers$distances, Ys = simulated_Ys)


artaylor85/paneljudge documentation built on March 6, 2023, 1:50 a.m.