compute_r_and_k_CIs: Function to compute confidence intervals for relatedness and...

View source: R/compute_r_and_k_CIs.R

compute_r_and_k_CIs    R Documentation

Function to compute confidence intervals for relatedness and switch rate parameters

Description

Given a matrix of marker allele frequencies, a vector of inter-marker distances, and estimates of the relatedness and switch rate parameters, compute_r_and_k_CIs returns approximate confidence intervals around the parameter estimates. The default confidence level is 95%. The intervals are generated using parametric bootstrap draws of the parameter estimates, based on genotype calls for haploid genotype pairs simulated under the HMM described in [1] using the input parameter estimates. Both the quality of the approximation and the compute time increase with the number of parametric bootstrap draws, which are generated in parallel using a specified number of cores.
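The parametric bootstrap idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: a toy exponential model stands in for the HMM of [1], the draws are generated sequentially rather than in parallel, and the interval is formed as a percentile interval, which is an assumption about how draws are turned into bounds. All object names here are hypothetical.

```r
# Toy stand-in for the HMM: exponential waiting times with an unknown rate.
set.seed(1)
observed <- rexp(200, rate = 2)   # hypothetical observed data
rate_hat <- 1 / mean(observed)    # maximum likelihood estimate of the rate

confidence <- 95                  # confidence level as a percentage
nboot <- 500                      # number of parametric bootstrap draws

# Each draw: simulate a data set under the fitted model, then re-estimate.
boot_draws <- vapply(seq_len(nboot), function(b) {
  simulated <- rexp(length(observed), rate = rate_hat)
  1 / mean(simulated)
}, numeric(1))

# Percentile interval (assumed): cut (100 - confidence) / 2 percent per tail.
alpha <- 1 - confidence / 100
ci <- quantile(boot_draws, probs = c(alpha / 2, 1 - alpha / 2))
print(ci)
```

Raising nboot in this sketch stabilises the interval endpoints at the cost of more simulation and re-estimation work, which mirrors the trade-off noted above.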

Usage

compute_r_and_k_CIs(
  fs,
  ds,
  khat,
  rhat,
  confidence = 95,
  nboot = 100,
  core_count = parallel::detectCores() - 1,
  warn_fs = TRUE,
  ...
)
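The default for core_count shown above is parallel::detectCores() - 1. On a single-core machine that expression evaluates to 0, and detectCores() is documented to return NA when the core count cannot be determined. A guarded value can be computed beforehand, as in this small sketch (the variable names are illustrative only):

```r
# parallel ships with base R; detectCores() can return NA on some platforms.
detected <- parallel::detectCores()

# Fall back to one core when detection fails or only one core is present.
core_count <- if (is.na(detected)) 1L else max(1L, detected - 1L)
print(core_count)
```

The resulting value could then be supplied explicitly through the core_count argument.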

Arguments

fs

Matrix of marker allele frequencies, i.e. the f_t values in [1]. Specifically, an m by Kmax matrix, where m is the marker count and Kmax is the maximum cardinality (per-marker allele count) observed over all m markers. For any marker t = 1,...,m whose cardinality K_t is less than Kmax, all fs[t,1:K_t] are in (0,1] and all fs[t,(K_t+1):Kmax] are zero. For example, if K_t = 2 and Kmax = 4 then fs[t,] might look like [0.3, 0.7, 0, 0].

ds

Vector of m inter-marker distances, i.e. the d_t values in [1]. The t-th element of the inter-marker distance vector, ds[t], contains the distance between markers t and t+1, with ds[m] = Inf, where m is the marker count. (Note that this differs slightly from [1], where ds[t] contains the distance between markers t-1 and t.) Distances between markers on different chromosomes are also considered infinite, i.e. if the chromosome of marker t+1 differs from the chromosome of marker t, then ds[t] = Inf.

khat

Estimate of the switch rate parameter, i.e. estimate of k in [1].

rhat

Estimate of the relatedness parameter, i.e. estimate of r in [1].

confidence

Confidence level (percentage) of the confidence interval (default 95%).

nboot

Number of parametric bootstrap draws from which to compute the confidence interval. Larger values provide a better approximation but prolong computation.

core_count

Number of cores to use for computation. Set to 2 or more for parallel computation. Defaults to the number of cores detected on the machine minus one.

warn_fs

Logical indicating whether the function should issue warnings following allele frequency checks.

...

Arguments to be passed to simulate_Ys and estimate_r_and_k.
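As an illustration of the fs and ds layouts described above, the following base R sketch builds both inputs from made-up marker data. The data frame marker_info, the list freq_list, and all values in them are hypothetical, and expressing positions in base pairs is an assumption.

```r
# Hypothetical marker metadata: chromosome and position per marker.
marker_info <- data.frame(
  chrom = c(1, 1, 1, 2, 2),
  pos   = c(1000, 5000, 12000, 300, 9000)
)

# Hypothetical allele frequencies; markers may differ in cardinality.
freq_list <- list(c(0.3, 0.7), c(0.2, 0.5, 0.3), c(0.1, 0.2, 0.3, 0.4),
                  c(0.6, 0.4), c(0.25, 0.75))

# fs: m by Kmax matrix, each row zero-padded beyond its own cardinality.
Kmax <- max(lengths(freq_list))
fs <- t(vapply(freq_list, function(f) c(f, rep(0, Kmax - length(f))),
               numeric(Kmax)))

# ds: distance from marker t to marker t+1, with Inf for the last marker
# and wherever the next marker lies on a different chromosome.
m <- nrow(marker_info)
ds <- c(diff(marker_info$pos), Inf)
ds[c(diff(marker_info$chrom) != 0, TRUE)] <- Inf

# Basic sanity checks on the layout.
stopifnot(all(abs(rowSums(fs) - 1) < 1e-8), length(ds) == m, is.infinite(ds[m]))
```

Here the first row of fs is 0.3, 0.7, 0, 0, and ds[3] is Inf because markers 3 and 4 sit on different chromosomes.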

Value

Confidence intervals around the input estimates of the switch rate parameter, k, and the relatedness parameter, r.

References

  1. Taylor, A.R., Jacob, P.E., Neafsey, D.E. and Buckee, C.O., 2019. Estimating relatedness between malaria parasites. Genetics, 212(4), pp.1337-1351.

Examples

# First, simulate some data
simulated_Ys <- simulate_Ys(fs = frequencies$Colombia, ds = markers$distances, k = 5, r = 0.25)

# Second, estimate the switch rate parameter, k, and relatedness parameter, r
krhat <- estimate_r_and_k(fs = frequencies$Colombia, ds = markers$distances, Ys = simulated_Ys)

# Third, compute confidence intervals (CIs)
compute_r_and_k_CIs(fs = frequencies$Colombia, ds = markers$distances, khat = krhat['khat'], rhat = krhat['rhat'])


artaylor85/paneljudge documentation built on March 6, 2023, 1:50 a.m.