fst_buckleton: Estimates theta
In Ahhgust/MMDIT: Mitochondrial mixture project


Estimates the overall theta (FST) as described in doi: 10.1016/j.fsigen.2016.03.004

fst_buckleton(alleles, populations, nJack = 0L, approximate = FALSE)

`alleles`	(character vector; one allele per haploid individual)
`populations`	(character vector; population labels, one per allele)
`nJack`	(integer; number of jackknife replicates)
`approximate`	(logical; if TRUE, treat the population as infinite in size, i.e., use allele frequencies)

In particular, it takes a vector of alleles (strings) and a vector of population labels, e.g.:

alleles <- c("A", "A", "G", "G")
pops <- c("CEU", "CEU", "YRI", "YRI")

and it estimates Buckleton's FST. It returns a vector of length nJack+1: index 1 is the overall FST, and the subsequent indexes hold the jackknife estimates. To get an upper bound on FST, try:

fst_buckleton(alleles, pops, nJack = 1000, approximate = FALSE) -> fsts
quantile(fsts[-1], 0.99)

for a naive estimate of the upper 99% bound on FST.

In general, Buckleton's estimator is fairly simple in implementation; e.g., it takes simple averages over population pairs to estimate the overall FST. It also counts pairs of individuals (n*(n-1) style) to compute homozygosity/heterozygosity. The approximate option instead uses allele frequencies (described as an approximation). The major distinction concerns singletons (haplotypes/alleles seen only once in a population). With pairwise counts (approximate = FALSE), a singleton forms no identical pair with any other individual, so it contributes nothing to within-population homozygosity; with allele frequencies (approximate = TRUE), it still contributes (1/n)^2, where n is that population's sample size. The pairwise treatment makes sense if, say, all alleles/haplotypes are UNIQUE (FST -> 0).
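The two ways of computing within-population homozygosity can be sketched as follows. This is a minimal illustration, assuming homozygosity is the proportion of identical pairs of distinct individuals (pairwise form) or the sum of squared allele frequencies (frequency form); `within_homozygosity` is a hypothetical helper, not part of MMDIT, and is not taken from the package source.

```r
# Hypothetical helper (not part of MMDIT): probability that two alleles
# sampled from the same population are identical.
within_homozygosity <- function(x, approximate = FALSE) {
  stopifnot(length(x) >= 2)
  counts <- table(x)   # allele/haplotype counts n_i
  n <- length(x)       # sample size
  if (approximate) {
    # Frequency form: sum of p_i^2 (a singleton still adds (1/n)^2)
    sum((counts / n)^2)
  } else {
    # Pairwise form: identical pairs among distinct individuals,
    # n_i * (n_i - 1) / (n * (n - 1)); a singleton adds 0
    sum(counts * (counts - 1)) / (n * (n - 1))
  }
}

x <- c("A", "A", "B", "C")                   # "B" and "C" are singletons
within_homozygosity(x, approximate = FALSE)  # 2/12, about 0.167
within_homozygosity(x, approximate = TRUE)   # (4 + 1 + 1)/16 = 0.375
```

The gap between the two values comes entirely from the self-matches that the frequency form implicitly includes, which is why singletons matter.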

The frequency approximation (approximate = TRUE) makes less sense when sample sizes are small (as in the example above); relying on it implies that large sample sizes are needed.
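A rough sketch of averaging per-pair estimates into an overall FST, and of the small-sample effect. It assumes each pair's FST takes the matching-proportion form (F_W - F_B) / (1 - F_B), where F_W is the mean within-population homozygosity of the two populations and F_B is the probability that one allele drawn from each population is identical. That formula and the helper names (`within_homozygosity`, `between_homozygosity`, `pair_fst`, `overall_fst`) are assumptions for illustration, not the MMDIT implementation, and the jackknife is omitted.

```r
# Hypothetical sketch (not the MMDIT source). Assumes a matching-proportion
# form FST = (F_W - F_B) / (1 - F_B) for each population pair.
within_homozygosity <- function(x, approximate = FALSE) {
  counts <- table(x); n <- length(x)
  if (approximate) sum((counts / n)^2)                 # sum of p_i^2
  else sum(counts * (counts - 1)) / (n * (n - 1))      # pairwise n*(n-1) form
}

# Probability that one allele from each population is identical
between_homozygosity <- function(x, y) {
  shared <- intersect(x, y)
  if (length(shared) == 0) return(0)
  sum(table(x)[shared] * table(y)[shared]) / (length(x) * length(y))
}

# Per-pair FST (NaN if F_B == 1, e.g. both populations fixed for one allele)
pair_fst <- function(x, y, approximate = FALSE) {
  f_w <- mean(c(within_homozygosity(x, approximate),
                within_homozygosity(y, approximate)))
  f_b <- between_homozygosity(x, y)
  (f_w - f_b) / (1 - f_b)
}

# Simple average over all population pairs (needs >= 2 populations)
overall_fst <- function(alleles, populations, approximate = FALSE) {
  groups <- split(alleles, populations)
  pairs <- combn(names(groups), 2)  # each column is one population pair
  fsts <- apply(pairs, 2, function(p)
    pair_fst(groups[[p[1]]], groups[[p[2]]], approximate))
  mean(fsts)
}

alleles <- c("A", "B", "C", "D")  # every haplotype is unique
pops    <- c("P1", "P1", "P2", "P2")
overall_fst(alleles, pops, approximate = FALSE)  # 0
overall_fst(alleles, pops, approximate = TRUE)   # 0.5
```

With two individuals per population, the frequency form credits each unique haplotype with a self-match, inflating within-population homozygosity to 0.5 and producing a spurious FST of 0.5; the pairwise form returns 0.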

Numeric vector of length nJack+1. The overall FST is at index [[1]]; the nJack jackknife estimates follow.
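A small sketch of unpacking a vector laid out this way. `summarize_fst` is a hypothetical helper, not part of MMDIT, and the input values are made-up placeholders standing in for `fst_buckleton()` output; only the layout (overall first, jackknife estimates after) comes from the description above.

```r
# Hypothetical helper (not part of MMDIT): split a vector laid out as
# c(overall, jackknife_1, ..., jackknife_nJack) and summarize it.
summarize_fst <- function(fsts, prob = 0.99) {
  stopifnot(is.numeric(fsts), length(fsts) >= 2)
  jack <- fsts[-1]                          # the nJack jackknife estimates
  list(
    overall = fsts[[1]],                    # overall FST
    n_jack  = length(jack),
    upper   = unname(quantile(jack, prob))  # naive upper bound from the jackknife spread
  )
}

# Made-up placeholder values standing in for fst_buckleton(..., nJack = 3)
fsts <- c(0.10, 0.08, 0.12, 0.11)
s <- summarize_fst(fsts)
s$overall  # 0.1
s$n_jack   # 3
```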

Ahhgust/MMDIT documentation built on Jan. 27, 2021, 11:48 a.m.