fst_buckleton: Estimates theta


View source: R/RcppExports.R

Description

Estimates the overall theta (FST) as described in doi: 10.1016/j.fsigen.2016.03.004

Usage

fst_buckleton(alleles, populations, nJack = 0L, approximate = FALSE)

Arguments

alleles

(vector of strings; one allele per haploid individual)

populations

(vector of strings; one population label per entry in alleles)

nJack

(integer; number of jackknife estimates to compute)

approximate

(logical; if TRUE, treat each population as infinite in size, i.e., use allele frequencies)

Details

In particular, this takes a vector of strings (alleles) and a vector of population labels, e.g.:

alleles <- c("A", "A", "G", "G")
pops <- c("CEU", "CEU", "YRI", "YRI")

and estimates Buckleton's FST. It returns a vector of length nJack + 1: index 1 holds the overall FST, and the subsequent indexes hold the jackknife estimates. To get an upper bound on FST, try:

fsts <- fst_buckleton(alleles, pops, nJack = 1000, approximate = FALSE)
quantile(fsts[-1], 0.99)

for a naive one-sided 99% upper confidence bound on FST.

In general, Buckleton's estimator is fairly simple in implementation; e.g., it takes simple averages over population pairs to estimate the overall FST. It also counts pairs of copies, n*(n-1) style, to compute homozygosity/heterozygosity. The approximate option instead uses allele frequencies (described as an approximation). The major distinction is that with pairwise counts (approximate = FALSE), singletons (haplotypes/alleles seen only once) contribute nothing to within-population homozygosity, because a single copy forms no pair with another copy of the same allele; with frequencies, each singleton still contributes 1/n^2, where n is the population's sample size. This makes sense if, say, all alleles/haplotypes are UNIQUE (FST -> 0).
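To make the averaging concrete, here is a minimal R sketch of a matching-proportion estimator. It assumes the form theta = (M_W - M_B) / (1 - M_B), where M_W is the within-population matching proportion averaged over populations and M_B is the between-population matching proportion averaged over population pairs. This is an assumption about the cited method, not MMDIT's actual C++ code; the names within_match, between_match and fst_sketch are hypothetical, and the jackknife is omitted.

```r
# Sketch only (assumed form, not MMDIT's implementation):
#   theta = (M_W - M_B) / (1 - M_B)
# M_W: mean within-population matching proportion over populations.
# M_B: mean between-population matching proportion over population pairs.

# Named vector of allele frequencies for one population.
freqs <- function(x) {
  counts <- table(x)
  setNames(as.vector(counts) / length(x), names(counts))
}

# Within-population matching proportion (hypothetical helper).
# Pairwise form needs at least 2 copies: n * (n - 1) is 0 when n = 1.
within_match <- function(x, approximate = FALSE) {
  counts <- as.vector(table(x))
  n <- length(x)
  if (approximate) {
    sum((counts / n)^2)                         # sum of squared frequencies
  } else {
    sum(counts * (counts - 1)) / (n * (n - 1))  # singletons add 0
  }
}

# Probability that one copy from x and one copy from y carry the same allele.
between_match <- function(x, y) {
  px <- freqs(x)
  py <- freqs(y)
  shared <- intersect(names(px), names(py))
  sum(px[shared] * py[shared])                  # 0 if no allele is shared
}

# Hypothetical overall estimator; requires at least 2 populations.
fst_sketch <- function(alleles, populations, approximate = FALSE) {
  groups <- split(alleles, populations)
  m_w <- mean(vapply(groups, within_match, numeric(1),
                     approximate = approximate))
  pairs <- combn(names(groups), 2)              # one column per population pair
  m_b <- mean(apply(pairs, 2, function(p) {
    between_match(groups[[p[1]]], groups[[p[2]]])
  }))
  (m_w - m_b) / (1 - m_b)
}

alleles <- c("A", "A", "G", "G")
pops <- c("CEU", "CEU", "YRI", "YRI")
fst_sketch(alleles, pops)  # 1: CEU and YRI share no allele
```

Under these assumptions, two populations that each carry one "A" and one "G" give -1 with the pairwise form but 0 with the frequency form, which shows how strongly the choice matters at tiny sample sizes.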

It makes less sense when sample sizes are small (as in the example above), which implies that large sample sizes are needed.
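The sample-size effect is visible in the within-population term alone. In this illustrative R sketch (homozygosity is a hypothetical helper mirroring the two forms described above, not MMDIT code), every sampled copy is a distinct haplotype: the pairwise form is always 0, while the frequency form equals 1/n and only approaches 0 as the sample size n grows.

```r
# Within-population homozygosity in two forms (illustrative, not MMDIT code).
homozygosity <- function(x, approximate = FALSE) {
  counts <- as.vector(table(x))
  n <- length(x)
  if (approximate) {
    sum((counts / n)^2)                         # frequency form: sum of p^2
  } else {
    sum(counts * (counts - 1)) / (n * (n - 1))  # pairwise form: n*(n-1) style
  }
}

# n distinct haplotypes, so every allele is a singleton.
for (n in c(2, 10, 100, 1000)) {
  x <- paste0("hap", seq_len(n))
  cat("n =", n,
      " pairwise:", homozygosity(x),
      " frequency:", homozygosity(x, approximate = TRUE), "\n")
}
```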

Value

Numeric vector of length nJack + 1. The overall FST is at index [[1]]; the nJack jackknife estimates follow.


Ahhgust/MMDIT documentation built on Jan. 27, 2021, 11:48 a.m.