calc_abba_baba: Conduct an ABBA/BABA test for gene flow.

calc_abba_babaR Documentation

Conduct an ABBA/BABA test for gene flow.

Description

Estimate D according to Green et al (2010) via an ABBA/BABA test for a specific set of three different populations.

Usage

calc_abba_baba(
  x,
  facet,
  p1,
  p2,
  p3,
  jackknife = FALSE,
  jackknife_par = FALSE,
  sigma = NULL
)

Arguments

x

snpRdata. Input SNP data.

facet

character. Categorical metadata variables by which to break up analysis. Must contain a sample facet. See Facets_in_snpR for more details.

p1

character. Name of population 1, must match a category present in the provided facet.

p2

character. Name of population 2, must match a category present in the provided facet.

p3

character. Name of population 3, must match a category present in the provided facet.

jackknife

logical, default FALSE. If TRUE, block-jackknifed significance for D will be calculated with window size sigma according to any SNP facet levels. See details.

jackknife_par

numeric or FALSE, default FALSE. If numeric, jackknifes per SNP levels will be run with the requested number of processing threads.

sigma

numeric or NULL, default NULL. If jackknifes are requested, the size of the windows to block by, in kb. Should be large enough to account for linkage!

Details

An ABBA/BABA test according to Green et al (2010) tests the hypothesis that there are an equal number of loci where population 1 and population 2 are more closely related to population 3. The ratio of these two scenarios is given as ABBA/BABA, where: ABBA = (1 - p1)p2p3 and BABA = p1(1 - p2)p3. where p1, p2, and p3 are the derived allele frequencies in populations 1 through 3, respectively. D values are provided for both the overall comparison and within any levels of provided snp facets.

p-values for D can be calculated by a block jackknifing approach according to Maier et. al (2022) by removing SNPs from each of n genomic windows (blocks) and then calculating D for all other sites. Non-overlapping windows are "blocked" by reference to any provided SNP metadata facets (usually chromosome), each with a length equal to sigma, so providing sigma = 100 will use 100kb windows.

Author(s)

William Hemstrom

References

Maier, R. et al (2022). On the limits of fitting complex models of population history to genetic data. BioRxiv. doi: 10.1101/2022.05.08.491072

Green, ER, et al (2010). A Draft Sequence of the Neandertal Genome. Science, 328(5979), 710–722. doi: 10.1126/science.1188021

Examples


# add the ref and anc columns
# note: here these results are meaningless since they are arbitrary.
x <- stickSNPs
maf <- get.snpR.stats(x, ".base", stats = "maf")$single
snp.meta(x)$ref <- maf$major
snp.meta(x)$anc <- maf$minor

# run with jackknifing, 1000kb windows!
# Test if ASP or UPD have more geneflow with PAL...
x <- calc_abba_baba(x, "pop.chr", "ASP", "UPD", "PAL", TRUE, sigma = 1000)
get.snpR.stats(x, "pop.chr", "abba_baba") # gets the per chr results
get.snpR.stats(x, "pop", "abba_baba") # gets the overall results

# smoothed windowed averages
x <- calc_smoothed_averages(x, "pop.chr", sigma = 200, step = 200, 
   nk = TRUE, stats.type = "pairwise")
get.snpR.stats(x, "pop.chr", "abba_baba")

hemstrow/snpR documentation built on March 20, 2024, 7:03 a.m.