calc_abba_baba | R Documentation |
Estimate D according to Green et al (2010) via an ABBA/BABA test for a specific set of three different populations.
calc_abba_baba(
x,
facet,
p1,
p2,
p3,
jackknife = FALSE,
jackknife_par = FALSE,
sigma = NULL
)
x |
snpRdata. Input SNP data. |
facet |
character. Categorical metadata variables by which to break up
analysis. Must contain a sample facet. See |
p1 |
character. Name of population 1, must match a category present in the provided facet. |
p2 |
character. Name of population 2, must match a category present in the provided facet. |
p3 |
character. Name of population 3, must match a category present in the provided facet. |
jackknife |
logical, default FALSE. If TRUE, block-jackknifed significance for D will be calculated with window size sigma according to any SNP facet levels. See details. |
jackknife_par |
numeric or FALSE, default FALSE. If numeric, jackknifes per SNP levels will be run with the requested number of processing threads. |
sigma |
numeric or NULL, default NULL. If jackknifes are requested, the size of the windows to block by, in kb. Should be large enough to account for linkage! |
An ABBA/BABA test according to Green et al (2010) tests the hypothesis that there are an equal number of loci where population 1 and population 2 are more closely related to population 3. The ratio of these two scenarios is given as ABBA/BABA, where: ABBA = (1 - p1)p2p3 and BABA = p1(1 - p2)p3. where p1, p2, and p3 are the derived allele frequencies in populations 1 through 3, respectively. D values are provided for both the overall comparison and within any levels of provided snp facets.
p-values for D can be calculated by a block jackknifing approach according to Maier et. al (2022) by removing SNPs from each of n genomic windows (blocks) and then calculating D for all other sites. Non-overlapping windows are "blocked" by reference to any provided SNP metadata facets (usually chromosome), each with a length equal to sigma, so providing sigma = 100 will use 100kb windows.
William Hemstrom
Maier, R. et al (2022). On the limits of fitting complex models of population history to genetic data. BioRxiv. doi: 10.1101/2022.05.08.491072
Green, ER, et al (2010). A Draft Sequence of the Neandertal Genome. Science, 328(5979), 710–722. doi: 10.1126/science.1188021
# add the ref and anc columns
# note: here these results are meaningless since they are arbitrary.
x <- stickSNPs
maf <- get.snpR.stats(x, ".base", stats = "maf")$single
snp.meta(x)$ref <- maf$major
snp.meta(x)$anc <- maf$minor
# run with jackknifing, 1000kb windows!
# Test if ASP or UPD have more geneflow with PAL...
x <- calc_abba_baba(x, "pop.chr", "ASP", "UPD", "PAL", TRUE, sigma = 1000)
get.snpR.stats(x, "pop.chr", "abba_baba") # gets the per chr results
get.snpR.stats(x, "pop", "abba_baba") # gets the overall results
# smoothed windowed averages
x <- calc_smoothed_averages(x, "pop.chr", sigma = 200, step = 200,
nk = TRUE, stats.type = "pairwise")
get.snpR.stats(x, "pop.chr", "abba_baba")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.