Description Usage Arguments Details Value References Examples
View source: R/calc_sfs_tests.R
Calculate site frequency spectrum (SFS) tests Tajima's D, Fay & Wu's H and Zeng's E.
1 2 3 4 5 6 7 8 9 | calc_sfs_tests(
haplohh,
polarized = TRUE,
window_size = NA,
overlap = 0,
right = TRUE,
min_n_mrk = 1,
verbose = TRUE
)
|
haplohh |
an object of class |
polarized |
logical. |
window_size |
size of sliding windows. If |
overlap |
size of window overlap (default 0, i.e. no overlap). |
right |
logical, indicating if the windows should be closed on the right and open on the left (default) or vice versa. |
min_n_mrk |
minimum number of (polymorphic) markers per window. |
verbose |
logical. |
Neutrality tests based on the site frequency spectrum (SFS) are largely unrelated to EHH-based methods. The tests provided here are implemented elsewhere, too (e.g. in package PopGenome).
Each test compares two estimations of the scaled mutation rate theta, which all have the same expected value under neutrality. Deviations from zero indicate violations of the neutral null model, typically population size changes, population subdivision or selection. Tajima's D and Fay & Wu's H become negative in presence of an almost completed sweep, Zeng's E becomes positive for some time after it. Significance can typically be assigned only by simulations.
The standard definition of the tests cannot cope with missing values and typically markers with missing genotypes must be discarded. Ferretti (2012) provides an extension that can handle missing values (without discarding any non-missing values). In this package, only the first moments (the theta-estimators themselves) are adapted accordingly, but not the second moments (their variances), because the latter is computationally demanding and the resulting bias relatively small. It is recommended, though, to discard markers or haplotypes with more than 20% missing values.
Multi-allelic markers are always removed since the tests rely on the "infinite sites model" which implies that all polymorphic markers are bi-allelic. Monomorphic markers can be present, but are irrelevant for the tests.
A data frame with window coordinates, the number of contained (polymorphic) markers, Watterson's, Tajima's and Zeng's estimators of theta and the test statistics of Tajima's D, Fay & Wu's H and Zeng's E.
Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2) 256-276.
Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105(2) 437-60.
Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3) 585-95.
Fay, J. and Wu, C. (2000). Hitchhiking under positive Darwinian selection. Genetics 155(3) 1405-13.
Zeng, E. et al. (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3) 1431-9.
Ferretti, L. and Raineri, E. and Ramos-Onsins, S. (2012). Neutrality tests for sequences with missing data. Genetics 191(4) 1397-401.
1 2 3 4 5 6 7 8 | make.example.files()
# neutral evolution
hh <- data2haplohh("example_neutral.vcf", verbose = FALSE)
calc_sfs_tests(hh)
# strong selective sweep
hh <- data2haplohh("example_sweep.vcf", verbose = FALSE)
calc_sfs_tests(hh)
remove.example.files()
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.