calc_sfs_tests: Calculate site frequency spectrum test statistics
In rehh: Searching for Footprints of Selection using 'Extended Haplotype Homozygosity' Based Tests

Calculate site frequency spectrum (SFS) tests Tajima's D, Fay & Wu's H and Zeng's E.

calc_sfs_tests(
  haplohh,
  polarized = TRUE,
  window_size = NA,
  overlap = 0,
  right = TRUE,
  min_n_mrk = 1,
  verbose = TRUE
)

`haplohh`	an object of class `haplohh` (see `data2haplohh`)
`polarized`	logical. `TRUE` by default. If `FALSE`, use major and minor allele instead of ancestral and derived. If there are more than two alleles then the minor allele refers to the second-most frequent allele. Note that Tajima's D remains unchanged, Fay & Wu's H is always zero for folded spectra and Zeng's E becomes equal to Tajima's D.
`window_size`	size of sliding windows. If `NA` (default), there will be only one window covering the whole length of the chromosome.
`overlap`	size of window overlap (default 0, i.e. no overlap).
`right`	logical, indicating if the windows should be closed on the right and open on the left (default) or vice versa.
`min_n_mrk`	minimum number of (polymorphic) markers per window.
`verbose`	logical. `TRUE` by default; reports if multi-allelic sites are removed.
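To illustrate how `window_size` and `right` interact, the following base-R sketch assigns marker positions to non-overlapping windows with `cut()`. The positions, the window size and the use of `cut()` are assumptions made for illustration; this is not the package's internal code.

```r
# Hypothetical marker positions (in bp) and window size; not taken from the package.
positions   <- c(100, 1000, 1500, 2000, 2750, 3000)
window_size <- 1000

# Window borders from 0 up to the first border beyond the last marker.
breaks <- seq(0, (floor(max(positions) / window_size) + 1) * window_size,
              by = window_size)

# right = TRUE: windows are (start, end], i.e. closed on the right.
win_right <- cut(positions, breaks = breaks, right = TRUE)
# right = FALSE: windows are [start, end), i.e. closed on the left.
win_left  <- cut(positions, breaks = breaks, right = FALSE)

# Number of markers per window under both conventions.
n_right <- as.vector(table(win_right))
n_left  <- as.vector(table(win_left))
print(rbind(n_right, n_left))
```

With `right = TRUE` the marker at position 1000 is counted in the first window; with `right = FALSE` it is counted in the second. Only markers that lie exactly on a window border are affected by this choice.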

Neutrality tests based on the site frequency spectrum (SFS) are largely unrelated to EHH-based methods. The tests provided here are implemented elsewhere, too (e.g. in package PopGenome).

Each test compares two estimators of the scaled mutation rate theta that have the same expected value under neutrality, so the expected difference between them is zero under the neutral model. Deviations from zero indicate violations of the neutral null model, typically population size changes, population subdivision or selection. Tajima's D and Fay & Wu's H become negative in the presence of an almost completed sweep; Zeng's E becomes positive for some time after it. Significance can typically be assigned only by simulations.
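The comparison of theta estimators can be sketched in base R for data without missing values. The function `sfs_stats()` below is a hypothetical helper, not part of rehh; it takes an unfolded site frequency spectrum and returns the estimators together with Tajima's D (normalized as in Tajima 1989). For Fay & Wu's H and Zeng's E only the unnormalized differences are computed, since their variance terms (Zeng et al. 2006) are omitted here.

```r
# xi[i] = number of sites at which the derived allele occurs i times,
# for i = 1, ..., n - 1, in a sample of n haplotypes (no missing values).
# Hypothetical helper for illustration; not the package's implementation.
sfs_stats <- function(xi) {
  n  <- length(xi) + 1
  i  <- seq_len(n - 1)
  S  <- sum(xi)                     # number of segregating sites
  a1 <- sum(1 / i)
  a2 <- sum(1 / i^2)

  theta_S  <- S / a1                                      # Watterson
  theta_pi <- sum(2 * i * (n - i) * xi) / (n * (n - 1))   # Tajima
  theta_L  <- sum(i * xi) / (n - 1)                       # Zeng
  theta_H  <- sum(2 * i^2 * xi) / (n * (n - 1))           # Fay & Wu

  # Variance constants of Tajima's D (Tajima 1989).
  b1 <- (n + 1) / (3 * (n - 1))
  b2 <- 2 * (n^2 + n + 3) / (9 * n * (n - 1))
  c1 <- b1 - 1 / a1
  c2 <- b2 - (n + 2) / (a1 * n) + a2 / a1^2
  e1 <- c1 / a1
  e2 <- c2 / (a1^2 + a2)

  c(theta_S  = theta_S,
    theta_pi = theta_pi,
    theta_L  = theta_L,
    tajima_D = (theta_pi - theta_S) / sqrt(e1 * S + e2 * S * (S - 1)),
    H_num    = theta_pi - theta_H,   # unnormalized Fay & Wu's H
    E_num    = theta_L - theta_S)    # unnormalized Zeng's E
}

# Expected neutral spectrum (xi_i proportional to 1/i): all estimators agree.
neutral <- sfs_stats(12 / (1:9))
# Excess of rare derived alleles: Tajima's D becomes negative.
rare    <- sfs_stats(c(10, rep(0, 8)))
# Excess of high-frequency derived alleles: the H numerator becomes negative.
high    <- sfs_stats(c(rep(0, 8), 10))
print(round(rbind(neutral, rare, high), 3))
```

The neutral spectrum uses expected (non-integer) counts purely to show that the estimators coincide; real data consist of integer counts, and significance still has to be assessed by simulation.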

The standard definitions of the tests cannot cope with missing values, so markers with missing genotypes typically must be discarded. Ferretti et al. (2012) provide an extension that can handle missing values (without discarding any non-missing values). In this package, only the first moments (the theta estimators themselves) are adapted accordingly, but not the second moments (their variances), because adapting the latter is computationally demanding and the resulting bias is relatively small. It is nevertheless recommended to discard markers or haplotypes with more than 20% missing values.
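The 20% recommendation can be sketched on a plain matrix in base R. The matrix `haplo`, the threshold variable and the order of filtering (markers first, then haplotypes) are assumptions for illustration and do not reflect how the package itself filters data.

```r
# Hypothetical haplotype matrix: rows = haplotypes, columns = markers,
# 0 = ancestral, 1 = derived, NA = missing.
haplo <- matrix(c(0,  1, NA,  0,
                  1,  1, NA,  0,
                  0, NA, NA, NA,
                  0,  0,  1,  1,
                  1,  0,  0,  1,
                  0,  1,  0,  0),
                nrow = 6, byrow = TRUE)
max_missing <- 0.2   # discard anything with more than 20% missing values

# Step 1: drop markers (columns) with too many missing values.
keep_mrk <- colMeans(is.na(haplo)) <= max_missing
haplo_f  <- haplo[, keep_mrk, drop = FALSE]

# Step 2: drop haplotypes (rows) with too many missing values
# among the remaining markers.
keep_hap <- rowMeans(is.na(haplo_f)) <= max_missing
haplo_f  <- haplo_f[keep_hap, , drop = FALSE]

print(dim(haplo_f))
```

In this toy example the third marker and the third haplotype are removed. Filtering in the opposite order can retain a different subset, so the order is a design choice.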

Multi-allelic markers are always removed since the tests rely on the "infinite sites model" which implies that all polymorphic markers are bi-allelic. Monomorphic markers can be present, but are irrelevant for the tests.
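As a rough illustration of this classification, the base-R sketch below counts the distinct alleles per marker in a hypothetical allele matrix and keeps only markers with at most two alleles. The matrix and the variable names are assumptions; the package's own filtering may differ in detail.

```r
# Hypothetical allele matrix: rows = haplotypes, columns = markers,
# alleles coded as 0, 1, 2, ...
alleles <- matrix(c(0, 0, 0, 1,
                    0, 1, 2, 1,
                    0, 1, 1, 0,
                    0, 0, 2, 0),
                  nrow = 4, byrow = TRUE)

# Number of distinct (non-missing) alleles observed at each marker.
n_alleles <- apply(alleles, 2, function(x) length(unique(x[!is.na(x)])))

marker_type <- ifelse(n_alleles > 2, "multi-allelic",
               ifelse(n_alleles == 2, "bi-allelic", "monomorphic"))

# Multi-allelic markers are dropped; monomorphic ones may stay,
# but only bi-allelic markers contribute to the spectrum.
kept <- alleles[, n_alleles <= 2, drop = FALSE]
print(marker_type)
```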

A data frame with window coordinates, the number of contained (polymorphic) markers, Watterson's, Tajima's and Zeng's estimators of theta and the test statistics of Tajima's D, Fay & Wu's H and Zeng's E.

Watterson, G.A. (1975). On the number of segregating sites in genetical models without recombination. Theoretical Population Biology 7(2) 256-276.

Tajima, F. (1983). Evolutionary relationship of DNA sequences in finite populations. Genetics 105(2) 437-460.

Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123(3) 585-595.

Fay, J. and Wu, C. (2000). Hitchhiking under positive Darwinian selection. Genetics 155(3) 1405-1413.

Zeng, K. et al. (2006). Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174(3) 1431-1439.

Ferretti, L., Raineri, E. and Ramos-Onsins, S. (2012). Neutrality tests for sequences with missing data. Genetics 191(4) 1397-1401.

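library(rehh)
# copy the example input files into the current working directory
make.example.files()
# neutral evolution
hh <- data2haplohh("example_neutral.vcf", verbose = FALSE)
calc_sfs_tests(hh)
# strong selective sweep
hh <- data2haplohh("example_sweep.vcf", verbose = FALSE)
calc_sfs_tests(hh)
# remove the example files from the working directory again
remove.example.files()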

rehh documentation built on Sept. 15, 2021, 5:06 p.m.
