Compute integrated EHH (iHH), integrated EHHS (iES) and integrated normalized EHHS (inES) for all markers of a chromosome (or linkage group).
This function computes the statistics with a slightly different algorithm than scan_hh: it sidesteps the calculation of EHH and EHHS values and their subsequent integration, and consequently no cut-offs relying on these values can be specified. Instead, it computes the (full) lengths of pairwise shared haplotypes and averages them afterwards.
This function is primarily intended for the study of general properties of these statistics using simulated data.
haplohh |
an object of class |
phased |
logical. If |
polarized |
logical. |
maxgap |
maximum allowed gap in bp between two markers. If exceeded, further calculation of EHH(S) is stopped at the gap
(default= |
max_extend |
maximum distance in bp to extend shared haplotypes away from the focal marker.
(default |
discard_integration_at_border |
logical. If |
geometric.mean |
logical. If |
threads |
number of threads to parallelize computation |
Integrated EHH (iHH), integrated EHHS (iES) and integrated normalized EHHS (inES) are computed for all markers of the chromosome (or linkage group). This function sidesteps the computation of EHH and EHHS values and their stepwise integration. Instead, the lengths of all shared haplotypes are computed and afterwards averaged. In the absence of missing values, the statistics are identical to those calculated by scan_hh with settings limehh = 0, limehhs = 0, lower_ehh_y_bound = 0 and interpolate = FALSE, yet this function is faster.
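The idea of averaging shared haplotype lengths instead of integrating EHH can be illustrated with a toy sketch in base R. It is not the rehh implementation: the data, the helper `last_shared_right()`, the restriction to the right-hand side of the focal marker and the step-function integration rule are all assumptions made for illustration.

```r
# Toy data (hypothetical): 4 haplotypes (rows) x 5 markers (columns), alleles 0/1
hap <- rbind(
  c(0, 1, 1, 0, 1),
  c(0, 1, 1, 0, 0),
  c(0, 1, 0, 1, 1),
  c(1, 1, 1, 0, 1)
)
pos <- c(100, 200, 350, 500, 800)  # marker positions in bp
focal <- 2                          # index of the focal marker

# Hypothetical helper: index of the last marker up to which haplotypes
# i and j are identical when extending to the right of the focal marker
last_shared_right <- function(i, j) {
  r <- focal
  while (r < ncol(hap) && hap[i, r + 1] == hap[j, r + 1]) r <- r + 1
  r
}

# All pairs of haplotypes carrying the same allele at the focal marker
pairs <- t(combn(nrow(hap), 2))
same_at_focal <- hap[pairs[, 1], focal] == hap[pairs[, 2], focal]
pairs <- pairs[same_at_focal, , drop = FALSE]

# Route 1: average the lengths of the pairwise shared haplotypes
r <- mapply(last_shared_right, pairs[, 1], pairs[, 2])
mean_length <- mean(pos[r] - pos[focal])

# Route 2: compute EHH at each marker (fraction of pairs still identical)
# and integrate it as a step function over the marker intervals
ehh <- sapply(seq_len(ncol(hap)), function(k) mean(r >= k))
right <- (focal + 1):ncol(hap)
integral <- sum(ehh[right] * diff(pos)[focal:(ncol(hap) - 1)])

all.equal(mean_length, integral)
```

In this toy setting both routes give the same value, which is why the averaging shortcut can only reproduce the integration when no EHH-based cut-off is applied.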
Application of a cut-off is necessary for reducing the spurious signals of selection caused by single shared haplotypes of extreme length. Hence, e.g. for human experimental data, it might be reasonable to set max_extend to 1 or 2 Mb.
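A small base R sketch shows why a single extreme shared haplotype matters for an average and what a length cap does to it. The lengths below are invented, and modelling max_extend as a simple truncation with `pmin()` is a simplifying assumption, not a description of the package internals.

```r
# Hypothetical shared haplotype lengths (bp) of five pairs at one focal marker;
# a single pair shares an extremely long haplotype of 8 Mb
shared_len <- c(20e3, 35e3, 50e3, 15e3, 8e6)
max_extend <- 1e6  # cap of 1 Mb

# Without a cap the single long haplotype dominates the average
mean_uncapped <- mean(shared_len)

# With the cap each length is truncated at max_extend before averaging
mean_capped <- mean(pmin(shared_len, max_extend))

c(uncapped = mean_uncapped, capped = mean_capped)
```

With the package loaded, the corresponding setting would be passed as `max_extend = 1e6` in the call to scan_hh_full().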
scan_hh computes the statistics iHH_A, iHH_D and iES/inES separately, while this function calculates them simultaneously. Hence, if discard_integration_at_border is set to TRUE and the extension of shared haplotypes reaches a border (i.e. chromosomal boundaries or a gap larger than maxgap), this function discards all statistics.
The handling of missing values is different, too: scan_hh "removes" chromosomes with missing values from further calculations. EHH and EHHS are then calculated for the remaining chromosomes, which can accidentally yield an increase in EHH or EHHS. This cannot happen with scan_hh_full(), which treats each missing value of a marker as if it were a new allele, terminating any shared haplotype, but does not change the set of considered chromosomes. Thus, missing values cause a faster decay of EHH(S) with function scan_hh_full().
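The difference between the two ways of handling missing values can be sketched with a toy example in base R. Both strategies below are simplified stand-ins written for illustration (the helper `ehh_at()` and the data are hypothetical) and do not reproduce the code of either function.

```r
# Toy data (hypothetical): 3 haplotypes x 3 markers, one missing value
hap <- rbind(
  c(0, 0, 0),
  c(0, 1, NA),
  c(0, 0, 0)
)
focal <- 1

# Hypothetical helper: fraction of pairs among haplotypes `idx` that are
# identical on markers focal..k; any comparison involving NA is a mismatch
ehh_at <- function(idx, k) {
  if (length(idx) < 2) return(0)
  pairs <- combn(idx, 2)
  mean(apply(pairs, 2, function(p) {
    isTRUE(all(hap[p[1], focal:k] == hap[p[2], focal:k]))
  }))
}

# Strategy A: drop haplotypes with a missing value between focal and k,
# then compute EHH on the remaining haplotypes
ehh_drop <- sapply(seq_len(ncol(hap)), function(k) {
  complete <- which(rowSums(is.na(hap[, focal:k, drop = FALSE])) == 0)
  ehh_at(complete, k)
})

# Strategy B: keep all haplotypes and let a missing value act like a
# new allele that ends any shared haplotype
ehh_new_allele <- sapply(seq_len(ncol(hap)), function(k) {
  ehh_at(seq_len(nrow(hap)), k)
})

rbind(ehh_drop, ehh_new_allele)
```

In this toy case, dropping the incomplete haplotype lets EHH rise again at the third marker, whereas treating the missing value as a new allele keeps EHH non-increasing.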
The returned value is a data frame with markers in rows and the following columns:
chromosome name
position in the chromosome
sample frequency of the ancestral / major allele
sample frequency of the second-most frequent remaining allele
number of evaluated haplotypes at the focal marker for the ancestral / major allele
number of evaluated haplotypes at the focal marker for the second-most frequent remaining allele
iHH of the ancestral / major allele
iHH of the second-most frequent remaining allele
iES (used by Sabeti et al. 2007)
inES (used by Tang et al. 2007)
Note that in the case of unphased data, the evaluation is restricted to haplotypes of homozygous individuals, which reduces the power to detect selection, particularly for iHS (for appropriate parameter settings see the main vignette and Klassmann and Gautier (2020)).
Gautier, M. and Naves, M. (2011). Footprints of selection in the ancestral admixture of a New World Creole cattle breed. Molecular Ecology, 20, 3128-3143.
Klassmann, A. and Gautier, M. (2020). Detecting selection using Extended Haplotype Homozygosity-based statistics on unphased or unpolarized data (preprint). https://doi.org/10.22541/au.160405572.29972398/v1
Sabeti, P.C. et al. (2002). Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832-837.
Sabeti, P.C. et al. (2007). Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913-918.
Tang, K. and Thornton, K.R. and Stoneking, M. (2007). A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biology, 5, e171.
Voight, B.F. and Kudaravalli, S. and Wen, X. and Pritchard, J.K. (2006). A map of recent positive selection in the human genome. PLoS Biology, 4, e72.
data2haplohh, scan_hh, ihh2ihs, ines2rsb, ies2xpehh
library(rehh)
# example haplohh object (280 haplotypes, 1424 SNPs)
# see ?haplohh_cgu_bta12 for details
data(haplohh_cgu_bta12)
# using function scan_hh() with no cut-offs
scan <- scan_hh(haplohh_cgu_bta12, discard_integration_at_border = FALSE,
                limehh = 0, limehhs = 0, lower_ehh_y_bound = 0, interpolate = FALSE)
# using function scan_hh_full()
scan_full <- scan_hh_full(haplohh_cgu_bta12, discard_integration_at_border = FALSE)
# both yield identical results within numerical precision
all.equal(scan, scan_full)