pop_het_exp: Compute the population expected heterozygosity

View source: R/pop_het_exp.R

pop_het_exp    R Documentation

Compute the population expected heterozygosity

Description

This function computes expected population heterozygosity (also referred to as gene diversity, to avoid the potentially misleading use of the term "expected" in this context), using the formula of Nei (1987).

Usage

pop_het_exp(
  .x,
  by_locus = FALSE,
  include_global = FALSE,
  n_cores = bigstatsr::nb_cores()
)

pop_gene_div(
  .x,
  by_locus = FALSE,
  include_global = FALSE,
  n_cores = bigstatsr::nb_cores()
)

Arguments

.x

a gen_tibble, usually grouped with dplyr::group_by(). If it is not grouped, the full tibble is treated as a single population.

by_locus

boolean, determining whether Hs should be returned by locus (TRUE), or as a single genome-wide value (FALSE, the default).

include_global

boolean, determining whether a global estimate should be appended to the population-specific estimates. With include_global = TRUE, the function returns a vector of length n + 1 (n populations plus the global value), or a matrix with n + 1 columns if by_locus = TRUE.

n_cores

number of cores to be used; defaults to bigstatsr::nb_cores().

Details

Within-population expected heterozygosity (gene diversity) \hat{h}_s for a locus with m alleles is defined as:

\hat{h}_s = \frac{\tilde{n}}{\tilde{n}-1} \left[ 1 - \sum_{i=1}^{m} \bar{\hat{x}_i^2} - \frac{\hat{h}_o}{2\tilde{n}} \right]

where

\tilde{n} = \frac{s}{\sum_k 1/n_k} (i.e. the harmonic mean of n_k) and

\bar{\hat{x}_i^2} = \frac{\sum_k \hat{x}_{ki}^2}{s}

following equation 7.39 in Nei (1987), p. 164. In our specific case, there are only two alleles, so m = 2. \hat{h}_s at the genome level for each population is simply the mean of the locus estimates for that population.
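
The formula can be sketched in base R for a single biallelic locus. This is an illustrative sketch only, not the code used by pop_het_exp(). The helper name nei_gene_diversity() and its arguments are hypothetical. It assumes, following Nei's notation, that n_k is the number of individuals in each of the s samples, \hat{x}_{ki} is the frequency of allele i in sample k, and \hat{h}_o is the observed heterozygosity at the locus. Missing data are not handled.

```r
# Hypothetical helper (not part of tidypopgen): gene diversity for one
# biallelic locus, following the formula given in Details.
#   n_k  : number of individuals in each of the s samples (assumed meaning)
#   x_k1 : frequency of the first allele in each sample
#   h_o  : observed heterozygosity at the locus
nei_gene_diversity <- function(n_k, x_k1, h_o) {
  stopifnot(length(n_k) == length(x_k1), all(n_k > 0))
  s <- length(n_k)
  # Harmonic mean of the sample sizes
  n_tilde <- s / sum(1 / n_k)
  # Mean squared frequency over samples, summed over both alleles (m = 2);
  # the second allele has frequency 1 - x_k1
  sum_mean_x2 <- sum(x_k1^2) / s + sum((1 - x_k1)^2) / s
  n_tilde / (n_tilde - 1) * (1 - sum_mean_x2 - h_o / (2 * n_tilde))
}

# One sample of 10 individuals, allele frequency 0.5, observed heterozygosity 0.5
hs_locus <- nei_gene_diversity(n_k = 10, x_k1 = 0.5, h_o = 0.5)

# Genome-level value: the mean of per-locus estimates (three made-up loci)
hs_loci <- c(
  nei_gene_diversity(10, 0.5, 0.5),
  nei_gene_diversity(10, 0.2, 0.3),
  nei_gene_diversity(10, 1.0, 0.0)
)
hs_genome <- mean(hs_loci)
```

With a single sample (s = 1) the harmonic mean reduces to the sample size itself, and a monomorphic locus (third entry of hs_loci) contributes zero to the genome-level mean.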

Value

a vector of mean population expected heterozygosities (if by_locus = FALSE), or a matrix of estimates by locus (rows are loci, columns are populations; if by_locus = TRUE).

References

Nei, M. (1987). Molecular Evolutionary Genetics. Columbia University Press.

Examples



# Attach the package, then load the grouped example gen_tibble
library(tidypopgen)
example_gt <- load_example_gt("grouped_gen_tbl")

# Compute expected heterozygosity
example_gt %>% pop_het_exp()

# To include the global expected heterozygosity, set include_global = TRUE
example_gt %>% pop_het_exp(include_global = TRUE)

# To return by locus, set by_locus = TRUE
example_gt %>% pop_het_exp(by_locus = TRUE)


tidypopgen documentation built on Aug. 28, 2025, 1:08 a.m.