hwe: Hardy-Weinberg Equilibrium Test (Multiallelic, Unified...

View source: R/hwe.R

hwe R Documentation

Hardy-Weinberg Equilibrium Test (Multiallelic, Unified Interface)

Description

Unified Hardy-Weinberg equilibrium (HWE) testing procedure for multiallelic loci. All input formats are internally converted to a single genotype count matrix, ensuring identical inference across representations.

Usage

hwe(
  data,
  type = c("alleles", "genotypes", "counts"),
  verbose = TRUE,
  yates.correct = FALSE,
  B = 1e+05,
  seed = 123
)

Arguments

data

genotype data in one of three formats:

  • alleles: two-column allele pairs

  • genotypes: integer genotype IDs (triangular encoding)

  • counts: symmetric genotype count matrix

type

input format: "alleles", "genotypes", or "counts"

verbose

logical; if TRUE, prints full test output

yates.correct

logical; if TRUE, applies the Yates continuity correction to the Pearson chi-square statistic only

B

number of Monte Carlo replicates for the exact test

seed

random seed for Monte Carlo exact test

Details

Let c_i be the number of copies of allele i among the n genotyped individuals. The allele frequencies are:

p_i = \frac{c_i}{2n}

Under Hardy–Weinberg equilibrium:

P_{ii} = p_i^2,\quad P_{ij} = 2p_i p_j \ (i \ne j)

Expected genotype counts:

E_{ij} = n P_{ij}
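As a minimal sketch of these definitions in base R (not the package's internal code), the following computes p_i and E_{ij} from an upper-triangular count matrix. The matrix O holds the genotype counts of the allele pairs used in the Examples section; the names O, c_i, P and E are chosen here for illustration only.

```r
# Observed genotype counts O_ij stored in the upper triangle (i <= j)
O <- matrix(c(3, 3, 4,
              0, 3, 3,
              0, 0, 2), nrow = 3, byrow = TRUE)
n <- sum(O)                       # number of individuals

# Allele counts c_i: genotype ij contributes one copy to allele i and one to
# allele j, so a homozygote ii is counted twice via rowSums + colSums
c_i <- rowSums(O) + colSums(O)
p <- c_i / (2 * n)                # p_i = c_i / (2n)

# HWE genotype probabilities: p_i^2 on the diagonal, 2 p_i p_j above it
P <- 2 * outer(p, p)
diag(P) <- p^2
P[lower.tri(P)] <- 0

E <- n * P                        # expected counts E_ij = n P_ij
```

Because the P_{ij} over i \le j sum to one, the expected counts sum to n, which is a quick sanity check on any implementation.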

All input formats are mapped to the same genotype matrix, ensuring algebraic equivalence.

Pearson chi-square

X^2 = \sum_{i \le j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

With the optional Yates correction (yates.correct = TRUE):

X^2 = \sum_{i \le j} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}

Likelihood ratio test

G^2 = 2 \sum_{i \le j} O_{ij} \log(O_{ij}/E_{ij})

with the convention 0 \log 0 = 0.
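The two statistics can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code behind hwe(): the toy counts are those of the Examples data, and the degrees of freedom k(k-1)/2 for a k-allele locus are the usual choice for an HWE goodness-of-fit test, which this page does not state explicitly.

```r
# Toy genotype counts, upper triangle (i <= j)
O <- matrix(c(3, 3, 4,
              0, 3, 3,
              0, 0, 2), nrow = 3, byrow = TRUE)
n <- sum(O)
k <- nrow(O)
p <- (rowSums(O) + colSums(O)) / (2 * n)       # allele frequencies
E <- n * 2 * outer(p, p)                       # n * 2 p_i p_j
diag(E) <- n * p^2                             # n * p_i^2

ut <- upper.tri(O, diag = TRUE)                # cells with i <= j
o <- O[ut]
e <- E[ut]

X2       <- sum((o - e)^2 / e)                 # Pearson chi-square
X2_yates <- sum((abs(o - e) - 0.5)^2 / e)      # with continuity correction

# 0 * log(0) evaluates to NaN in R, so empty cells are set to 0 explicitly
G2 <- 2 * sum(ifelse(o > 0, o * log(o / e), 0))

df    <- k * (k - 1) / 2                       # assumed degrees of freedom
p_X2  <- pchisq(X2, df, lower.tail = FALSE)
p_LRT <- pchisq(G2, df, lower.tail = FALSE)
```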

Inbreeding coefficient

F = \frac{H_{obs} - H_{exp}}{1 - H_{exp}}

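where H_{obs} and H_{exp} are the observed and expected proportions of homozygotes:

H_{obs} = \sum_i O_{ii}/n,\quad H_{exp} = \sum_i p_i^2

The estimate is returned as rho.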
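A short base R sketch of this estimator, using the genotype counts of the Examples data as a toy input (the variable names are illustrative and not taken from the package):

```r
# Toy genotype counts, upper triangle (i <= j)
O <- matrix(c(3, 3, 4,
              0, 3, 3,
              0, 0, 2), nrow = 3, byrow = TRUE)
n <- sum(O)
p <- (rowSums(O) + colSums(O)) / (2 * n)   # allele frequencies 13/36, 12/36, 11/36

H_obs <- sum(diag(O)) / n                  # observed share of homozygotes, 8/18
H_exp <- sum(p^2)                          # share expected under HWE
F_hat <- (H_obs - H_exp) / (1 - H_exp)     # 71/431, about 0.165
```

A positive value indicates more homozygotes than HWE predicts, a negative value fewer.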

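Monte Carlo exact test

Under the null hypothesis, the genotype counts are treated as:

X \sim \text{Multinomial}(n, P)

The p-value is:

p = \Pr(\ell(X^{sim}) \le \ell(X^{obs}))

estimated from B simulated tables, where \ell is the multinomial log-likelihood over genotype cells:

\ell(x) = \sum_{i \le j} x_{ij} \log P_{ij} + \log \frac{n!}{\prod_{i \le j} x_{ij}!}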
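The procedure can be sketched with the multinomial functions in base R. This is an illustration of the formula above, not the package's implementation: the small number of replicates, the tolerance used for ties and the variable names are assumptions made here.

```r
set.seed(123)                                  # same value as the documented default seed

# Toy genotype counts, upper triangle (i <= j)
O <- matrix(c(3, 3, 4,
              0, 3, 3,
              0, 0, 2), nrow = 3, byrow = TRUE)
n <- sum(O)
p <- (rowSums(O) + colSums(O)) / (2 * n)
P <- 2 * outer(p, p)
diag(P) <- p^2

ut    <- upper.tri(O, diag = TRUE)             # genotype cells with i <= j
x_obs <- O[ut]
prob  <- P[ut]                                 # HWE cell probabilities, sum to 1

# dmultinom(..., log = TRUE) is the multinomial log-likelihood l(x)
l_obs <- dmultinom(x_obs, prob = prob, log = TRUE)

B     <- 2000                                  # far fewer than the default 1e5, for speed
sims  <- rmultinom(B, size = n, prob = prob)   # one simulated table per column
l_sim <- apply(sims, 2, dmultinom, prob = prob, log = TRUE)

# Share of simulated tables no more likely than the observed one;
# the small tolerance guards against floating-point ties
p_exact <- mean(l_sim <= l_obs + 1e-8)
```

With a fixed seed the estimate is reproducible; increasing B reduces its Monte Carlo error.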

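Every input format is reduced to the same genotype count matrix M_{ij}:

\text{alleles} \equiv \text{genotype IDs} \equiv \text{count matrix} \rightarrow M_{ij}

Hence all statistics are identical across formats; only p_exact is subject to Monte Carlo variation, which depends on B and seed.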
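To illustrate the reduction, the sketch below turns allele pairs into a genotype count matrix using base R only. It does not rely on the package's triangular genotype encoding, and storing the counts in the upper triangle is a choice made here for illustration; the allele vectors are those of the Examples section.

```r
a1 <- c(1,1,1,1,2,2,2,3,3,1,2,3,1,2,3,1,2,3)
a2 <- c(1,2,3,1,2,3,2,3,1,2,3,1,1,1,2,3,2,3)

# Order each pair so that the smaller allele comes first (i <= j)
lo <- pmin(a1, a2)
hi <- pmax(a1, a2)
k  <- max(hi)

# Cross-tabulate with fixed factor levels so absent genotypes appear as zeros
tab <- table(factor(lo, levels = 1:k), factor(hi, levels = 1:k))
M   <- matrix(as.vector(tab), k, k)   # plain k x k matrix, upper triangle filled
M
```

Swapping a1 and a2 for any individual leaves M unchanged, which is why the order of alleles within a pair does not affect the test.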

Value

Named list containing:

  • source — input representation used ("alleles", "genotypes", or "counts")

  • X2 — Pearson chi-square statistic

  • p_X2 — asymptotic p-value of Pearson chi-square test

  • LRT — likelihood ratio statistic

  • p_LRT — asymptotic p-value of likelihood ratio test

  • p_exact — Monte Carlo exact test p-value

  • freq — allele frequency vector

  • rho — inbreeding coefficient

Note

  • Zero-frequency alleles are removed automatically

  • The exact test is a Monte Carlo approximation; the seed is fixed for reproducibility

  • All statistics are computed on the same standardized genotype matrix

Examples

## Not run: 
# Input 1: allele pairs, one individual per row
a1 <- c(1,1,1,1,2,2,2,3,3,1,2,3,1,2,3,1,2,3)
a2 <- c(1,2,3,1,2,3,2,3,1,2,3,1,1,1,2,3,2,3)
r1 <- hwe(cbind(a1,a2), "alleles", FALSE)
# Input 2: integer genotype IDs derived from the allele pairs
g <- a2g(a1,a2)
r2 <- hwe(g, "genotypes", FALSE)
# Input 3: genotype count matrix rebuilt from the tabulated IDs
g_tab <- table(g)
pairs <- g2a(as.integer(names(g_tab)))
k <- max(pairs)
M <- matrix(0, k, k)
for(i in seq_len(nrow(pairs))) {
  M[pairs[i,1], pairs[i,2]] <- g_tab[i]
}
r3 <- hwe(M, "counts", FALSE)
# Compare the three results row by row, collapsing freq into one string
r <- lapply(list(r1, r2, r3), \(x) within(as.data.frame(x), {
     freq <- paste(round(as.numeric(freq), 3), collapse=";")})[1,])
do.call(rbind,r)

## End(Not run)

gap documentation built on May 28, 2026, 9:07 a.m.