| hwe | R Documentation |
Unified Hardy-Weinberg equilibrium (HWE) testing procedure for multiallelic loci. All input formats are internally converted to a single genotype count matrix, ensuring identical inference across representations.
hwe(
data,
type = c("alleles", "genotypes", "counts"),
verbose = TRUE,
yates.correct = FALSE,
B = 1e+05,
seed = 123
)
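| Argument | Description |
|---|---|
| data | genotype data in one of three formats: an n × 2 matrix of allele labels with one row per individual (type = "alleles"), a vector of genotype codes such as those returned by a2g() (type = "genotypes"), or a k × k matrix of genotype counts (type = "counts") |
| type | input format: "alleles", "genotypes", or "counts" |
| verbose | logical; if TRUE, prints the full test output |
| yates.correct | logical; if TRUE, applies Yates' continuity correction to the Pearson chi-square statistic only |
| B | number of Monte Carlo replicates for the exact test |
| seed | random seed for the Monte Carlo exact test |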
For a sample of n individuals, let c_i be the number of copies of allele A_i. The allele frequencies are:
p_i = \frac{c_i}{2n}
Under Hardy–Weinberg equilibrium:
P_{ii} = p_i^2,\quad P_{ij} = 2p_i p_j \quad (i < j)
Expected genotype counts:
E_{ij} = n P_{ij}
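The definitions of p_i, P_ij and E_ij can be checked numerically. The sketch below is a minimal illustration, not the package's implementation: `hwe_expected()` is a hypothetical helper, and it assumes a k × k count matrix in which heterozygote counts may sit in either triangle and are folded into the upper triangle (cells i ≤ j).

```r
# Sketch only: hwe_expected() is a hypothetical helper, not the package's API.
# Assumes M is a k x k genotype count matrix; heterozygote counts may sit
# in either triangle and are folded into the upper triangle (i <= j).
hwe_expected <- function(M) {
  O <- M + t(M)                     # O[i, j] = M[i, j] + M[j, i]
  diag(O) <- diag(M)                # the diagonal was doubled; restore it
  O[lower.tri(O)] <- 0              # keep genotype cells i <= j only
  n <- sum(O)                       # number of individuals
  cnt <- rowSums(O) + colSums(O)    # allele counts c_i (homozygotes count twice)
  p <- cnt / (2 * n)                # p_i = c_i / (2n)
  P <- 2 * outer(p, p)              # P_ij = 2 p_i p_j for i < j
  diag(P) <- p^2                    # P_ii = p_i^2
  P[lower.tri(P)] <- 0
  list(O = O, n = n, p = p, E = n * P)   # E_ij = n P_ij
}

# Usage: 10 A1A1, 20 A1A2 (stored below the diagonal), 30 A2A2
res <- hwe_expected(matrix(c(10, 20, 0, 30), nrow = 2))
res$p   # 1/3 and 2/3
res$E   # expected counts; they sum to res$n
```

Because the genotype probabilities sum to (Σ p_i)² = 1, the expected counts always add up to n.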
All input formats are mapped to the same genotype matrix, ensuring algebraic equivalence.
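The mapping from allele pairs to a count matrix can be sketched as follows. The encoding used by a2g()/g2a() is not documented here, so this sketch starts directly from allele pairs; `alleles_to_counts()` is a hypothetical helper that assumes alleles are coded 1..k and stores each genotype at (min, max), so A_iA_j and A_jA_i land in the same cell.

```r
# Sketch only: alleles_to_counts() is a hypothetical helper, not the package's API.
# A: n x 2 matrix of allele labels coded 1..k, one row per individual.
alleles_to_counts <- function(A, k = max(A)) {
  lo <- pmin(A[, 1], A[, 2])        # smaller allele of each pair
  hi <- pmax(A[, 1], A[, 2])        # larger allele of each pair
  M <- matrix(0L, k, k)
  for (r in seq_along(lo)) {
    M[lo[r], hi[r]] <- M[lo[r], hi[r]] + 1L   # count genotype A_lo A_hi
  }
  M
}

# The allele data from the Examples section
a1 <- c(1,1,1,1,2,2,2,3,3,1,2,3,1,2,3,1,2,3)
a2 <- c(1,2,3,1,2,3,2,3,1,2,3,1,1,1,2,3,2,3)
M <- alleles_to_counts(cbind(a1, a2))
M   # upper-triangular 3 x 3 matrix of genotype counts, summing to 18
```

Swapping the two allele columns gives the same matrix, which is the property that makes results independent of allele order within a genotype.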
Pearson chi-square
X^2 = \sum_{i \le j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
With the optional Yates correction (yates.correct = TRUE):
X^2 = \sum_{i \le j} \frac{(|O_{ij} - E_{ij}| - 0.5)^2}{E_{ij}}
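A minimal sketch of the Pearson statistic, with and without the Yates term, follows. `pearson_hwe()` is hypothetical; it assumes an upper-triangular count matrix in which every allele is observed. The documentation does not state the degrees of freedom used for `p_X2`; the sketch assumes the usual k(k−1)/2 for k alleles (k(k+1)/2 genotype classes, minus 1, minus k−1 estimated frequencies).

```r
# Sketch only: pearson_hwe() is a hypothetical helper, not the package's API.
# O: k x k genotype counts with O[i, j] (i <= j) in the upper triangle and
# zeros below; every allele is assumed to be observed (p_i > 0).
pearson_hwe <- function(O, yates = FALSE) {
  n <- sum(O)
  p <- (rowSums(O) + colSums(O)) / (2 * n)   # p_i = c_i / (2n)
  E <- n * 2 * outer(p, p)                   # E_ij = n * 2 p_i p_j
  diag(E) <- n * p^2                         # E_ii = n * p_i^2
  cells <- upper.tri(O, diag = TRUE)         # genotype classes i <= j
  d <- abs(O[cells] - E[cells])
  if (yates) d <- d - 0.5                    # |O - E| - 0.5, as in the formula
  X2 <- sum(d^2 / E[cells])
  k <- length(p)
  df <- k * (k - 1) / 2                      # assumed df (see lead-in)
  list(X2 = X2, df = df, p.value = pchisq(X2, df, lower.tail = FALSE))
}

O <- matrix(c(50, 0, 0, 50), nrow = 2)       # no heterozygotes at all
pearson_hwe(O)$X2                            # 100
```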
Likelihood ratio test
G^2 = 2 \sum_{i \le j} O_{ij} \log(O_{ij}/E_{ij})
with the convention 0 \log 0 = 0.
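The G² statistic, including the 0 log 0 = 0 convention, can be sketched in the same style. `lrt_hwe()` is hypothetical, assumes an upper-triangular count matrix with all alleles observed, and uses the same assumed k(k−1)/2 degrees of freedom as the Pearson sketch.

```r
# Sketch only: lrt_hwe() is a hypothetical helper, not the package's API.
# O: k x k genotype counts with O[i, j] (i <= j) in the upper triangle.
lrt_hwe <- function(O) {
  n <- sum(O)
  p <- (rowSums(O) + colSums(O)) / (2 * n)   # p_i = c_i / (2n)
  E <- n * 2 * outer(p, p)                   # E_ij = n * 2 p_i p_j
  diag(E) <- n * p^2                         # E_ii = n * p_i^2
  cells <- upper.tri(O, diag = TRUE)         # genotype classes i <= j
  o <- O[cells]
  e <- E[cells]
  pos <- o > 0                               # 0 log 0 = 0: empty cells add nothing
  G2 <- 2 * sum(o[pos] * log(o[pos] / e[pos]))
  k <- length(p)
  df <- k * (k - 1) / 2                      # assumed df, as for X^2
  list(G2 = G2, df = df, p.value = pchisq(G2, df, lower.tail = FALSE))
}

O <- matrix(c(50, 0, 0, 50), nrow = 2)
lrt_hwe(O)$G2   # 200 * log(2), about 138.6
```

Dropping empty cells before taking the log avoids the 0 · (−Inf) = NaN that a direct vectorized computation would produce.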
Inbreeding coefficient
F = \frac{H_{obs} - H_{exp}}{1 - H_{exp}}
where H_{obs} and H_{exp} are the observed and expected (under HWE) proportions of homozygotes:
H_{obs} = \sum_i O_{ii}/n,\quad H_{exp} = \sum_i p_i^2
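Because H_obs and H_exp are homozygosities, F is equivalent to the familiar 1 − (observed heterozygosity)/(expected heterozygosity). The sketch below computes F directly from the formula; `inbreeding_F()` is a hypothetical helper that assumes a k × k count matrix with homozygotes on the diagonal.

```r
# Sketch only: inbreeding_F() is a hypothetical helper, not the package's API.
# O: k x k genotype counts, homozygotes on the diagonal.
inbreeding_F <- function(O) {
  n <- sum(O)
  p <- (rowSums(O) + colSums(O)) / (2 * n)   # allele frequencies p_i
  H_obs <- sum(diag(O)) / n                  # observed homozygote proportion
  H_exp <- sum(p^2)                          # expected homozygosity under HWE
  (H_obs - H_exp) / (1 - H_exp)
}

inbreeding_F(matrix(c(50, 0, 0, 50), nrow = 2))   # 1: no heterozygotes
inbreeding_F(matrix(c(0, 0, 100, 0), nrow = 2))   # -1: only heterozygotes
```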
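Monte Carlo exact test
Let P = (P_{ij})_{i \le j} be the vector of HWE genotype probabilities. With B tables X^{sim} simulated under: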
X \sim \text{Multinomial}(n, P)
the p-value is:
p = \Pr(\ell(X^{sim}) \le \ell(X^{obs}))
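where \ell(x) is the multinomial log-probability of a genotype count vector x = (x_{ij})_{i \le j}:
\ell(x) = \log \frac{n!}{\prod_{i \le j} x_{ij}!} + \sum_{i \le j} x_{ij} \log P_{ij}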
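The Monte Carlo p-value can be sketched with base R's `rmultinom()` and `dmultinom()`. `mc_exact_hwe()` is hypothetical and follows the formula as written: P is held at its estimated value and tables are drawn unconditionally, rather than conditioning on the observed allele counts as the classical conditional HWE exact test does. Note that `set.seed()` resets the global random number stream.

```r
# Sketch only: mc_exact_hwe() is a hypothetical helper, not the package's API.
# O: k x k genotype counts with O[i, j] (i <= j) in the upper triangle.
mc_exact_hwe <- function(O, B = 1e5, seed = 123) {
  n <- sum(O)
  p <- (rowSums(O) + colSums(O)) / (2 * n)   # estimated allele frequencies
  P <- 2 * outer(p, p)                       # P_ij = 2 p_i p_j (i < j)
  diag(P) <- p^2                             # P_ii = p_i^2
  cells <- upper.tri(O, diag = TRUE)
  probs <- P[cells]                          # HWE genotype probabilities
  # l(x) = log(n! / prod x!) + sum x log P: multinomial log-probability
  loglik <- function(x) dmultinom(x, prob = probs, log = TRUE)
  l_obs <- loglik(O[cells])
  set.seed(seed)                             # resets the global RNG stream
  sims <- rmultinom(B, size = n, prob = probs)   # one column per replicate
  l_sim <- apply(sims, 2, loglik)
  mean(l_sim <= l_obs)                       # share of tables at most as likely
}

O <- matrix(c(30, 0, 40, 30), nrow = 2)
mc_exact_hwe(O, B = 1e4)
```

With a fixed seed the result is reproducible, which matches the role of the `seed` argument.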
Representation invariance
\text{alleles} \equiv \text{genotype IDs} \equiv \text{count matrix} \rightarrow M_{ij}
Hence all statistics are identical across input formats, up to Monte Carlo variation in the exact-test p-value.
Named list containing:
source: input representation used ("alleles", "genotypes", or "counts")
X2: Pearson chi-square statistic (Yates-corrected when yates.correct = TRUE)
p_X2: asymptotic p-value of the Pearson chi-square test
LRT: likelihood ratio statistic G^2
p_LRT: asymptotic p-value of the likelihood ratio test
p_exact: Monte Carlo exact test p-value
freq: allele frequency vector
rho: inbreeding coefficient F
Note:
Zero-frequency alleles are removed automatically.
The exact test is a Monte Carlo test; the fixed seed makes it reproducible.
All statistics are computed on the same standardized genotype matrix.
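Removing zero-frequency alleles matters because an allele with p_i = 0 gives expected counts E_ii = E_ij = 0, so the corresponding X² terms would divide by zero. A minimal sketch of that clean-up step, using a hypothetical `drop_zero_alleles()` helper on a k × k count matrix:

```r
# Sketch only: drop_zero_alleles() is a hypothetical helper, not the package's API.
# M: k x k genotype count matrix (heterozygotes in either triangle).
drop_zero_alleles <- function(M) {
  cnt <- rowSums(M) + colSums(M)    # allele counts c_i
  keep <- cnt > 0                   # alleles actually observed
  M[keep, keep, drop = FALSE]       # keep matrix shape even if k becomes 1
}

M <- matrix(0, 3, 3)
M[1, 1] <- 5; M[1, 3] <- 4; M[3, 3] <- 1   # allele 2 never observed
drop_zero_alleles(M)                        # 2 x 2 matrix for alleles 1 and 3
```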
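## Not run:
# Allele pairs for 18 individuals at a 3-allele locus
a1 <- c(1,1,1,1,2,2,2,3,3,1,2,3,1,2,3,1,2,3)
a2 <- c(1,2,3,1,2,3,2,3,1,2,3,1,1,1,2,3,2,3)

# 1) Allele-pair input
r1 <- hwe(cbind(a1, a2), type = "alleles", verbose = FALSE)

# 2) Genotype input: a2g() converts the allele pairs to genotype IDs
g <- a2g(a1, a2)
r2 <- hwe(g, type = "genotypes", verbose = FALSE)

# 3) Count-matrix input: tabulate genotype IDs, decode them back to
#    allele pairs with g2a(), and fill the k x k count matrix
g_tab <- table(g)
pairs <- g2a(as.integer(names(g_tab)))
k <- max(pairs)
M <- matrix(0, k, k)
for (i in seq_len(nrow(pairs))) {
  M[pairs[i, 1], pairs[i, 2]] <- g_tab[[i]]
}
r3 <- hwe(M, type = "counts", verbose = FALSE)

# Compare the three results, one row per input format; allele
# frequencies are collapsed into a single string per row
r <- lapply(list(r1, r2, r3), function(x) within(as.data.frame(x), {
  freq <- paste(round(as.numeric(freq), 3), collapse = ";")
})[1, ])
do.call(rbind, r)
## End(Not run)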