# Hypergeometric: The Hypergeometric Distribution

## Description

Density, distribution function, quantile function and random generation for the hypergeometric distribution.

## Usage

```r
dhyper(x, m, n, k, log = FALSE)
phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)
qhyper(p, m, n, k, lower.tail = TRUE, log.p = FALSE)
rhyper(nn, m, n, k)
```

## Arguments

| Argument | Description |
|---|---|
| x, q | vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls. |
| m | the number of white balls in the urn. |
| n | the number of black balls in the urn. |
| k | the number of balls drawn from the urn, hence must be in 0, 1, …, m+n. |
| p | probability, it must be between 0 and 1. |
| nn | number of observations. If length(nn) > 1, the length is taken to be the number required. |
| log, log.p | logical; if TRUE, probabilities p are given as log(p). |
| lower.tail | logical; if TRUE (default), probabilities are P[X ≤ x], otherwise, P[X > x]. |
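As a rough illustration of the lower.tail, log and log.p arguments, the following sketch uses an urn with 10 white and 7 black balls and 8 draws. These values and the variable names are chosen here for illustration only.

```r
# Illustrative urn: 10 white balls, 7 black balls, 8 balls drawn
m <- 10; n <- 7; k <- 8
q <- 4

lower <- phyper(q, m, n, k)                      # P[X <= 4]
upper <- phyper(q, m, n, k, lower.tail = FALSE)  # P[X > 4]
lower + upper                                    # the two tails sum to 1 (up to rounding)

# log = TRUE returns the log of the density
log_d <- dhyper(q, m, n, k, log = TRUE)
all.equal(exp(log_d), dhyper(q, m, n, k))

# With log.p = TRUE, qhyper expects p on the log scale
q_log   <- qhyper(log(0.5), m, n, k, log.p = TRUE)
q_plain <- qhyper(0.5, m, n, k)
q_log == q_plain
```

Working on the log scale is mainly useful when probabilities are so small that they would underflow on the natural scale.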

## Details

The hypergeometric distribution is used for sampling without replacement. The density of this distribution with parameters m, n and k (named Np, N-Np, and n, respectively in the reference below, where N := m+n is also used in other references) is given by

p(x) = choose(m, x) choose(n, k-x) / choose(m+n, k)

for x = 0, …, k.

Note that p(x) is non-zero only for max(0, k-n) <= x <= min(k, m).
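The formula and the support can be checked directly against dhyper. The sketch below uses illustrative values (m = 10, n = 7, k = 8); the names manual and support are introduced here and are not part of the API.

```r
# Illustrative parameters: 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
x <- 0:k

# Density written out with choose(), as in the formula above
manual <- choose(m, x) * choose(n, k - x) / choose(m + n, k)
all.equal(manual, dhyper(x, m, n, k))

# Values of x with non-zero probability: max(0, k-n) <= x <= min(k, m)
support <- max(0, k - n):min(k, m)
nonzero <- x[dhyper(x, m, n, k) > 0]
all(nonzero == support)
```

Here x = 0 has probability zero because only 7 black balls are available for 8 draws, so at least one white ball must be drawn.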

With p := m/(m+n) (hence Np = N × p in the reference's notation), the first two moments are mean

E[X] = μ = k p

and variance

Var(X) = k p (1 - p) * (m+n-k)/(m+n-1),

which shows the closeness to the Binomial(k,p) (where the hypergeometric has smaller variance unless k = 1).
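The moment formulas can be verified numerically by summing over the support. This is a small sketch with illustrative parameter values; the variable names are chosen here.

```r
# Illustrative parameters: 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
p <- m / (m + n)
x <- 0:k
d <- dhyper(x, m, n, k)

# Moments computed directly from the density
mean_num <- sum(x * d)
var_num  <- sum((x - mean_num)^2 * d)

# Closed-form expressions from the text
mean_formula <- k * p
var_formula  <- k * p * (1 - p) * (m + n - k) / (m + n - 1)

all.equal(mean_num, mean_formula)
all.equal(var_num, var_formula)

# Binomial(k, p) variance for comparison; the hypergeometric one is smaller for k > 1
var_binom <- k * p * (1 - p)
var_formula < var_binom
```

The factor (m+n-k)/(m+n-1) is the finite population correction; it equals 1 when k = 1 and shrinks as k approaches m+n.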

The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.
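To make the definition concrete, the following sketch searches for the smallest x with F(x) ≥ p by brute force and compares the result with qhyper. The parameter values and the helper name manual_q are illustrative.

```r
# Illustrative parameters: 10 white, 7 black, 8 drawn
m <- 10; n <- 7; k <- 8
p <- c(0.1, 0.5, 0.9)

x  <- 0:k
Fx <- phyper(x, m, n, k)   # distribution function on 0, ..., k

# Smallest x such that F(x) >= p, found by direct search
manual_q <- sapply(p, function(pp) min(x[Fx >= pp]))

manual_q
qhyper(p, m, n, k)
all(manual_q == qhyper(p, m, n, k))
```

The probabilities used here lie well away from the jump points of F, so the direct search and qhyper agree without any rounding concerns.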

In rhyper(), if one of m, n, k exceeds .Machine$integer.max, the equivalent of qhyper(runif(nn), m, n, k) is currently used. This is comparatively slow; a binomial approximation may be considerably more efficient in such cases.
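A minimal sketch of such a binomial approximation follows. It assumes k is very small relative to m+n, so that drawing without replacement is nearly the same as drawing with replacement; the parameter values are illustrative, and this is not what rhyper does internally.

```r
# Assumed scenario: m and n both exceed .Machine$integer.max, k is small
m <- 3e9; n <- 5e9; k <- 50
p <- m / (m + n)

# Approximate hypergeometric draws by Binomial(k, p) draws
set.seed(1)
approx_draws <- rbinom(1000, size = k, prob = p)

# Gauge the approximation error by comparing the two densities on 0, ..., k
max_diff <- max(abs(dhyper(0:k, m, n, k) - dbinom(0:k, size = k, prob = p)))
max_diff
```

When k is not small relative to m+n, the variance of the binomial is noticeably too large and the approximation should not be used.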

## Value

dhyper gives the density, phyper gives the distribution function, qhyper gives the quantile function, and rhyper generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by nn for rhyper, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than nn are recycled to the length of the result. Only the first elements of the logical arguments are used.
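The length and recycling rules, and the NaN return for invalid arguments, can be seen in a short sketch. The parameter values and variable names are illustrative.

```r
# Recycling: x, n and k are recycled against the three values of m
d <- dhyper(2, m = c(5, 10, 15), n = 7, k = 8)
length(d)

# rhyper: the result length comes from the first argument
r1 <- rhyper(5, 10, 7, 8)
length(r1)

# If the first argument has length > 1, its length is used instead
r2 <- rhyper(c(1, 2, 3), 10, 7, 8)
length(r2)

# Invalid arguments (here k > m + n) give NaN with a warning
bad <- suppressWarnings(dhyper(1, m = 3, n = 2, k = 10))
is.nan(bad)
```

suppressWarnings is used only to keep the output clean; without it, a warning about NaNs being produced is issued.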

## Source

dhyper computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

phyper is based on calculating dhyper and phyper(...)/dhyper(...) (as a summation), based on ideas of Ian Smith and Morten Welinder.

qhyper is based on inversion (of an earlier phyper() algorithm).

rhyper is based on a corrected version of

Kachitvichyanukul, V. and Schmeiser, B. (1985). Computer generation of hypergeometric random variates. Journal of Statistical Computation and Simulation, 22, 127–145.

## References

Johnson, N. L., Kotz, S., and Kemp, A. W. (1992) Univariate Discrete Distributions, Second Edition. New York: Wiley.

## Examples

```r
m <- 10; n <- 7; k <- 8
x <- 0:(k+1)
rbind(phyper(x, m, n, k), dhyper(x, m, n, k))
all(phyper(x, m, n, k) == cumsum(dhyper(x, m, n, k)))  # FALSE
## but error is very small:
signif(phyper(x, m, n, k) - cumsum(dhyper(x, m, n, k)), digits = 3)
```