# hyperg: Hypergeometric Family Function In VGAM: Vector Generalized Linear and Additive Models

## Description

Family function for a hypergeometric distribution where either the number of white balls or the total number of white and black balls are unknown.

## Usage

 `1` ```hyperg(N = NULL, D = NULL, lprob = "logitlink", iprob = NULL) ```

## Arguments

 `N` Total number of white and black balls in the urn. Must be a vector with positive values, and is recycled, if necessary, to the same length as the response. One of `N` and `D` must be specified. `D` Number of white balls in the urn. Must be a vector with positive values, and is recycled, if necessary, to the same length as the response. One of `N` and `D` must be specified. `lprob` Link function for the probabilities. See `Links` for more choices. `iprob` Optional initial value for the probabilities. The default is to choose initial values internally.

## Details

Consider the scenario from `dhyper` where there are N=m+n balls in an urn, where m are white and n are black. A simple random sample (i.e., without replacement) of k balls is taken. The response here is the sample proportion of white balls. In this document, `N` is N=m+n, `D` is m (for the number of “defectives”, in quality control terminology, or equivalently, the number of marked individuals). The parameter to be estimated is the population proportion of white balls, viz. prob = m/(m+n).

Depending on which one of `N` and `D` is inputted, the estimate of the other parameter can be obtained from the equation prob = m/(m+n), or equivalently, `prob = D/N`. However, the log-factorials are computed using `lgamma` and both m and n are not restricted to being integer. Thus if an integer N is to be estimated, it will be necessary to evaluate the likelihood function at integer values about the estimate, i.e., at `trunc(Nhat)` and `ceiling(Nhat)` where `Nhat` is the (real) estimate of N.

## Value

An object of class `"vglmff"` (see `vglmff-class`). The object is used by modelling functions such as `vglm`, `vgam`, `rrvglm`, `cqo`, and `cao`.

## Warning

No checking is done to ensure that certain values are within range, e.g., k <= N.

## Note

The response can be of one of three formats: a factor (first level taken as success), a vector of proportions of success, or a 2-column matrix (first column = successes) of counts. The argument `weights` in the modelling function can also be specified. In particular, for a general vector of proportions, you will need to specify `weights` because the number of trials is needed.

Thomas W. Yee

## References

Forbes, C., Evans, M., Hastings, N. and Peacock, B. (2011). Statistical Distributions, Hoboken, NJ, USA: John Wiley and Sons, Fourth edition.

`dhyper`, `binomialff`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21``` ```nn <- 100 m <- 5 # Number of white balls in the population k <- rep(4, len = nn) # Sample sizes n <- 4 # Number of black balls in the population y <- rhyper(nn = nn, m = m, n = n, k = k) yprop <- y / k # Sample proportions # N is unknown, D is known. Both models are equivalent: fit <- vglm(cbind(y,k-y) ~ 1, hyperg(D = m), trace = TRUE, crit = "c") fit <- vglm(yprop ~ 1, hyperg(D = m), weight = k, trace = TRUE, crit = "c") # N is known, D is unknown. Both models are equivalent: fit <- vglm(cbind(y, k-y) ~ 1, hyperg(N = m+n), trace = TRUE, crit = "l") fit <- vglm(yprop ~ 1, hyperg(N = m+n), weight = k, trace = TRUE, crit = "l") coef(fit, matrix = TRUE) Coef(fit) # Should be equal to the true population proportion unique(m / (m+n)) # The true population proportion fit@extra head(fitted(fit)) summary(fit) ```