hyperg (R Documentation)
Family function for a hypergeometric distribution where either the number of white balls or the total number of white and black balls is unknown.
hyperg(N = NULL, D = NULL, lprob = "logitlink", iprob = NULL)
N: Total number of white and black balls in the urn. Must be a vector with positive values, and is recycled, if necessary, to the same length as the response. One of N and D must be specified.
D: Number of white balls in the urn. Must be a vector with positive values, and is recycled, if necessary, to the same length as the response. One of N and D must be specified.
lprob: Link function for the probabilities. See
iprob: Optional initial value for the probabilities. The default is to choose initial values internally.
Consider the scenario from dhyper, where there are $N = m + n$ balls in an urn, of which $m$ are white and $n$ are black. A simple random sample (i.e., without replacement) of $k$ balls is taken. The response here is the sample proportion of white balls.
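The sampling scenario can be mimicked in base R with a minimal sketch: draw $k$ balls without replacement from an urn holding $m$ white and $n$ black balls, and record the proportion of white balls. The function name draw_prop and the values m = 5, n = 4, k = 4 are illustrative choices, not part of VGAM. Averaged over many draws, the sample proportion should come out close to $m/(m+n)$.

```r
# Minimal sketch of the urn scenario (base R only; names are illustrative).
m <- 5  # white balls in the urn
n <- 4  # black balls in the urn
k <- 4  # sample size
urn <- rep(c("white", "black"), times = c(m, n))

# One simple random sample (without replacement); returns the
# sample proportion of white balls
draw_prop <- function() {
  s <- sample(urn, size = k, replace = FALSE)
  mean(s == "white")
}

set.seed(42)
props <- replicate(10000, draw_prop())

# Compare the average sample proportion with prob = m / (m + n)
c(simulated = mean(props), prob = m / (m + n))
```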
In this document, N is $N = m + n$ and D is $m$ (for the number of “defectives”, in quality control terminology, or equivalently, the number of marked individuals). The parameter to be estimated is the population proportion of white balls, viz. $\mathrm{prob} = m/(m+n)$.
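As a worked instance of these definitions, take the values used in the example code at the end of this page, namely $m = 5$ white and $n = 4$ black balls:

```latex
N = m + n = 5 + 4 = 9, \qquad D = m = 5, \qquad
\mathrm{prob} = \frac{m}{m+n} = \frac{D}{N} = \frac{5}{9} \approx 0.556 .
```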
Depending on which one of N and D is supplied, the estimate of the other parameter can be obtained from the equation $\mathrm{prob} = m/(m+n)$, or equivalently, $\mathrm{prob} = D/N$. However, the log-factorials are computed using lgamma, and neither $m$ nor $n$ is restricted to being an integer. Thus, if an integer N is to be estimated, it will be necessary to evaluate the likelihood function at integer values about the estimate, i.e., at trunc(Nhat) and ceiling(Nhat), where Nhat is the (real-valued) estimate of N.
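The following base R sketch illustrates this for the case where D is known. The helpers log_choose() and loglik_N() are hypothetical illustrations, not VGAM's internal code: they write the hypergeometric log-likelihood with lgamma, so it is defined for real-valued N, confirm that it agrees with dhyper() at an integer N, maximise it over a real interval, and then compare the two neighbouring integers trunc(Nhat) and ceiling(Nhat). The upper search bound of 200 is an arbitrary assumption.

```r
# log choose(a, b) via lgamma(), valid for real-valued a
log_choose <- function(a, b) lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

# Hypothetical helper: hypergeometric log-likelihood as a function of N,
# with D (number of white balls) known
loglik_N <- function(N, y, k, D) {
  # sum of log[ C(D, y) * C(N - D, k - y) / C(N, k) ]
  sum(log_choose(D, y) + log_choose(N - D, k - y) - log_choose(N, k))
}

set.seed(1)
D <- 5; n_black <- 4                      # true N = 9
k <- rep(4, 100)                          # sample sizes
y <- rhyper(100, m = D, n = n_black, k = k)

# At an integer N the lgamma() version agrees with dhyper()
ll_int <- loglik_N(9, y, k, D)
ll_ref <- sum(dhyper(y, m = D, n = 9 - D, k = k, log = TRUE))

# Real-valued maximiser over a feasible range (N - D >= k - y for all samples)
lower <- D + max(k - y)
Nhat <- optimize(loglik_N, c(lower, 200), y = y, k = k, D = D,
                 maximum = TRUE)$maximum

# Evaluate the likelihood at the neighbouring integers and keep the better one
cand <- unique(c(trunc(Nhat), ceiling(Nhat)))
ll_cand <- sapply(cand, loglik_N, y = y, k = k, D = D)
N_int <- cand[which.max(ll_cand)]
c(Nhat = Nhat, N_int = N_int)
```

optimize() is used here only because the log-likelihood is one-dimensional in this sketch; the integer step at the end is what the paragraph above recommends.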
An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, vgam, rrvglm, cqo, and cao.
No checking is done to ensure that certain values are within range, e.g., $k \leq N$.
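Because hyperg does not check ranges itself, a user may want to validate the inputs before fitting. The helper check_hyperg_inputs() below is hypothetical (it is not part of VGAM); it is a minimal sketch that stops when a sample size k exceeds N, when more white balls are drawn than D, or when more black balls are drawn than N - D.

```r
# Hypothetical helper (not part of VGAM): range checks that hyperg() skips
check_hyperg_inputs <- function(y, k, N = NULL, D = NULL) {
  stopifnot(all(y >= 0), all(y <= k))       # counts must lie in 0..k
  if (!is.null(N) && any(k > N))
    stop("sample size k exceeds N")
  if (!is.null(D) && any(y > D))
    stop("more white balls drawn than D")
  if (!is.null(N) && !is.null(D) && any(k - y > N - D))
    stop("more black balls drawn than N - D")
  invisible(TRUE)
}

# Passes: both samples of size 4 fit in an urn of 9 balls
check_hyperg_inputs(y = c(1, 2), k = c(4, 4), N = 9)
```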
The response can be in one of three formats: a factor (first level taken as success), a vector of proportions of success, or a 2-column matrix (first column = successes) of counts. The argument weights in the modelling function can also be specified. In particular, for a general vector of proportions, weights must be specified because the number of trials is needed.
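A small base R sketch shows why the weights are needed: a 2-column count matrix and a vector of proportions carry the same information only when the number of trials (here the sample sizes k) is supplied alongside the proportions. The data values are made up for illustration.

```r
# Illustrative data: white balls drawn and sample sizes (number of trials)
y <- c(2, 3, 1, 4)
k <- c(4, 4, 4, 4)

counts <- cbind(y, k - y)   # 2-column matrix: successes, failures
yprop  <- y / k             # vector of proportions of success ...
w      <- k                 # ... which needs the trial counts as weights

# The counts are recoverable from the proportions only together with w
recovered <- round(yprop * w)
recovered
```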
Thomas W. Yee
See also: dhyper, binomialff.
library(VGAM)
nn <- 100
m <- 5  # Number of white balls in the population
n <- 4  # Number of black balls in the population
k <- rep(4, length.out = nn)  # Sample sizes
y <- rhyper(nn = nn, m = m, n = n, k = k)
yprop <- y / k  # Sample proportions

# N is unknown, D is known. Both models are equivalent:
fit <- vglm(cbind(y, k - y) ~ 1, hyperg(D = m), trace = TRUE, criterion = "coefficients")
fit <- vglm(yprop ~ 1, hyperg(D = m), weights = k, trace = TRUE, criterion = "coefficients")

# N is known, D is unknown. Both models are equivalent:
fit <- vglm(cbind(y, k - y) ~ 1, hyperg(N = m + n), trace = TRUE, criterion = "loglikelihood")
fit <- vglm(yprop ~ 1, hyperg(N = m + n), weights = k, trace = TRUE, criterion = "loglikelihood")
coef(fit, matrix = TRUE)
Coef(fit)    # Estimate; should be close to the true population proportion
m / (m + n)  # The true population proportion
fit@extra
head(fitted(fit))
summary(fit)