Hyperintersection: The Hypergeometric Intersection Family of Distributions

Description

Density, distribution function, quantile function and random generation for the hypergeometric intersection family of distributions.

Usage

dhint(n, A, q = 0, range = NULL, approx = FALSE, log = FALSE, verbose = TRUE)

phint(n, A, q = 0, vals, upper.tail = TRUE, log.p = FALSE)

qhint(p, n, A, q = 0, upper.tail = TRUE, log.p = FALSE)

rhint(num = 5, n, A, q = 0)

Arguments

n

An integer specifying the number of categories in the urns.

A

A vector of integers specifying the numbers of balls drawn from each urn. The length of the vector equals the number of urns.

q

An integer specifying the number of categories in the second urn which have duplicate members. If q is 0 (default) then the symmetrical, singleton case is computed, otherwise the asymmetrical, duplicates case is computed (see Details).

range

A vector of integers specifying the intersection sizes for which probabilities should be computed by dhint (can be a single number). If range is NULL (default), probabilities are returned over the entire range of possible values.

approx

Logical. If TRUE, a binomial approximation will be used to generate the distribution.

log

Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE.

verbose

Logical. If TRUE, progress of calculation in the asymmetric, duplicates case is printed to the screen.

vals

A vector of integers specifying the intersection sizes for which cumulative probabilities should be computed by phint (can be a single number).

upper.tail

Logical. If TRUE, probabilities are P(X >= v), else P(X <= v), where v is the intersection size. Defaults to TRUE.

log.p

Logical. If TRUE, probabilities p are given as log(p). Defaults to FALSE.

p

A probability between 0 and 1.

num

An integer specifying the number of random numbers to generate. Defaults to 5.
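
The tail convention described for upper.tail can be illustrated with base R alone. The sketch below is not the package's implementation. It assumes the two-urn singleton case, which the Details section states is hypergeometric, and uses stats::dhyper with hypothetical values n = 29, a = 15, b = 8. Both tails include the boundary value v, unlike phyper(lower.tail = FALSE), which returns P(X > v).

```r
# Two-urn singleton case: the intersection size is hypergeometric (see Details).
n <- 29; a <- 15; b <- 8          # hypothetical values for illustration
v <- 0:min(a, b)                  # all possible intersection sizes
d <- dhyper(v, a, n - a, b)       # P(X = v)

# Inclusive tails, matching the documented P(X >= v) and P(X <= v).
upper <- rev(cumsum(rev(d)))      # P(X >= v)
lower <- cumsum(d)                # P(X <= v)

# Both tails contain P(X = v), so they overlap by exactly that mass.
tails <- data.frame(v = v, lower = lower, upper = upper)
print(tails)
```

Because both tails are inclusive, lower + upper exceeds 1 by P(X = v) at every v.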

Details

The hypergeometric intersection distributions describe the distribution of intersection sizes when sampling without replacement from two or more separate urns, each containing balls that belong to the same n object categories. The intersection size v is the number of categories represented in every sample. In the simplest case, with two urns and exactly one ball per category in each urn (the symmetrical, singleton case), drawing a balls from the first urn and b balls from the second gives the hypergeometric distribution:

P(X=v) = (choose(a,v)*choose(n-a,b-v))/choose(n,b)
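
As a quick check of the formula above, the following base R sketch evaluates it directly and compares the result with stats::dhyper. The helper name dhint2_sketch and the values n = 20, a = 10, b = 12 are illustrative assumptions, not part of the package.

```r
# Direct evaluation of the two-urn, singleton-case formula.
# dhint2_sketch is a hypothetical helper, not a function from the package.
dhint2_sketch <- function(v, n, a, b) {
  choose(a, v) * choose(n - a, b - v) / choose(n, b)
}

n <- 20; a <- 10; b <- 12
v <- 0:min(a, b)
p <- dhint2_sketch(v, n, a, b)

# The same probabilities from the built-in hypergeometric density:
# a "white" balls, n - a "black" balls, b draws.
p_ref <- dhyper(v, a, n - a, b)

# Both evaluations agree, and the probabilities sum to 1.
stopifnot(isTRUE(all.equal(p, p_ref)), isTRUE(all.equal(sum(p), 1)))
```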

When there are three urns, with c balls drawn from the third urn, the distribution is given by

P(X=v) = choose(a,v) * (sum_i choose(a-v,i)*choose(n-a,b-v-i)*choose(n-v-i,c-v)) / (choose(n,b)*choose(n,c))
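
The three-urn formula can be evaluated directly in base R. The sketch below is a plain transcription of the sum, reading the denominator as the product choose(n,b)*choose(n,c); it is not the package's code. The helper name dhint3_sketch, the argument name cc (standing for c, renamed to avoid confusion with the base function c) and the parameter values are hypothetical.

```r
# Direct evaluation of the three-urn, singleton-case formula.
# dhint3_sketch is a hypothetical helper, not a function from the package.
dhint3_sketch <- function(v, n, a, b, cc) {
  # i counts categories shared by the first two samples but absent from the third
  i <- 0:(a - v)
  inner <- sum(choose(a - v, i) * choose(n - a, b - v - i) * choose(n - v - i, cc - v))
  choose(a, v) * inner / (choose(n, b) * choose(n, cc))
}

n <- 12; a <- 5; b <- 6; cc <- 7
v <- 0:min(a, b, cc)
p <- vapply(v, dhint3_sketch, numeric(1), n = n, a = a, b = b, cc = cc)

# A valid probability mass function must sum to 1 over all intersection sizes.
stopifnot(isTRUE(all.equal(sum(p), 1)))
print(data.frame(v = v, p = p))
```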

If, however, we allow duplicates in q <= n of the categories in the second urn, then the distribution of intersection sizes is described by the following variant of the hypergeometric:

P(X=v) = (sum_m sum_l sum_j choose(n-q,v-l)*choose(q,l)*choose(q-l,m)*choose(n-v-q+l,a-v-m)*choose(l,j)*choose(n+q-a-m-j,b-v)) / (choose(n,a)*choose(n+q,b))
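
The urn model behind the duplicates case can be illustrated by simulation. The sketch below is a Monte Carlo illustration in base R, not the package's algorithm. It assumes that each of the q duplicated categories contributes exactly two balls to the second urn, which is consistent with the n+q balls implied by choose(n+q,b). The helper name sim_hint_dup and the parameter values are hypothetical.

```r
# Monte Carlo sketch of the asymmetrical, duplicates case.
# sim_hint_dup is a hypothetical helper, not a function from the package.
sim_hint_dup <- function(reps, n, a, b, q) {
  urn1 <- seq_len(n)                    # one ball per category
  urn2 <- c(seq_len(n), seq_len(q))     # q categories appear twice (n + q balls)
  replicate(reps, {
    s1 <- sample(urn1, a)               # draw a balls without replacement
    s2 <- urn2[sample.int(n + q, b)]    # draw b balls without replacement
    length(intersect(s1, s2))           # categories present in both samples
  })
}

set.seed(1)
x <- sim_hint_dup(5000, n = 35, a = 15, b = 11, q = 22)

# Empirical frequencies of each possible intersection size.
emp <- table(factor(x, levels = 0:min(15, 11))) / length(x)
print(round(emp, 3))
```

The empirical frequencies give a rough reference against which probabilities computed for the same n, A and q can be compared.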

Value

'dhint', 'phint', and 'qhint' return a data frame with two columns: v, the intersection size, and p, the associated probabilities. 'rhint' returns an integer vector of random samples drawn from the hypergeometric intersection distribution.

References

Kalinka, A. T. (2013). The probability of drawing intersections: extending the hypergeometric distribution. arXiv:1305.0717

Examples

## Generate the distribution of intersection sizes without duplicates:
dd <- dhint(20, c(10, 12))
## Restrict the range of intersections.
dd <- dhint(20, c(10, 12), range = 0:5)
## Allow duplicates in q of the categories in the second urn:
dd <- dhint(35, c(15, 11), 22, verbose = FALSE)
## Generate cumulative probabilities.
pp <- phint(29, c(15, 8), vals = 5)
pp <- phint(29, c(15, 8), vals = 2, upper.tail = FALSE)
pp <- phint(29, c(15, 8), 23, vals = 2)
## Extract quantiles:
qq <- qhint(0.15, 23, c(12, 10))
qq <- qhint(0.15, 23, c(12, 10), 18)
## Generate random samples from hypergeometric intersection distributions.
rr <- rhint(num = 10, 18, c(9, 14))
rr <- rhint(num = 10, 22, c(11, 17), 12)

Example output

   Calculating probabilities... 0.000%
   Calculating probabilities... 12.500%
   Calculating probabilities... 25.000%
   Calculating probabilities... 37.500%
   Calculating probabilities... 50.000%
   Calculating probabilities... 62.500%
   Calculating probabilities... 75.000%
   Calculating probabilities... 87.500%
   Calculating probabilities... 100.000%

   Calculating probabilities... 0.000%
   Calculating probabilities... 10.000%
   Calculating probabilities... 20.000%
   Calculating probabilities... 30.000%
   Calculating probabilities... 40.000%
   Calculating probabilities... 50.000%
   Calculating probabilities... 60.000%
   Calculating probabilities... 70.000%
   Calculating probabilities... 80.000%
   Calculating probabilities... 90.000%
   Calculating probabilities... 100.000%

   Calculating probabilities... 0.000%
   Calculating probabilities... 10.000%
   Calculating probabilities... 20.000%
   Calculating probabilities... 30.000%
   Calculating probabilities... 40.000%
   Calculating probabilities... 50.000%
   Calculating probabilities... 60.000%
   Calculating probabilities... 70.000%
   Calculating probabilities... 80.000%
   Calculating probabilities... 90.000%
   Calculating probabilities... 100.000%

   Calculating probabilities... 0.000%
   Calculating probabilities... 9.091%
   Calculating probabilities... 18.182%
   Calculating probabilities... 27.273%
   Calculating probabilities... 36.364%
   Calculating probabilities... 45.455%
   Calculating probabilities... 54.545%
   Calculating probabilities... 63.636%
   Calculating probabilities... 72.727%
   Calculating probabilities... 81.818%
   Calculating probabilities... 90.909%
   Calculating probabilities... 100.000%

hint documentation built on Feb. 2, 2022, 5:10 p.m.