jip_approx: Approximate Joint-Inclusion Probabilities

View source: R/jip_approximations.R

jip_approx    R Documentation

Approximate Joint-Inclusion Probabilities

Description

Approximates joint-inclusion probabilities using only the first-order inclusion probabilities.

Usage

jip_approx(pik, method)

Arguments

pik

numeric vector of first-order inclusion probabilities for all population units.

method

string naming the approximation method to use; the available values are listed in Details.

Details

Available methods are "Hajek", "HartleyRao", "Tille", "Brewer1", "Brewer2", "Brewer3", and "Brewer4". These methods were derived for high-entropy sampling designs, so they may perform poorly under other designs.

The Hájek (1964) approximation [method="Hajek"] is derived under the maximum entropy sampling design and is given by

\tilde{\pi}_{ij} = \pi_i\pi_j \biggl( 1 - \frac{(1-\pi_i)(1-\pi_j)}{d} \biggr)

where d = \sum_{k\in U} \pi_k(1-\pi_k).
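The formula above can be evaluated for all pairs at once with `outer()`. The following is a minimal sketch, not the package's implementation: `hajek_jip` is a hypothetical helper name, and setting the diagonal to `pik` is an assumption based on the Value section.

```r
# Hypothetical helper illustrating the Hajek approximation (not the package code).
hajek_jip <- function(pik) {
  d <- sum(pik * (1 - pik))                 # d = sum_k pi_k (1 - pi_k)
  q <- 1 - pik
  jip <- outer(pik, pik) * (1 - outer(q, q) / d)
  diag(jip) <- pik                          # assumption: diagonal holds first-order probabilities
  jip
}

# Toy inclusion probabilities summing to a sample size of n = 2
pik <- c(0.2, 0.4, 0.6, 0.8)
jip <- hajek_jip(pik)
```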

Hartley and Rao (1962) proposed the following approximation under randomised systematic sampling [method="HartleyRao"]:

\tilde{\pi}_{ij} = \frac{n-1}{n} \pi_i\pi_j + \frac{n-1}{n^2} (\pi_i^2 \pi_j + \pi_i \pi_j^2) - \frac{n-1}{n^3}\pi_i\pi_j \sum_{k\in U} \pi_k^2

+ \frac{2(n-1)}{n^3} (\pi_i^3 \pi_j + \pi_i\pi_j^3 + \pi_i^2 \pi_j^2) - \frac{3(n-1)}{n^4} (\pi_i^2 \pi_j + \pi_i\pi_j^2) \sum_{k \in U}\pi_k^2

+ \frac{3(n-1)}{n^5} \pi_i\pi_j \biggl( \sum_{k\in U} \pi_k^2 \biggr)^2 - \frac{2(n-1)}{n^4} \pi_i\pi_j \sum_{k \in U} \pi_k^3
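The seven terms of the Hartley and Rao expansion can be built from a few shared pieces. The following is a minimal sketch under stated assumptions: `hartley_rao_jip` is a hypothetical helper, not the package's implementation; the sample size n is taken as `sum(pik)`; and the diagonal is set to `pik` following the Value section.

```r
# Hypothetical helper illustrating the Hartley-Rao approximation (not the package code).
hartley_rao_jip <- function(pik) {
  n  <- sum(pik)        # assumption: fixed sample size equals the sum of pik
  s2 <- sum(pik^2)      # sum_k pi_k^2
  s3 <- sum(pik^3)      # sum_k pi_k^3
  p1 <- outer(pik, pik)                              # pi_i pi_j
  p2 <- outer(pik^2, pik) + outer(pik, pik^2)        # pi_i^2 pi_j + pi_i pi_j^2
  p3 <- outer(pik^3, pik) + outer(pik, pik^3) + outer(pik^2, pik^2)
  jip <- (n - 1) / n * p1 +
    (n - 1) / n^2 * p2 -
    (n - 1) / n^3 * p1 * s2 +
    2 * (n - 1) / n^3 * p3 -
    3 * (n - 1) / n^4 * p2 * s2 +
    3 * (n - 1) / n^5 * p1 * s2^2 -
    2 * (n - 1) / n^4 * p1 * s3
  diag(jip) <- pik      # assumption: diagonal holds first-order probabilities
  jip
}

pik <- c(0.2, 0.4, 0.6, 0.8)
jip <- hartley_rao_jip(pik)
```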

Tillé (1996) proposed the approximation \tilde{\pi}_{ij} = \beta_i\beta_j, where the coefficients \beta_i are computed iteratively through the following procedure [method="Tille"]:

  1. \beta_i^{(0)} = \pi_i, \,\, \forall i\in U

  2. \beta_i^{(2k-1)} = \frac{(n-1)\pi_i}{\beta^{(2k-2)} - \beta_i^{(2k-2)}}

  3. \beta_i^{(2k)} = \beta_i^{(2k-1)} \Biggl( \frac{n(n-1)}{(\beta^{(2k-1)})^2 - \sum_{l\in U} (\beta_l^{(2k-1)})^2 } \Biggr)^{1/2}

with \beta^{(m)} = \sum_{i\in U} \beta_i^{(m)} for each iteration index m, and k = 1, 2, 3, \dots
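The iterative procedure above can be sketched as a simple loop that alternates steps 2 and 3. This is an illustration only: `tille_jip` is a hypothetical helper, and the stopping rule (the tolerance `eps` and the cap `max_iter`) is an assumption, since the text does not state when the iteration ends.

```r
# Hypothetical helper illustrating Tille's iterative procedure (not the package code).
tille_jip <- function(pik, eps = 1e-8, max_iter = 1000) {
  n <- sum(pik)                     # assumption: fixed sample size equals the sum of pik
  beta <- pik                       # step 1: beta^(0) = pik
  for (k in seq_len(max_iter)) {
    beta_old <- beta
    # step 2: rescale each coefficient by the sum of the others
    beta <- (n - 1) * pik / (sum(beta) - beta)
    # step 3: normalise so that the off-diagonal products sum to n(n - 1)
    beta <- beta * sqrt(n * (n - 1) / (sum(beta)^2 - sum(beta^2)))
    # assumed stopping rule: largest change between full iterations below eps
    if (max(abs(beta - beta_old)) < eps) break
  }
  jip <- outer(beta, beta)          # pi_ij approximated by beta_i * beta_j
  diag(jip) <- pik                  # assumption: diagonal holds first-order probabilities
  jip
}

pik <- c(0.2, 0.4, 0.6, 0.8)
jip <- tille_jip(pik)
```

Because step 3 is applied last, the off-diagonal entries of the result sum to n(n-1), as expected for a fixed-size design.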

Finally, Brewer (2002) and Brewer and Donadio (2003) proposed four approximations, which are defined by the general form

\tilde{\pi}_{ij} = \pi_i\pi_j (c_i + c_j)/2

where the c_i determine the approximation used:

  • Equation (9) [method="Brewer1"]:

    c_i = (n-1) / (n-\pi_i)

  • Equation (10) [method="Brewer2"]:

    c_i = (n-1) / \Bigl(n - n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)

  • Equation (11) [method="Brewer3"]:

    c_i = (n-1) / \Bigl(n - 2\pi_i + n^{-1}\sum_{k\in U}\pi_k^2 \Bigr)

  • Equation (18) [method="Brewer4"]:

    c_i = (n-1) / \Bigl(n - (2n-1)(n-1)^{-1}\pi_i + (n-1)^{-1}\sum_{k\in U}\pi_k^2 \Bigr)
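Since the four Brewer approximations differ only in the coefficients c_i, they can share one code path. The following is a minimal sketch, not the package's implementation: `brewer_jip` and its `version` argument are hypothetical names, n is taken as `sum(pik)`, and the diagonal is set to `pik` following the Value section.

```r
# Hypothetical helper illustrating the four Brewer approximations (not the package code).
brewer_jip <- function(pik, version = 1) {
  n  <- sum(pik)        # assumption: fixed sample size equals the sum of pik
  s2 <- sum(pik^2)      # sum_k pi_k^2
  ci <- switch(as.character(version),
    "1" = (n - 1) / (n - pik),                                   # Equation (9)
    "2" = rep((n - 1) / (n - s2 / n), length(pik)),              # Equation (10)
    "3" = (n - 1) / (n - 2 * pik + s2 / n),                      # Equation (11)
    "4" = (n - 1) / (n - (2 * n - 1) / (n - 1) * pik + s2 / (n - 1)),  # Equation (18)
    stop("version must be 1, 2, 3 or 4")
  )
  # General form: pi_i * pi_j * (c_i + c_j) / 2
  jip <- outer(pik, pik) * outer(ci, ci, "+") / 2
  diag(jip) <- pik      # assumption: diagonal holds first-order probabilities
  jip
}

pik  <- c(0.2, 0.4, 0.6, 0.8)
jip1 <- brewer_jip(pik, version = 1)
jip2 <- brewer_jip(pik, version = 2)
```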

Value

A symmetric matrix of approximate joint-inclusion probabilities, whose diagonal is the vector of first-order inclusion probabilities.

References

Hartley, H.O.; Rao, J.N.K., 1962. Sampling With Unequal Probability and Without Replacement. The Annals of Mathematical Statistics 33 (2), 350-374.

Hájek, J., 1964. Asymptotic Theory of Rejective Sampling with Varying Probabilities from a Finite Population. The Annals of Mathematical Statistics 35 (4), 1491-1523.

Tillé, Y., 1996. Some Remarks on Unequal Probability Sampling Designs Without Replacement. Annals of Economics and Statistics 44, 177-189.

Brewer, K.R.W.; Donadio, M.E., 2003. The High Entropy Variance of the Horvitz-Thompson Estimator. Survey Methodology 29 (2), 189-196.

Examples


### Generate population data ---
N <- 20; n <- 5

set.seed(0)
x <- rgamma(N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)

### Approximate joint-inclusion probabilities ---
pikl <- jip_approx(pik, method='Hajek')
pikl <- jip_approx(pik, method='HartleyRao')
pikl <- jip_approx(pik, method='Tille')
pikl <- jip_approx(pik, method='Brewer1')
pikl <- jip_approx(pik, method='Brewer2')
pikl <- jip_approx(pik, method='Brewer3')
pikl <- jip_approx(pik, method='Brewer4')




rhobis/jipApprox documentation built on Sept. 12, 2023, 7:01 a.m.