Var_approx: Approximate the Variance of the Horvitz-Thompson estimator

View source: R/Variance_approximations.R

Var_approxR Documentation

Approximate the Variance of the Horvitz-Thompson estimator

Description

Approximations of the Horvitz-Thompson variance for High-Entropy sampling designs. Such methods use only first-order inclusion probabilities.

Usage

Var_approx(y, pik, n, method, ...)

Arguments

y

numeric vector containing the values of the variable of interest for all population units

pik

numeric vector of first-order inclusion probabilities, of length equal to population size

n

a scalar indicating the sample size

method

string indicating the approximation that should be used. One of "Hajek1", "Hajek2", "HartleyRao1", "HartleyRao2", "FixedPoint".

...

two optional parameters can be modified to control the iterative procedure in method="FixedPoint": maxIter sets the maximum number of iterations and eps controls the convergence error

Details

The variance approximations available in this function are described below, the notation used is that of Matei and Tillé (2005).

  • Hájek variance approximation (method="Hajek1"):

    \tilde{Var} = \sum_{i \in U} \frac{b_i}{\pi_i^2}(y_i - y_i^*)^2

    where

    y_i^* = \pi_i \frac{ \sum_{j\in U} b_j y_j/\pi_j }{ \sum_{j \in U} b_j }

    and

    b_i = \frac{ \pi_i(1-\pi_i)N }{ N-1 }

  • Starting from Hajék (1964), Brewer (2002) defined the following estimator (method="Hajek2"):

    \tilde{Var} = \sum_{i \in U} \pi_i(1-\pi_i) \Bigl( \frac{y_i}{\pi_i} - \frac{\tilde{Y}}{n} \Bigr)^2

    where \tilde{Y} = \sum_{i \in U} a_i y_i and a_i = n(1-\pi_i)/\sum_{j \in U} \pi_j(1-\pi_j)

  • Hartley and Rao (1962) variance approximation (method="HartleyRao1"):

    \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggr( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2

    \qquad - \frac{n-1}{n^2} \sum_{i \in U} \Biggl( 2\pi_i^3 - \frac{\pi_i^2}{2} \sum_{j \in U} \pi_j^2 \Biggr) \Biggr( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2

    \quad \qquad + \frac{2(n-1)}{n^3} \Biggl( \sum_{i \in U}\pi_i y_i - \frac{Y}{n}\sum_{i\in U} \pi_i^2 \Biggr)^2

  • Hartley and Rao (1962) provide a simplified version of the variance above (method="HartleyRao2"):

    \tilde{Var} = \sum_{i \in U} \pi_i \Bigl( 1 - \frac{n-1}{n}\pi_i \Bigr) \Biggr( \frac{y_i}{\pi_i} - \frac{Y}{n} \Biggr)^2

  • method="FixedPoint" computes the Fixed-Point variance approximation proposed by Deville and Tillé (2005). The variance can be expressed in the same form as in method="Hajek1", and the coefficients b_i are computed iteratively by the algorithm:

    1. b_i^{(0)} = \pi_i (1-\pi_i) \frac{N}{N-1}, \,\, \forall i \in U

    2. b_i^{(k)} = \frac{(b_i^{(k-1)})^2 }{\sum_{j\in U} b_j^{(k-1)} } + \pi_i(1-\pi_i)

    a necessary condition for convergence is checked and, if not satisfied, the function returns an alternative solution that uses only one iteration:

    b_i = \pi_i(1-\pi_i)\Biggl( \frac{N\pi_i(1-\pi_i)}{ (N-1)\sum_{j\in U}\pi_j(1-\pi_j) } + 1 \Biggr)

Value

a scalar, the approximated variance.

References

Matei, A.; Tillé, Y., 2005. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. Journal of Official Statistics 21 (4), 543-570.

Examples


N <- 500; n <- 50

set.seed(0)
x <- rgamma(n=N, scale=10, shape=5)
y <- abs( 2*x + 3.7*sqrt(x) * rnorm(N) )

pik  <- n * x/sum(x)
pikl <- outer(pik, pik, '*'); diag(pikl) <- pik

### Variance approximations ---
Var_approx(y, pik, n, method = "Hajek1")
Var_approx(y, pik, n, method = "Hajek2")
Var_approx(y, pik, n, method = "HartleyRao1")
Var_approx(y, pik, n, method = "HartleyRao2")
Var_approx(y, pik, n, method = "FixedPoint")



UPSvarApprox documentation built on Aug. 27, 2023, 9:06 a.m.