opt: Optimum Sample Allocation in Stratified Sampling
opt	R Documentation

Description

Optimum sample allocation is a classical problem in stratified sampling. It consists of determining the strata sample sizes that minimize the variance of the stratified \pi estimator of the population total (or mean) of a given study variable, subject to certain constraints on the sample sizes in strata.

The opt() user function solves the following optimum sample allocation problem, formulated below in the language of mathematical optimization.

Minimize

f(x_1,\ldots,x_H) = \sum_{h=1}^H \frac{A^2_h}{x_h}

subject to

\sum_{h=1}^H x_h = n

m_h \leq x_h \leq M_h, \quad h = 1,\ldots,H,

where n > 0,\, A_h > 0,\, m_h > 0,\, M_h > 0, such that m_h < M_h,\, h = 1,\ldots,H, and \sum_{h=1}^H m_h \leq n \leq \sum_{h=1}^H M_h, are given numbers. The minimization is on \mathbb R_+^H.
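As a small illustration of the objective function f, the base R sketch below evaluates it for two allocations that both sum to n. The function name `f` and the two allocation vectors are chosen here for illustration only and are not part of the package.

```r
# Objective function f(x_1, ..., x_H) = sum_h A_h^2 / x_h (illustrative helper).
f <- function(x, A) sum(A^2 / x)

A <- c(3000, 4000, 5000, 2000)
n <- 700

# Two allocations satisfying the equality constraint sum(x) == n.
x_equal <- rep(n / length(A), length(A))  # 175 in every stratum
x_other <- c(150, 200, 250, 100)

f(x_equal, A)  # approximately 308571.4
f(x_other, A)  # 280000
```

The second allocation gives a smaller value of f, so among these two it is the better candidate when no bounds are imposed.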

The inequality constraints are optional, and the user can choose whether and how they are added to the optimization problem. This is controlled through the m and M arguments of this function, according to the following rules:

no inequality constraints imposed: both m and M must be set to NULL (default).
one-sided lower bounds m_h,\, h = 1,\ldots,H, imposed: lower bounds are specified with m, while M is set to NULL.
one-sided upper bounds M_h,\, h = 1,\ldots,H, imposed: upper bounds are specified with M, while m is set to NULL.
box-constraints imposed: lower and upper bounds must be specified with m and M, respectively.
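The rules above, together with the feasibility conditions of the problem, can be sketched as a small base R check. `check_problem()` is a hypothetical helper written for this illustration; it is not a stratallo function and does not reproduce the package's own argument validation.

```r
# Hypothetical helper: returns TRUE when the inputs satisfy the stated conditions.
# m and/or M set to NULL means the corresponding bounds are not imposed.
check_problem <- function(n, A, m = NULL, M = NULL) {
  H <- length(A)
  ok <- n > 0 && all(A > 0)
  if (!is.null(m)) ok <- ok && length(m) == H && all(m > 0) && sum(m) <= n
  if (!is.null(M)) ok <- ok && length(M) == H && all(M > 0) && n <= sum(M)
  if (!is.null(m) && !is.null(M)) ok <- ok && all(m < M)
  ok
}

A <- c(3000, 4000, 5000, 2000)
m <- c(100, 90, 70, 50)
M <- c(300, 400, 200, 90)

check_problem(340, A)                # no inequality constraints: TRUE
check_problem(340, A, m = m)         # lower bounds only: TRUE
check_problem(190, A, M = M)         # upper bounds only: TRUE
check_problem(300, A, m = m, M = M)  # box constraints: FALSE, since sum(m) = 310 > 300
```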

Usage

opt(n, A, m = NULL, M = NULL, M_algorithm = "rna")

Arguments

`n`	(`number`) total sample size. A strictly positive scalar. If `m` is not `NULL`, it is required that `n >= sum(m)`. If `M` is not `NULL`, it is required that `n <= sum(M)`.
`A`	(`numeric`) population constants `A_1,\ldots,A_H`. Strictly positive numbers.
`m`	(`numeric` or `NULL`) lower bounds `m_1,\ldots,m_H`, optionally imposed on sample sizes in strata. If no lower bounds should be imposed, then `m` must be set to `NULL`. If `M` is not `NULL`, it is then required that `m < M`.
`M`	(`numeric` or `NULL`) upper bounds `M_1,\ldots,M_H`, optionally imposed on sample sizes in strata. If no upper bounds should be imposed, then `M` must be set to `NULL`. If `m` is not `NULL`, it is then required that `m < M`.
`M_algorithm`	(`string`) the name of the underlying algorithm used to compute the sample allocation under one-sided upper-bound constraints. It must be one of the following: `rna` (default), `sga`, `sgaplus`, `coma`. This parameter is used only when `m` is `NULL`, `M` is not `NULL`, the number of strata `H > 1`, and `n < sum(M)`.

Details

The opt() function uses several allocation algorithms, depending on which inequality constraints are included in the optimization problem. Each algorithm is implemented in a separate R function that, in general, should not be called directly by the end user. The algorithms are listed below, each with the name of the function that implements it. See the documentation of a specific function to find out more about the corresponding algorithm.

one-sided lower-bounds m_h,\, h = 1,\ldots,H:
- LRNA - rna()
one-sided upper-bounds M_h,\, h = 1,\ldots,H:
- RNA - rna()
- SGA - sga()
- SGAPLUS - sgaplus()
- COMA - coma()
box constraints m_h, M_h,\, h = 1,\ldots,H:
- RNABOX - rnabox()
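To give a feel for how an algorithm of the recursive Neyman type can handle one-sided upper bounds, here is a simplified base R sketch. It is an illustration under stated assumptions, not the code of rna() or any other stratallo function; `rna_sketch()` is a hypothetical name. The idea is to compute the unconstrained allocation, fix every stratum that exceeds its upper bound at that bound, and repeat on the remaining strata with the remaining sample size.

```r
# Simplified recursive Neyman-type allocation under upper bounds M
# (illustrative only; not the stratallo implementation).
rna_sketch <- function(n, A, M) {
  stopifnot(n > 0, all(A > 0), all(M > 0),
            length(A) == length(M), n <= sum(M))
  x <- numeric(length(A))
  free <- rep(TRUE, length(A))  # strata not yet fixed at their upper bound
  repeat {
    # Unconstrained (Neyman) allocation of the remaining sample among free strata.
    x[free] <- A[free] * n / sum(A[free])
    over <- free & (x > M)
    if (!any(over)) break
    # Fix violating strata at their bounds and reduce the remaining sample size.
    x[over] <- M[over]
    n <- n - sum(M[over])
    free <- free & !over
  }
  x
}

A <- c(3000, 4000, 5000, 2000)
M <- c(300, 400, 200, 90)
x <- rna_sketch(n = 700, A = A, M = M)
x
sum(x)  # equals the requested total sample size
```

In this example the third and fourth strata are fixed at their upper bounds in the first pass, and the remaining 410 units are split between the first two strata in proportion to A.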

Value

Numeric vector with optimal sample allocations in strata.

Note

If no inequality constraints are added, the allocation is given by the Neyman allocation as:

x_h = A_h \frac{n}{\sum_{i=1}^H A_i}, \quad h = 1,\ldots,H.
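The formula above translates directly into base R. The sketch below uses a hypothetical helper name, `neyman()`, and also shows why bounds can matter: the unconstrained allocation may exceed given upper bounds.

```r
# Neyman allocation: x_h = A_h * n / sum(A) (illustrative helper).
neyman <- function(n, A) A * n / sum(A)

A <- c(3000, 4000, 5000, 2000)
x <- neyman(n = 700, A = A)
x       # 150 200 250 100
sum(x)  # 700

# Comparing with upper bounds shows which strata would violate them.
M <- c(300, 400, 200, 90)
x > M   # FALSE FALSE  TRUE  TRUE
```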

For the stratified \pi estimator of the population total under a stratified simple random sampling without replacement design, the parameters of the objective function f are:

A_h = N_h S_h, \quad h = 1,\ldots,H,

where N_h is the size of stratum h and S_h denotes the standard deviation of a given study variable in stratum h.
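The following base R sketch builds A from made-up stratum sizes and standard deviations, and checks numerically that the textbook variance of the stratified estimator under this design equals f(x) minus a constant that does not depend on x. The data are invented for illustration, and reading that constant, sum(N * S^2), as the A0 argument of var_st() is an assumption made here rather than something stated on this page.

```r
# Made-up stratum sizes and standard deviations (illustrative data only).
N <- c(1000, 2000, 500)
S <- c(3, 2, 10)

A <- N * S           # 3000 4000 5000
A0 <- sum(N * S^2)   # 67000; constant term, assumed to play the role of A0

x <- c(100, 150, 150)  # some allocation with x <= N

# Variance written as objective function minus the constant term.
v_objective <- sum(A^2 / x) - A0

# Textbook form: sum_h N_h^2 * (1 - x_h / N_h) * S_h^2 / x_h.
v_direct <- sum(N^2 * (1 - x / N) * S^2 / x)

all.equal(v_objective, v_direct)  # TRUE
```

Because the constant term does not depend on x, minimizing f is equivalent to minimizing the variance itself.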

References

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, Springer, New York.

Examples

A <- c(3000, 4000, 5000, 2000)
m <- c(100, 90, 70, 50)
M <- c(300, 400, 200, 90)

# One-sided lower bounds.
opt(n = 340, A = A, m = m)
opt(n = 400, A = A, m = m)
opt(n = 700, A = A, m = m)

# One-sided upper bounds.
opt(n = 190, A = A, M = M)
opt(n = 700, A = A, M = M)

# Box-constraints.
opt(n = 340, A = A, m = m, M = M)
opt(n = 500, A = A, m = m, M = M)
xopt <- opt(n = 800, A = A, m = m, M = M)
xopt
var_st(x = xopt, A = A, A0 = 45000) # Value of the variance for allocation xopt.

# Execution-time comparison of the different algorithms with the microbenchmark R package.
## Not run: 
N <- pop969[, "N"]
S <- pop969[, "S"]
A <- N * S
nfrac <- c(0.005, seq(0.05, 0.95, 0.05))
n <- setNames(as.integer(nfrac * sum(N)), nfrac)
lapply(
  n,
  function(ni) {
    microbenchmark::microbenchmark(
      RNA = opt(ni, A, M = N, M_algorithm = "rna"),
      SGA = opt(ni, A, M = N, M_algorithm = "sga"),
      SGAPLUS = opt(ni, A, M = N, M_algorithm = "sgaplus"),
      COMA = opt(ni, A, M = N, M_algorithm = "coma"),
      times = 200,
      unit = "us"
    )
  }
)

## End(Not run)

wwojciech/stratallo documentation built on June 2, 2025, 12:02 a.m.