optimum_allocation: Optimum Allocation


optimum_allocation    R Documentation

Optimum Allocation

Description

Determines the sampling fraction and sample size for each stratum of a stratified random sample that minimize the variance of the sample mean, using either Neyman allocation or exact optimum sample allocation (Wright 2014).
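Neyman allocation gives each stratum a share of the total sample proportional to the product of its population size and within-stratum standard deviation, f_h = N_h S_h / Σ_k N_k S_k. The base-R sketch below computes these shares and then rounds f_h times nsample, as described for the "Neyman" method. The helper neyman_fractions() is hypothetical and written for illustration only; it is not part of optimall.

```r
# Hypothetical helper (not part of optimall): Neyman allocation shares.
# Each stratum's share of the sample is proportional to N_h * S_h.
neyman_fractions <- function(N_h, sd_h) {
  w <- N_h * sd_h
  w / sum(w)
}

# Stratum sizes and standard deviations taken from the iris_summary example
N_h  <- c(50, 50, 50)
sd_h <- c(0.3791, 0.3138, 0.3225)

f_h <- neyman_fractions(N_h, sd_h)
n_h <- round(f_h * 40)  # share times nsample, rounded to the nearest integer
print(round(f_h, 2))
print(n_h)

# Rounding can break the total: three identical strata with nsample = 10
# each get round(10 / 3) = 3, so the rounded sizes sum to 9, not 10.
n_equal <- round(neyman_fractions(c(100, 100, 100), c(1, 1, 1)) * 10)
print(sum(n_equal))
```

The second case shows one reason a rounded Neyman allocation may no longer be optimal, or may not even use exactly nsample units.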

Usage

optimum_allocation(
  data,
  strata,
  y = NULL,
  sd_h = NULL,
  N_h = NULL,
  nsample = NULL,
  ndigits = 2,
  method = c("WrightII", "WrightI", "Neyman"),
  allow.na = FALSE
)
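The Usage line gives method a vector of all three method names as its default. In R this idiom is usually resolved with base R's match.arg(), which returns the first element when the argument is not supplied and accepts unambiguous partial names. We assume, without having checked the package source, that optimum_allocation() behaves this way; pick_method() below is a hypothetical stand-in.

```r
# Standard base-R idiom for a character-vector default.
# Assumption: optimum_allocation() resolves `method` the same way.
pick_method <- function(method = c("WrightII", "WrightI", "Neyman")) {
  match.arg(method)
}

pick_method()          # "WrightII": the first element is the default
pick_method("WrightI") # "WrightI": an exact match wins over the longer name
pick_method("Ney")     # "Neyman": unambiguous partial matching is allowed
```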

Arguments

data

A data frame or matrix with at least one column specifying each unit's stratum, and either 1) a second column holding the values of the continuous variable whose sample mean variance should be minimized (y), or 2) two columns: one holding the within-stratum standard deviation of the variable of interest (sd_h) and another holding the stratum population sizes (N_h). If data contains a column y of values for the variable of interest, data should have one row per sampled unit. If data instead holds sd_h and N_h, it should have one row per stratum. Other columns are allowed but are ignored.
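A unit-level data set can be collapsed into the one-row-per-stratum layout with base R. The sketch below summarizes Sepal.Length by Species; the column names strata, size and sd simply follow the iris_summary example. Note that the hard-coded sd values in that example (0.3791, 0.3138, 0.3225) appear to be the within-species standard deviations of Sepal.Width rather than Sepal.Length.

```r
# Collapse unit-level data to one row per stratum: sd (for sd_h) and N (for N_h)
sd_by_species <- tapply(iris$Sepal.Length, iris$Species, sd)
N_by_species  <- tapply(iris$Sepal.Length, iris$Species, length)

length_summary <- data.frame(
  strata = names(sd_by_species),
  size   = as.vector(N_by_species),
  sd     = as.vector(sd_by_species)
)
print(length_summary)
```

The resulting data frame could then be passed with strata = "strata", sd_h = "sd" and N_h = "size", as in the Examples section.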

strata

a character string or vector of character strings giving the name(s) of the column(s) that identify each unit's stratum. If multiple column names are provided, each unique combination of values in these columns defines one stratum.
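To see what "each unique combination defines one stratum" means, the sketch below builds strata from two columns of a small hypothetical data set with base R's interaction(). This only illustrates the rule; it is not necessarily how optimall constructs strata internally.

```r
# Hypothetical unit-level data with two stratification columns
units <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  size   = c("small", "small", "small", "small", "large"),
  y      = c(2.1, 3.4, 1.8, 2.0, 4.2)
)

# Each observed combination of region and size is one stratum;
# drop = TRUE discards combinations that never occur (here North.large).
stratum <- interaction(units$region, units$size, drop = TRUE)
print(table(stratum))
```

Under the rule above, strata = c("region", "size") would therefore yield three strata for these data.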

y

a character string specifying the name of the continuous variable whose variance should be minimized. Defaults to NULL and should be left as NULL when data holds stratum standard deviations and population sizes instead of individual sampling units.

sd_h

a character string specifying the name of the column holding the within-stratum standard deviations for each stratum. Defaults to NULL and should be left as NULL when data holds individual sampling units.

N_h

a character string specifying the name of the column holding the population stratum sizes for each stratum. Defaults to NULL and should be left as NULL when data holds individual sampling units.

nsample

the desired total sample size. Defaults to NULL, in which case sampling fractions are returned instead of per-stratum sample sizes.

ndigits

a numeric value specifying the number of digits to which the standard deviation and stratum fraction should be rounded. Defaults to 2.

method

a character string specifying the method of optimum sample allocation to use. Must be one of:

  • "WrightII", the default, uses Algorithm II from Wright (2014) to determine the optimum allocation of a fixed sample size across the strata. It requires that at least two units be allocated to each stratum.

  • "WrightI" uses Wright's Algorithm I to determine the optimum sample allocation. It requires only that at least one unit be allocated to each stratum; because a stratum with a single sampled unit provides no estimate of its within-stratum variance, this can lead to a biased variance estimate.

  • "Neyman" uses standard Neyman allocation to determine the optimum sample allocation. When nsample = NULL, the optimal sampling fraction for each stratum is calculated and returned. When a numeric value is given for nsample, the number allocated to each stratum is its optimal sampling fraction times nsample, rounded to the nearest integer, so the resulting allocation may no longer be optimal.
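As we understand Wright's approach, exact optimum allocation can be carried out with priority values: start every stratum at its minimum (two units for Algorithm II, one for Algorithm I), then hand out the remaining units one at a time to the stratum with the largest priority value N_h S_h / sqrt(n_h (n_h + 1)), where n_h is that stratum's current allocation. The sketch below is a hypothetical wright_allocation() helper, not optimall's implementation; it handles only the lower bound min_n and the upper bound N_h, not Wright's general mixed constraints.

```r
# Hypothetical sketch (not optimall's implementation) of priority-value allocation.
# min_n = 2 mimics the Algorithm II floor; min_n = 1 mimics Algorithm I.
# Assumes every stratum has at least min_n units (N_h >= min_n).
wright_allocation <- function(N_h, sd_h, nsample, min_n = 2) {
  stopifnot(all(N_h >= min_n), nsample <= sum(N_h))
  n_h <- rep(min_n, length(N_h))  # every stratum starts at its minimum
  if (sum(n_h) > nsample) stop("nsample is too small for the minimum allocation")
  while (sum(n_h) < nsample) {
    # Priority value for giving each stratum one more unit
    pv <- N_h * sd_h / sqrt(n_h * (n_h + 1))
    pv[n_h >= N_h] <- -Inf  # a fully sampled stratum cannot receive more units
    h <- which.max(pv)      # ties go to the first such stratum
    n_h[h] <- n_h[h] + 1
  }
  n_h
}

# Same inputs as the iris_summary example with nsample = 40
wright_allocation(N_h = c(50, 50, 50), sd_h = c(0.3791, 0.3138, 0.3225),
                  nsample = 40)  # allocates 15, 12 and 13 units
```

Each step gives the next unit to the stratum whose extra unit most reduces Σ N_h² S_h² / n_h, which is why the priority value uses sqrt(n_h (n_h + 1)). For these inputs the result happens to match the rounded Neyman allocation; in general the two can differ.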

allow.na

a logical value specifying whether y may contain NA values. Defaults to FALSE.

Value

Returns a data frame giving the allocation of the total sample size nsample across strata, or the optimal sampling fractions if nsample is NULL.

References

Wright, T. (2014). A Simple Method of Exact Optimal Sample Allocation under Stratification with any Mixed Constraint Patterns, Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Bureau of the Census, Washington, D.C.

Examples

optimum_allocation(
  data = iris, strata = "Species", y = "Sepal.Length",
  nsample = 40, method = "WrightII"
)

# Or, if the input data summarize each stratum's sd and population size N:
iris_summary <- data.frame(
  strata = unique(iris$Species),
  size = c(50, 50, 50),
  sd = c(0.3791, 0.3138, 0.3225)
)

optimum_allocation(
  data = iris_summary, strata = "strata",
  sd_h = "sd", N_h = "size",
  nsample = 40, method = "WrightII"
)

optimall documentation built on Sept. 8, 2023, 6:07 p.m.