allocate_wave: Adaptive Multi-Wave Sampling

Description

Determines the adaptive optimum sampling allocation for a new sampling wave based on results from previous waves. Using Neyman or Wright (2014) allocation, allocate_wave calculates the optimum allocation for the total number of samples across all waves, determines how many units were sampled in each stratum in previous waves, and allocates the remaining samples to make up the difference.
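
The three steps above can be sketched in base R for the Neyman case. The helper neyman_alloc() and the stratum sizes, standard deviations and prior counts are hypothetical, and the sketch deliberately ignores the oversampling and rounding adjustments that allocate_wave handles; it only shows the "allocate the total, then subtract what was already sampled" idea.

```r
# Hypothetical helper (not optimall's code): Neyman allocation,
# n_h proportional to N_h * S_h, rounded to whole units
neyman_alloc <- function(n_total, npop, sd) {
  round(n_total * npop * sd / sum(npop * sd))
}

npop          <- c(a = 100, b = 100, c = 100)  # stratum sizes N_h
sd            <- c(a = 1,   b = 2,   c = 3)    # stratum standard deviations S_h
nsample_prior <- c(a = 10,  b = 10,  c = 10)   # units sampled in earlier waves
nsample       <- 30                            # size of the new wave

# 1. Optimum allocation of all units across waves (prior + new)
target <- neyman_alloc(sum(nsample_prior) + nsample, npop, sd)
# 2-3. Subtract what each stratum already has; the remainder is the new wave
n_to_sample <- target - nsample_prior
n_to_sample   # a = 0, b = 10, c = 20
```

Here no stratum is oversampled, so the new wave simply tops each stratum up to its share of the 60-unit total.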

Usage

allocate_wave(
  data,
  strata,
  y,
  already_sampled,
  nsample,
  allocation_method = c("WrightII", "WrightI", "Neyman"),
  method = c("iterative", "simple"),
  detailed = FALSE
)

Arguments

data

A data frame or matrix with one row for each sampling unit, one column specifying each unit's stratum, one column holding the value of the continuous variable for which the variance should be minimized, and one column containing a binary indicator, already_sampled, specifying whether each unit has already been sampled.

strata

A character string or vector of character strings specifying the name(s) of the column(s) that indicate the stratum each unit belongs to.

y

A character string specifying the name of the continuous variable for which the variance should be minimized.

already_sampled

A character string specifying the name of a column that contains a binary (Y/N or 1/0) indicator specifying whether each unit has already been sampled in a previous wave.

nsample

The desired sample size of the next wave.

allocation_method

A character string specifying the method of optimum sample allocation to use. For details, see optimum_allocation(). Defaults to "WrightII", which is more exact than "Neyman" but may run slower.
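
To illustrate why an integer method can be "more exact" than Neyman allocation, here is a minimal sketch of exact allocation by priority values in the spirit of Wright (2014). It assumes, as we read that report, that Algorithm II gives every stratum at least two units and then hands out the remaining units one at a time to the stratum with the largest priority N_h S_h / sqrt(n_h (n_h + 1)). wright_alloc() is a hypothetical helper, not optimall's implementation.

```r
# Hypothetical helper (not optimall's code): exact integer allocation by
# priority values. Assumed Algorithm II rule: each stratum starts with
# `start` = 2 units; each further unit goes to the stratum with the largest
# N_h * S_h / sqrt(n_h * (n_h + 1)), never exceeding N_h.
wright_alloc <- function(n, npop, sd, start = 2) {
  alloc <- pmin(rep(start, length(npop)), npop)
  stopifnot(sum(alloc) <= n, n <= sum(npop))
  while (sum(alloc) < n) {
    priority <- npop * sd / sqrt(alloc * (alloc + 1))
    priority[alloc >= npop] <- -Inf   # stratum fully sampled
    h <- which.max(priority)          # ties go to the first stratum
    alloc[h] <- alloc[h] + 1
  }
  alloc
}

npop <- c(100, 100, 100)
sd   <- c(1, 1, 1)
# Neyman: 10/3 units per stratum, which rounds to 3 + 3 + 3 = 9, not 10
round(10 * npop * sd / sum(npop * sd))
# Priority-value allocation always sums to exactly n = 10
wright_alloc(10, npop, sd)   # returns c(4, 3, 3)
```

Each step adds the unit that reduces the (convex, separable) variance the most, so the greedy loop reaches an integer optimum for the requested total instead of relying on rounding, which can miss that total.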

method

A character string specifying the method to be used if at least one stratum was oversampled. Must be one of:

  • "iterative" (the default) takes longer to run but may handle oversampled strata more precisely. If multiple strata are oversampled, this method closes them one at a time, recalculating the optimum allocation after each.

  • "simple" closes all oversampled strata at once and recalculates the optimum allocation over the remaining strata only once. When many strata were oversampled in prior waves, this method can output a negative value in n_to_sample. When this occurs, the function prints a warning, and it is recommended that the user re-run the allocation with the "iterative" method.
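
The difference between the two methods can be illustrated with a small sketch. Both functions below are hypothetical re-creations built on rounded Neyman allocation with weights w_h = N_h S_h; in particular, closing the most oversampled stratum first in close_iterative() is an assumption, since the order optimall uses is not stated here.

```r
# Hypothetical helpers (not optimall's implementation)
neyman_alloc <- function(n_total, w) round(n_total * w / sum(w))

close_simple <- function(w, prior, nsample) {
  total  <- sum(prior) + nsample
  closed <- neyman_alloc(total, w) < prior       # oversampled strata
  n_to_sample <- numeric(length(w))
  # Re-allocate once among the open strata, using what is left of the total
  target <- neyman_alloc(total - sum(prior[closed]), w[!closed])
  n_to_sample[!closed] <- target - prior[!closed] # may be negative
  n_to_sample
}

close_iterative <- function(w, prior, nsample) {
  total  <- sum(prior) + nsample
  closed <- rep(FALSE, length(w))
  repeat {
    target <- neyman_alloc(total - sum(prior[closed]), w[!closed])
    if (!any(target < prior[!closed])) break
    # Close only the most oversampled open stratum, then recompute
    open_idx <- which(!closed)
    closed[open_idx[which.max(prior[!closed] - target)]] <- TRUE
  }
  n_to_sample <- numeric(length(w))
  n_to_sample[!closed] <- target - prior[!closed]
  n_to_sample
}

w     <- c(1, 1, 2)     # N_h * S_h for three strata
prior <- c(20, 9, 10)   # units sampled in earlier waves
close_simple(w, prior, nsample = 5)     # returns c(0, -1, 6)
close_iterative(w, prior, nsample = 5)  # returns c(0, 0, 5)
```

With these numbers, stratum 1 is oversampled (optimum 11, prior 20). Closing it and reallocating once leaves stratum 2 with an optimum of 8 against 9 already sampled, so the "simple" result contains -1; the iterative version then closes stratum 2 as well and assigns all 5 new units to stratum 3.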

detailed

A logical value indicating whether the output data frame should include details about each stratum, including the true optimum allocation without the constraint of previous waves of sampling and the stratum standard deviations. Defaults to FALSE. These details are all available from optimum_allocation().

Details

If the optimum sample size in a stratum is smaller than the number of units it was allocated in previous waves, that stratum has been oversampled. When oversampling occurs, allocate_wave "closes" the oversampled strata and re-allocates the remaining samples optimally among the open strata. Under these circumstances, the total sampling allocation is no longer optimal, but allocate_wave outputs the best allocation possible for the next wave.
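
The loss of optimality can be quantified with the usual variance of the stratified mean, the sum over strata of (N_h/N)^2 S_h^2 / n_h (1 - n_h/N_h). The sketch compares the unconstrained Neyman optimum for 44 units with a hypothetical allocation in which one stratum already had 20 units from earlier waves; strat_var() and all numbers are illustrative.

```r
# Variance of the stratified sample mean with finite population correction:
# sum_h W_h^2 * S_h^2 / n_h * (1 - n_h / N_h), where W_h = N_h / N
strat_var <- function(n, npop, sd) {
  W <- npop / sum(npop)
  sum(W^2 * sd^2 / n * (1 - n / npop))
}

npop <- c(100, 100, 100)   # N_h (illustrative)
sd   <- c(1, 1, 2)         # S_h (illustrative)

optimum  <- c(11, 11, 22)  # Neyman: 44 units in proportion to N_h * S_h
achieved <- c(20, 9, 15)   # hypothetical: 20 units already in stratum 1, 44 in total

strat_var(optimum, npop, sd)
strat_var(achieved, npop, sd)   # larger, because stratum 1 is oversampled
```

Both allocations use 44 units, so the gap between the two variances is the price of the earlier waves' oversampling that no next-wave allocation can recover.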

Value

Returns a data frame with one row for each stratum and columns specifying the stratum name ("strata"), population stratum size ("npop"), cumulative sample size in that stratum ("nsample_actual"), number previously sampled in that stratum ("nsample_prior"), and the optimally allocated number of units in that stratum for the next wave ("n_to_sample").
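
As a sketch of where the first columns come from, the per-stratum counts can be tabulated in base R from unit-level data shaped like the data frame in the Examples section. The n_to_sample values are placeholders rather than allocate_wave output, and nsample_actual is assumed here to equal nsample_prior + n_to_sample, that is, the cumulative sample after the new wave.

```r
# Unit-level data: 3 strata of 20 units, 5 already sampled in each
units <- data.frame(
  Strata         = rep(c("1", "2", "3"), each = 20),
  AlreadySampled = rep(c(rep(1, 5), rep(0, 15)), times = 3)
)

counts        <- table(units$Strata)                                  # N_h
nsample_prior <- as.vector(tapply(units$AlreadySampled, units$Strata, sum))
n_to_sample   <- c(4, 7, 9)   # placeholder values, not real allocate_wave output

out <- data.frame(
  strata         = names(counts),
  npop           = as.vector(counts),
  nsample_actual = nsample_prior + n_to_sample,  # assumed: prior + new wave
  nsample_prior  = nsample_prior,
  n_to_sample    = n_to_sample
)
out
```

table() and tapply() both order groups by the sorted stratum labels, so the rows line up without an explicit merge.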

References

McIsaac, M. A., & Cook, R. J. (2015). Adaptive sampling in two-phase designs: a biomarker study for progression in arthritis. Statistics in Medicine, 34(21), 2899-2912.

Reilly, M., & Pepe, M. S. (1995). A mean score method for missing and auxiliary covariate data in regression models. Biometrika, 82(2), 299-314.

Wright, T. (2014). A Simple Method of Exact Optimal Sample Allocation under Stratification with any Mixed Constraint Patterns, Research Report Series (Statistics #2014-07), Center for Statistical Research and Methodology, U.S. Bureau of the Census, Washington, D.C.

Examples

library(optimall)

# Create a data frame with a column specifying strata, a variable of interest,
# and an indicator for whether each unit was already sampled
set.seed(234)
mydata <- data.frame(Strata = c(rep(1, times = 20),
                                rep(2, times = 20),
                                rep(3, times = 20)),
                     Var = c(rnorm(20, 1, 0.5),
                             rnorm(20, 1, 0.9),
                             rnorm(20, 1.5, 0.9)),
                     AlreadySampled = rep(c(rep(1, times = 5),
                                            rep(0, times = 15)),
                                          times = 3))

# Allocate 20 units to the next wave, handling any oversampled strata
# with the "simple" method
x <- allocate_wave(
  data = mydata, strata = "Strata",
  y = "Var", already_sampled = "AlreadySampled",
  nsample = 20, method = "simple"
)

optimall documentation built on Sept. 8, 2023, 6:07 p.m.