sample_strata: Select Sampling Units based on Stratified Random Sampling

View source: R/sample_strata.R

sample_strata    R Documentation

Select Sampling Units based on Stratified Random Sampling

Description

Requires two dataframes or matrices. The first, data, has a column strata that specifies stratum membership for each unit in the population. The second, design_data, has one row per stratum, a column design_strata that lists the unique levels of strata in data, and a column n_allocated that specifies the number of units to sample from each stratum. sample_strata selects the units to sample by drawing a random sample of the desired size within each stratum. design_data can be the output of allocate_wave() or optimum_allocation().
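The within-stratum selection described above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: stratified_draw() is a hypothetical helper, and the toy population pop is made up for the example.

```r
# Illustrative base R sketch of stratified random sampling.
# stratified_draw() is a hypothetical helper, not part of optimall.
stratified_draw <- function(data, strata, id, design_data,
                            design_strata = "strata",
                            n_allocated = "n_to_sample") {
  data$sample_indicator <- 0
  for (i in seq_len(nrow(design_data))) {
    stratum <- as.character(design_data[[design_strata]][i])
    n <- design_data[[n_allocated]][i]
    # ids of all units that belong to this stratum
    ids <- data[[id]][as.character(data[[strata]]) == stratum]
    # Sample positions rather than values, so a single eligible id
    # is not expanded to 1:id by sample()
    chosen <- ids[sample.int(length(ids), n)]
    data$sample_indicator[data[[id]] %in% chosen] <- 1
  }
  data
}

set.seed(1)
pop <- data.frame(id = 1:150, Species = iris$Species)
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  n_to_sample = c(5, 5, 5)
)
out <- stratified_draw(pop, "Species", "id", design)

# Number of selected (1) and unselected (0) units per stratum
table(out$Species, out$sample_indicator)
```

Each stratum contributes exactly the number of units listed in its design row, so the total sample size is the sum of the n_to_sample column.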

Usage

sample_strata(
  data,
  strata,
  id,
  already_sampled = NULL,
  design_data,
  design_strata = "strata",
  n_allocated = "n_to_sample",
  probs = NULL,
  wave = NULL,
  warn_prob_overwrite = TRUE
)

Arguments

data

A data frame or matrix with one row for each sampling unit in the population, one column specifying each unit's stratum, and one column with a unique identifier for each unit.

strata

a character string specifying the name of the column in data which indicates stratum membership.

id

a character string specifying the name of the column in data that uniquely identifies each unit.

already_sampled

a character string specifying the name of the column in data which indicates (1/0 or Y/N) whether a unit has already been sampled in a prior wave. Defaults to NULL, which means that no units have been sampled yet.
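As a rough sketch of how such an indicator column might be used, the base R snippet below restricts the sampling pool to units not yet sampled. It assumes that previously sampled units are excluded from the new draw and that 1 or "Y" marks a sampled unit; the data frame pop and the variable names are hypothetical.

```r
set.seed(2)

# Hypothetical single-stratum population with a Y/N indicator
pop <- data.frame(
  id = 1:10,
  stratum = "a",
  already_sampled = c("Y", "Y", "Y", rep("N", 7))
)

# Assumption: 1 or "Y" means the unit was sampled in a prior wave
was_sampled <- pop$already_sampled %in% c(1, "Y")

# Draw 3 new units from the units that are still eligible
eligible <- pop$id[!was_sampled]
chosen <- eligible[sample.int(length(eligible), 3)]
pop$sample_indicator <- as.integer(pop$id %in% chosen)
```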

design_data

a dataframe or matrix with one row for each stratum that subdivides the population, one column specifying the stratum name, and one column indicating the number of samples allocated to each stratum.

design_strata

a character string specifying the name of the column in design_data that contains the stratum levels. Defaults to "strata".

n_allocated

a character string specifying the name of the column in design_data that indicates the number of units allocated to each stratum. Defaults to "n_to_sample".

probs

a character string specifying the name of the column in design_data that indicates the sampling probability for each stratum, or a formula indicating how the sampling probabilities can be computed from existing columns. If specified, a new column named "sampling_prob", containing the sampling probability attached to each sampled unit, will be created in the output dataframe. Defaults to NULL.
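To illustrate what a formula for probs expresses, the sketch below evaluates a one-sided formula against the columns of a design dataframe in base R. This is one plausible way to compute per-stratum probabilities and is not necessarily how sample_strata does it internally; the column name npop follows the Examples section.

```r
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  npop = c(50, 50, 50),
  n_to_sample = c(5, 5, 5)
)

# One-sided formula: its right-hand side is element 2 of the call
probs <- ~ n_to_sample / npop

# Evaluate the right-hand side using the design columns as variables
design$sampling_prob <- eval(probs[[2]], envir = design,
                             enclos = environment(probs))
design
```

With 5 of 50 units allocated per stratum, each stratum's sampling probability is 0.1.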

wave

A numeric value or character string indicating the sampling wave. If specified, the input is appended to "sample_indicator" in the name of the new sample indicator column (as long as a column with that name does not already exist in data). Defaults to NULL. This argument does not apply when sample_strata() is called inside allocate_wave().
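The naming rule can be sketched as follows. The snippet assumes the wave value is appended directly, with no separator, and it stops with an error if the name is already taken; both are assumptions made for illustration, and the data frame pop is hypothetical.

```r
wave <- 2

# Hypothetical data that already holds the wave 1 indicator
pop <- data.frame(id = 1:4, sample_indicator1 = c(1, 0, 0, 1))

# Assumption: the wave is appended directly to "sample_indicator"
new_col <- paste0("sample_indicator", wave)

# Illustrative guard against overwriting an existing column
if (new_col %in% names(pop)) {
  stop("Column '", new_col, "' already exists in data")
}
pop[[new_col]] <- c(0, 1, 0, 0)
names(pop)
```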

warn_prob_overwrite

Logical indicator for whether a warning should be printed if probs is specified and an existing "sampling_prob" column is going to be overwritten. Defaults to TRUE. If the function is called inside apply_multiwave(), then defaults to FALSE.

Value

Returns data as a dataframe with a new column named "sample_indicator" containing a binary (1/0) indicator of whether each unit should be sampled. If the wave argument is specified, the given input is appended to the name "sample_indicator". If the probs argument is specified, the dataframe will also contain a new column named "sampling_prob" holding the sampling probability for each sampled unit.

Examples

# Define a design dataframe
design <- data.frame(
  strata = c("setosa", "virginica", "versicolor"),
  npop = c(50, 50, 50),
  n_to_sample = c(5, 5, 5)
)

# Make sure there is an id column
iris$id <- seq_len(nrow(iris))

# Run
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)

# To include probs as a formula
sample_strata(
  data = iris, strata = "Species", id = "id",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample", probs = ~n_to_sample/npop
)

# If some units had already been sampled
iris$already_sampled <- rbinom(nrow(iris), 1, 0.25)

sample_strata(
  data = iris, strata = "Species", id = "id",
  already_sampled = "already_sampled",
  design_data = design, design_strata = "strata",
  n_allocated = "n_to_sample"
)

optimall documentation built on June 22, 2024, 9:34 a.m.