estimate_mean_stats: Estimate Survey-Weighted Means (Overall or By Domain)
In auxvecLASSO: LASSO Auxiliary Variable Selection and Auxiliary Vector Diagnostics

View source: R/estimate_mean_statistics.R

estimate_mean_stats

R Documentation

Estimate Survey-Weighted Means (Overall or By Domain)

Description

Computes survey-weighted means and standard errors for one or more variables in a survey design, optionally stratified by domains (e.g., regions, groups). If population means are provided, it also calculates the bias and mean squared error (MSE) of the estimates compared to the population means.

Usage

estimate_mean_stats(design, vars, by = NULL, population_means = NULL)

Arguments

`design`	A survey design object (e.g., from the survey package). The object must contain survey data and weights.
`vars`	A character vector of variable names to compute means for. These variables must exist in `design$variables`.
`by`	Optional. A one-sided formula (e.g., `~region`) or a character vector of domain variables for stratified analysis. If `NULL`, computes overall means across all data.
`population_means`	Optional. A named numeric vector with population means (for `by = NULL`) or a data frame containing domain variables and population means (for `by` not `NULL`). Used to calculate bias and MSE for each variable.

Details

Observations with non-finite weights are excluded from the analysis globally.
For each variable, observations with non-finite values are also dropped (in addition to the global weight filter). An error will occur if no valid data remains.
For domain-specific means, population_means must include all domain columns and the variable being estimated. The rows are merged by domain key.

Value

A named list where each component is a data frame with the following columns:

domain columns: Columns for the domain variables (if by is specified).
variable: The name of the variable.
mean: The survey-weighted mean of the variable.
se: The standard error of the survey-weighted mean.
bias: The difference between the survey-weighted mean and the population mean. NA if population_means is not provided.
mse: The mean squared error (MSE) of the survey-weighted mean. NA if population_means is not provided.

Examples

## ============================================================
## Example 1: Overall means with population means (bias/MSE)
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(123)
  options(survey.lonely.psu = "adjust")

  n <- 200
  region <- factor(sample(c("N", "S"), n, replace = TRUE))
  sex <- factor(sample(c("F", "M"), n, replace = TRUE))
  y <- 10 + 2 * (sex == "M") + rnorm(n, sd = 1.5)
  z <- 100 + 5 * (region == "S") + rnorm(n, sd = 3)
  w <- runif(n, 0.8, 1.8) * 50
  df <- data.frame(region, sex, y, z, w)

  des <- survey::svydesign(ids = ~1, weights = ~w, data = df)

  ## Named vector of population means for overall case
  pop_means <- c(y = 11.2, z = 103.0)

  res_overall <- estimate_mean_stats(
    design = des,
    vars = c("y", "z"),
    by = NULL,
    population_means = pop_means
  )

  ## Each element is a one-row data frame
  res_overall$y
  res_overall$z
}

## ============================================================
## Example 2: Domain means by region with population means table
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(456)
  options(survey.lonely.psu = "adjust")

  n <- 150
  region <- factor(sample(c("N", "S"), n, replace = TRUE), levels = c("N", "S"))
  sex <- factor(sample(c("F", "M"), n, replace = TRUE))
  y <- 12 + 1.5 * (region == "S") + rnorm(n, sd = 1.2)
  z <- 95 + 6 * (region == "S") + rnorm(n, sd = 2.5)
  w <- runif(n, 0.7, 2.0) * 40
  toy <- data.frame(region, sex, y, z, w)

  des2 <- survey::svydesign(ids = ~1, weights = ~w, data = toy)

  ## Population means by domain must include the domain column(s) + vars
  pop_by_region <- data.frame(
    region = c("N", "S"),
    y = c(12.2, 13.8),
    z = c(95.5, 101.0),
    stringsAsFactors = FALSE
  )

  res_by <- estimate_mean_stats(
    design = des2,
    vars = c("y", "z"),
    by = ~region, # or equivalently: by = "region"
    population_means = pop_by_region # merged by the 'region' key
  )

  ## Each element is a data frame with domain rows
  res_by$y
  res_by$z
}

auxvecLASSO documentation built on Aug. 28, 2025, 9:09 a.m.