estimate_mean_stats: Estimate Survey-Weighted Means (Overall or By Domain)

View source: R/estimate_mean_statistics.R

estimate_mean_statsR Documentation

Estimate Survey-Weighted Means (Overall or By Domain)

Description

Computes survey-weighted means and standard errors for one or more variables in a survey design, optionally stratified by domains (e.g., regions, groups). If population means are provided, it also calculates the bias and mean squared error (MSE) of the estimates compared to the population means.

Usage

estimate_mean_stats(design, vars, by = NULL, population_means = NULL)

Arguments

design

A survey design object (e.g., from the survey package). The object must contain survey data and weights.

vars

A character vector of variable names to compute means for. These variables must exist in design$variables.

by

Optional. A one-sided formula (e.g., ~region) or a character vector of domain variables for stratified analysis. If NULL, computes overall means across all data.

population_means

Optional. A named numeric vector with population means (for by = NULL) or a data frame containing domain variables and population means (for by not NULL). Used to calculate bias and MSE for each variable.

Details

  • Observations with non-finite weights are excluded from the analysis globally.

  • For each variable, observations with non-finite values are also dropped (in addition to the global weight filter). An error will occur if no valid data remains.

  • For domain-specific means, population_means must include all domain columns and the variable being estimated. The rows are merged by domain key.

Value

A named list where each component is a data frame with the following columns:

domain columns

Columns for the domain variables (if by is specified).

variable

The name of the variable.

mean

The survey-weighted mean of the variable.

se

The standard error of the survey-weighted mean.

bias

The difference between the survey-weighted mean and the population mean. NA if population_means is not provided.

mse

The mean squared error (MSE) of the survey-weighted mean. NA if population_means is not provided.

Examples

## ============================================================
## Example 1: Overall means with population means (bias/MSE)
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(123)
  options(survey.lonely.psu = "adjust")

  n <- 200
  region <- factor(sample(c("N", "S"), n, replace = TRUE))
  sex <- factor(sample(c("F", "M"), n, replace = TRUE))
  y <- 10 + 2 * (sex == "M") + rnorm(n, sd = 1.5)
  z <- 100 + 5 * (region == "S") + rnorm(n, sd = 3)
  w <- runif(n, 0.8, 1.8) * 50
  df <- data.frame(region, sex, y, z, w)

  des <- survey::svydesign(ids = ~1, weights = ~w, data = df)

  ## Named vector of population means for overall case
  pop_means <- c(y = 11.2, z = 103.0)

  res_overall <- estimate_mean_stats(
    design = des,
    vars = c("y", "z"),
    by = NULL,
    population_means = pop_means
  )

  ## Each element is a one-row data frame
  res_overall$y
  res_overall$z
}

## ============================================================
## Example 2: Domain means by region with population means table
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
  set.seed(456)
  options(survey.lonely.psu = "adjust")

  n <- 150
  region <- factor(sample(c("N", "S"), n, replace = TRUE), levels = c("N", "S"))
  sex <- factor(sample(c("F", "M"), n, replace = TRUE))
  y <- 12 + 1.5 * (region == "S") + rnorm(n, sd = 1.2)
  z <- 95 + 6 * (region == "S") + rnorm(n, sd = 2.5)
  w <- runif(n, 0.7, 2.0) * 40
  toy <- data.frame(region, sex, y, z, w)

  des2 <- survey::svydesign(ids = ~1, weights = ~w, data = toy)

  ## Population means by domain must include the domain column(s) + vars
  pop_by_region <- data.frame(
    region = c("N", "S"),
    y = c(12.2, 13.8),
    z = c(95.5, 101.0),
    stringsAsFactors = FALSE
  )

  res_by <- estimate_mean_stats(
    design = des2,
    vars = c("y", "z"),
    by = ~region, # or equivalently: by = "region"
    population_means = pop_by_region # merged by the 'region' key
  )

  ## Each element is a data frame with domain rows
  res_by$y
  res_by$z
}


auxvecLASSO documentation built on Aug. 28, 2025, 9:09 a.m.