View source: R/estimate_mean_statistics.R
estimate_mean_stats | R Documentation |
Computes survey-weighted means and standard errors for one or more variables in a survey design, optionally stratified by domains (e.g., regions, groups). If population means are provided, it also calculates the bias and mean squared error (MSE) of the estimates compared to the population means.
estimate_mean_stats(design, vars, by = NULL, population_means = NULL)
design |
A survey design object (e.g., from the survey package). The object must contain survey data and weights. |
vars |
A character vector of variable names to compute means for. These variables must
exist in |
by |
Optional. A one-sided formula (e.g., |
population_means |
Optional. A named numeric vector with population means (for |
Observations with non-finite weights are excluded from the analysis globally.
For each variable, observations with non-finite values are also dropped (in addition to the global weight filter). An error will occur if no valid data remains.
For domain-specific means, population_means
must include all domain columns and the variable
being estimated. The rows are merged by domain key.
A named list where each component is a data frame with the following columns:
Columns for the domain variables (if by
is specified).
The name of the variable.
The survey-weighted mean of the variable.
The standard error of the survey-weighted mean.
The difference between the survey-weighted mean and the population mean. NA
if
population_means
is not provided.
The mean squared error (MSE) of the survey-weighted mean. NA
if population_means
is not provided.
## ============================================================
## Example 1: Overall means with population means (bias/MSE)
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
set.seed(123)
options(survey.lonely.psu = "adjust")
n <- 200
region <- factor(sample(c("N", "S"), n, replace = TRUE))
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
y <- 10 + 2 * (sex == "M") + rnorm(n, sd = 1.5)
z <- 100 + 5 * (region == "S") + rnorm(n, sd = 3)
w <- runif(n, 0.8, 1.8) * 50
df <- data.frame(region, sex, y, z, w)
des <- survey::svydesign(ids = ~1, weights = ~w, data = df)
## Named vector of population means for overall case
pop_means <- c(y = 11.2, z = 103.0)
res_overall <- estimate_mean_stats(
design = des,
vars = c("y", "z"),
by = NULL,
population_means = pop_means
)
## Each element is a one-row data frame
res_overall$y
res_overall$z
}
## ============================================================
## Example 2: Domain means by region with population means table
## ============================================================
if (requireNamespace("survey", quietly = TRUE)) {
set.seed(456)
options(survey.lonely.psu = "adjust")
n <- 150
region <- factor(sample(c("N", "S"), n, replace = TRUE), levels = c("N", "S"))
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
y <- 12 + 1.5 * (region == "S") + rnorm(n, sd = 1.2)
z <- 95 + 6 * (region == "S") + rnorm(n, sd = 2.5)
w <- runif(n, 0.7, 2.0) * 40
toy <- data.frame(region, sex, y, z, w)
des2 <- survey::svydesign(ids = ~1, weights = ~w, data = toy)
## Population means by domain must include the domain column(s) + vars
pop_by_region <- data.frame(
region = c("N", "S"),
y = c(12.2, 13.8),
z = c(95.5, 101.0),
stringsAsFactors = FALSE
)
res_by <- estimate_mean_stats(
design = des2,
vars = c("y", "z"),
by = ~region, # or equivalently: by = "region"
population_means = pop_by_region # merged by the 'region' key
)
## Each element is a data frame with domain rows
res_by$y
res_by$z
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.