generate_population_totals: Generate population totals for a calibration design matrix
In auxvecLASSO: LASSO Auxiliary Variable Selection and Auxiliary Vector Diagnostics


generate_population_totals

R Documentation

Generate population totals for a calibration design matrix

Description

Builds a fixed model matrix on a population frame and returns the column totals needed for calibration, optionally weighted. The function freezes the dummy and interaction structure on the population by constructing a terms object, so respondent data can later be encoded with exactly the same columns.
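The idea of freezing the encoding can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's implementation: `pop_toy` and its columns are hypothetical data, and the steps (`model.frame()`, `terms()`, `model.matrix()`, `colSums()`) are only one plausible way to obtain such totals.

```r
# Toy population frame (hypothetical data, not from the package)
pop_toy <- data.frame(
  region = factor(c("N", "N", "S", "S", "S", "W")),
  urban  = factor(c("yes", "no", "yes", "yes", "no", "no"))
)

f  <- ~ region + urban + region:urban
mf <- model.frame(f, data = pop_toy, na.action = stats::na.pass)
tt <- terms(mf)                     # term structure fixed on the population
X  <- model.matrix(tt, data = mf)   # dummy and interaction columns
totals <- colSums(X)                # unweighted column totals

# The intercept column is all ones, so its total is the number of rows
totals[["(Intercept)"]] == nrow(pop_toy)
```

Keeping `tt` around is what allows the same column layout to be rebuilt later on other data.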

Usage

generate_population_totals(
  population_df,
  calibration_formula,
  weights = NULL,
  contrasts = NULL,
  include_intercept = TRUE,
  sparse = FALSE,
  na_action = stats::na.pass,
  drop_zero_cols = FALSE
)

Arguments

`population_df`	A data frame containing the calibration population.
`calibration_formula`	A one-sided formula specifying main effects and interactions (e.g., `~ stype + api00_bin:stype`). The intercept is handled by `include_intercept`.
`weights`	Optional numeric vector of population weights (length `nrow(population_df)`). If `NULL` (default), unweighted totals are computed.
`contrasts`	Optional named list of contrasts to pass to `model.matrix()` (e.g., `list(stype = contr.treatment)`). If `NULL`, the current global `options(contrasts=...)` are used.
`include_intercept`	Logical; if `TRUE` (default), keep the `(Intercept)` column in the totals. Its total equals `sum(weights)`, or `nrow(population_df)` when unweighted.
`sparse`	Logical; if `TRUE`, build the population model matrix internally as a sparse Matrix while computing totals. (Totals are always returned as a base numeric vector.)
`na_action`	NA handling passed to `model.frame()`; defaults to `stats::na.pass`. Consider `stats::na.omit` for stricter behavior.
`drop_zero_cols`	Logical; if `TRUE`, drop columns whose population total is exactly zero. Default `FALSE`. A message is emitted if any zero-total columns are found.
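How `weights`, `include_intercept`, and `drop_zero_cols` interact can be illustrated with a small base R sketch. The data frame `pop_w` and the weight vector `w` are hypothetical, and the computation shown is an assumption about what weighted totals mean here (a weighted column sum), not a copy of the function's internals.

```r
# Hypothetical frame with one factor and one weight per row
pop_w <- data.frame(stratum = factor(c("a", "a", "b", "c")))
w <- c(10, 20, 5, 15)

X <- model.matrix(~ stratum, data = pop_w)

# Weighted column totals: t(X) %*% w, returned as a named vector
weighted_totals <- drop(crossprod(X, w))

# Equivalent formulation: scale each row by its weight, then sum columns
alt_totals <- colSums(X * w)

# The intercept column is all ones, so its total is the sum of the weights
weighted_totals[["(Intercept)"]] == sum(w)

# Dropping zero-total columns, in the spirit of drop_zero_cols = TRUE
kept_totals <- weighted_totals[weighted_totals != 0]
```

In this toy frame every level is observed, so `kept_totals` has the same columns as `weighted_totals`; a zero total would typically come from an interaction cell that does not occur in the population.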

Value

An object of class "calib_totals": a list with the following components:

population_totals: named numeric vector of column totals
levels: list of factor levels observed in the population (for reproducibility)
terms: the terms object built on population_df
contrasts: the contrasts actually used (from the model matrix)
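To see why recording the contrasts matters, the base R sketch below builds the same model matrix under two codings. It uses hypothetical toy data (`pop_c`) and only `model.matrix()` with its `contrasts.arg` argument; whether the package stores the contrasts exactly as `attr(X, "contrasts")` is an assumption.

```r
# Hypothetical three-level factor
pop_c <- data.frame(g = factor(c("a", "b", "b", "c", "c", "c")))

# Default treatment coding versus sum-to-zero coding
X_trt <- model.matrix(~ g, data = pop_c)
X_sum <- model.matrix(~ g, data = pop_c, contrasts.arg = list(g = "contr.sum"))

# model.matrix() records the coding it used as an attribute
used_trt <- attr(X_trt, "contrasts")
used_sum <- attr(X_sum, "contrasts")

# Column names and totals depend on the coding, so totals are only
# meaningful together with the contrasts that produced them
colSums(X_trt)
colSums(X_sum)
```

Respondent-side matrices need the same coding, otherwise their columns do not correspond to the population totals.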

Examples


# Example using the API data from the survey package
library(survey)
data(api) # loads apipop, apisrs, apistrat, etc.

# Build a population frame and create some binary fields used in a formula
pop <- apipop
pop$api00_bin <- as.factor(ifelse(pop$api00 >= 700, "700plus", "lt700"))
pop$growth_bin <- as.factor(ifelse(pop$growth >= 0, "nonneg", "neg"))
pop$ell_bin <- as.factor(ifelse(pop$ell >= 10, "highELL", "lowELL"))
# comp.imp is a Yes/No factor in apipop, so test the level instead of a numeric cut-off
pop$comp.imp_bin <- as.factor(ifelse(pop$comp.imp == "Yes", "highComp", "lowComp"))
pop$hsg_bin <- as.factor(ifelse(pop$hsg >= 60, "highHSG", "lowHSG"))

# A calibration formula with main effects + a few interactions
cal_formula <- ~ stype + growth_bin + api00_bin + ell_bin + comp.imp_bin + hsg_bin +
  api00_bin:stype + hsg_bin:stype + comp.imp_bin:stype + api00_bin:growth_bin

# (Optional) frame weights if available; here we use unweighted totals
gp <- generate_population_totals(
  population_df        = pop,
  calibration_formula  = cal_formula,
  include_intercept    = TRUE
)

# Named totals ready for calibration:
head(gp$population_totals)

# If you later build a respondent model matrix, reuse gp$terms to ensure alignment.
# The respondent data must first get the same derived *_bin factors as pop:
# X_resp <- model.matrix(gp$terms, data = apisrs)
# stopifnot(identical(colnames(X_resp), names(gp$population_totals)))
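The alignment step hinted at in the comments above can be sketched in base R on toy data. Here `tt` and `lev` stand in for what `gp$terms` and `gp$levels` are described as holding; treating them as interchangeable is an assumption, and `pop_a` and `resp_a` are hypothetical frames.

```r
# Population with three levels; respondents happen to miss level "b"
pop_a  <- data.frame(g = factor(c("a", "b", "c", "c")))
resp_a <- data.frame(g = factor(c("a", "c")))

# Freeze terms and factor levels on the population
mf_pop <- model.frame(~ g, data = pop_a)
tt     <- terms(mf_pop)
lev    <- .getXlevels(tt, mf_pop)
X_pop  <- model.matrix(tt, data = mf_pop)

# Rebuild the respondent frame with the population levels, then encode it
mf_resp <- model.frame(tt, data = resp_a, xlev = lev,
                       na.action = stats::na.pass)
X_resp  <- model.matrix(tt, data = mf_resp)

# Same columns in the same order, even though "b" has no respondents
stopifnot(identical(colnames(X_resp), colnames(X_pop)))
```

Without `xlev`, the respondent factor would keep only its observed levels and the column for the missing level would disappear, breaking the match with the population totals.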

auxvecLASSO documentation built on Aug. 28, 2025, 9:09 a.m.