generate_population_totals: Generate population totals for a calibration design matrix

generate_population_totals    R Documentation

Description

Build a fixed model matrix on a population frame and return the column totals needed for calibration (optionally weighted). The function freezes dummy/interaction structure on the population by constructing a terms object, so downstream use on respondent data can reuse the exact same encoding.
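The idea described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the toy data frame `pop`, the weights `w`, and the variable names are all hypothetical. The sketch builds a terms object once on the population, derives the model matrix from it, and takes unweighted and weighted column totals.

```r
# Hypothetical toy population frame (not from the package)
pop <- data.frame(
  region = factor(c("north", "south", "south", "west")),
  urban  = factor(c("yes", "no", "yes", "yes"))
)
w <- c(10, 20, 30, 40)  # hypothetical frame weights, one per row

# Build the terms object once, on the population
tt <- terms(~ region + urban, data = pop)
mf <- model.frame(tt, data = pop, na.action = na.pass)
X  <- model.matrix(tt, data = mf)

# Column totals: plain sums, and sums with each row scaled by its weight
unweighted_totals <- colSums(X)
weighted_totals   <- colSums(X * w)  # w is recycled down columns, so row i is scaled by w[i]

print(unweighted_totals)
print(weighted_totals)
```

Because `tt` is created once and stored, the same dummy and interaction encoding can be applied to other data later without re-deriving it from the formula.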

Usage

generate_population_totals(
  population_df,
  calibration_formula,
  weights = NULL,
  contrasts = NULL,
  include_intercept = TRUE,
  sparse = FALSE,
  na_action = stats::na.pass,
  drop_zero_cols = FALSE
)

Arguments

population_df

A data frame containing the calibration population.

calibration_formula

A one-sided formula specifying main effects and interactions (e.g., ~ stype + api00_bin:stype). The intercept is handled by include_intercept.

weights

Optional numeric vector of population weights (length nrow(population_df)). If NULL (default), unweighted totals are computed.

contrasts

Optional named list of contrasts to pass to model.matrix() (e.g., list(stype = contr.treatment)). If NULL, the current global options(contrasts=...) are used.

include_intercept

Logical; if TRUE (default), keep the (Intercept) column in the totals. Its total equals sum(weights), or nrow(population_df) when unweighted.

sparse

Logical; if TRUE, build the population model matrix internally as a sparse Matrix while computing totals. (Totals are always returned as a base numeric vector.)

na_action

NA handling passed to model.frame(); defaults to stats::na.pass. Consider stats::na.omit for stricter behavior.

drop_zero_cols

Logical; if TRUE, drop columns whose population total is exactly zero. Default FALSE. A message is emitted if any zero-total columns are found.
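To see how the contrasts, na_action, and drop_zero_cols arguments might interact, the following base R sketch shows the underlying mechanics on a hypothetical toy data frame. It does not call generate_population_totals() and is not taken from its source. The factor `g` has a declared level "c" that never occurs, which produces a zero-total column.

```r
# Hypothetical toy data: one missing factor value and one unused level ("c")
df <- data.frame(
  g = factor(c("a", "b", "b", NA), levels = c("a", "b", "c")),
  x = c(1, 2, 3, 4)
)
tt <- terms(~ g + x)

# NA handling happens in model.frame(): na.pass keeps the row, na.omit drops it
mf_pass <- model.frame(tt, data = df, na.action = na.pass)
mf_omit <- model.frame(tt, data = df, na.action = na.omit)
n_pass <- nrow(mf_pass)
n_omit <- nrow(mf_omit)

# Contrasts are supplied per factor through contrasts.arg
X <- model.matrix(tt, data = mf_omit,
                  contrasts.arg = list(g = "contr.treatment"))
totals <- colSums(X)

# Columns with a total of exactly zero (here, the unused level "c")
zero_cols   <- names(totals)[totals == 0]
totals_kept <- totals[totals != 0]

print(c(n_pass = n_pass, n_omit = n_omit))
print(zero_cols)
```

Zero-total columns usually signal a factor level or interaction cell that is empty in the population, so it may be worth inspecting them before dropping them.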

Value

An object of class "calib_totals": a list with

  • population_totals: named numeric vector of column totals

  • levels: list of factor levels observed in the population (for reproducibility)

  • terms: the terms object built on population_df

  • contrasts: the contrasts actually used (from the model matrix)
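As a rough illustration of why stored terms and levels help, the base R sketch below reuses a population terms object and factor levels on a respondent data frame in which one level is absent. The data frames and the use of `.getXlevels()` are assumptions made for this example; the structure of the `levels` component returned by the package may differ.

```r
# Hypothetical population and respondent frames; "west" has no respondents
pop  <- data.frame(region = factor(c("north", "south", "west", "south")))
resp <- data.frame(region = factor(c("south", "south", "north")))

# Freeze the encoding on the population
tt     <- terms(~ region, data = pop)
mf_pop <- model.frame(tt, data = pop)
X_pop  <- model.matrix(tt, data = mf_pop)
lev    <- .getXlevels(tt, mf_pop)  # factor levels seen in the population

# Reuse the terms and levels on respondents so the columns line up
mf_resp <- model.frame(tt, data = resp, xlev = lev)
X_resp  <- model.matrix(tt, data = mf_resp)

same_columns <- identical(colnames(X_resp), colnames(X_pop))
print(same_columns)
print(colSums(X_resp))
```

Without `xlev`, the respondent matrix would lack the `regionwest` column and would no longer match the names of the population totals.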

Examples


# Example using the API data from the survey package
library(survey)
data(api) # loads apipop, apisrs, apistrat, etc.

# Build a population frame and create some binary fields used in a formula
pop <- apipop
pop$api00_bin <- as.factor(ifelse(pop$api00 >= 700, "700plus", "lt700"))
pop$growth_bin <- as.factor(ifelse(pop$growth >= 0, "nonneg", "neg"))
pop$ell_bin <- as.factor(ifelse(pop$ell >= 10, "highELL", "lowELL"))
pop$comp.imp_bin <- as.factor(ifelse(pop$comp.imp >= 50, "highComp", "lowComp"))
pop$hsg_bin <- as.factor(ifelse(pop$hsg >= 60, "highHSG", "lowHSG"))

# A calibration formula with main effects + a few interactions
cal_formula <- ~ stype + growth_bin + api00_bin + ell_bin + comp.imp_bin + hsg_bin +
  api00_bin:stype + hsg_bin:stype + comp.imp_bin:stype + api00_bin:growth_bin

# (Optional) frame weights if available; here we use unweighted totals
gp <- generate_population_totals(
  population_df        = pop,
  calibration_formula  = cal_formula,
  include_intercept    = TRUE
)

# Named totals ready for calibration:
head(gp$population_totals)

# If you later build a respondent model matrix, reuse gp$terms to ensure alignment.
# Note: apisrs must first be given the same derived *_bin columns as pop.
# X_resp <- model.matrix(gp$terms, data = apisrs)
# stopifnot(identical(colnames(X_resp), names(gp$population_totals)))



auxvecLASSO documentation built on Aug. 28, 2025, 9:09 a.m.