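The ma_projection() function implements a model-assisted projection estimator for combining information from two independent surveys. This method is especially useful in survey sampling scenarios where:
- Survey 1 observes the auxiliary variables but not the outcome of interest, and
- Survey 2 observes both the outcome and the same auxiliary variables.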
This vignette illustrates how to use ma_projection() for domain-level estimation using various supervised learning models, including machine learning techniques via the parsnip interface.
The approach follows Kim & Rao (2012). A working model is trained on Survey 2 to predict the outcome variable from the auxiliary variables. The fitted model then predicts the outcome for every unit in the auxiliary-only Survey 1, and these predictions are aggregated by domain to produce small area estimates. In the examples below, Survey 2 is passed as data_model and Survey 1 as data_proj.
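In symbols, the projection step for a domain d can be sketched as follows. The notation is introduced here for illustration: A_1 is the Survey 1 sample with design weights w_{i1}, \hat{m}(\cdot) is the working model fitted on Survey 2, and \delta_{id} equals 1 when unit i belongs to domain d and 0 otherwise.

$$
\hat{y}_i = \hat{m}(\mathbf{x}_i), \qquad
\hat{Y}_{d} = \sum_{i \in A_1} w_{i1}\,\delta_{id}\,\hat{y}_i, \qquad
\hat{\bar{Y}}_{d} = \frac{\sum_{i \in A_1} w_{i1}\,\delta_{id}\,\hat{y}_i}{\sum_{i \in A_1} w_{i1}\,\delta_{id}}
$$

The sum estimates a domain total and the ratio a domain mean; which of the two ma_projection() reports is not stated here. For the full-population projection estimator, Kim & Rao (2012) show approximate design unbiasedness when the working model satisfies $\sum_{i \in A_2} w_{i2}\,\{y_i - \hat{m}(\mathbf{x}_i)\} = 0$ on the Survey 2 sample A_2, a condition met, for example, by a weighted linear or logistic regression that includes an intercept.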
library(sae.projection)
library(dplyr)
library(tidymodels)
library(bonsai) # for modern tree-based models
# Filter non-missing values for income
svy22_income <- df_svy22 %>% filter(!is.na(income))
svy23_income <- df_svy23 %>% filter(!is.na(income))

# Fit projection model
lm_result <- ma_projection(
  income ~ age + sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = linear_reg(),
  data_model = svy22_income,
  data_proj = svy23_income,
  nest = TRUE
)

# View results
head(lm_result$df_result)
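To make the three steps (fit on Survey 2, predict on Survey 1, aggregate by domain) concrete, here is a minimal base-R sketch of a projection estimate with a linear working model. It is not the package's implementation: the data frames s2 (playing Survey 2) and s1 (playing Survey 1), the weight column w, and the domain column are hypothetical, filled with simulated values.

```r
# Minimal sketch of a projection estimate (NOT the sae.projection internals).
# All data below are simulated; s1, s2, w, and domain are hypothetical names.
set.seed(42)

# Hypothetical "Survey 2": outcome and auxiliaries are both observed
n2 <- 400
s2 <- data.frame(
  age = sample(18:65, n2, replace = TRUE),
  sex = factor(sample(c("M", "F"), n2, replace = TRUE)),
  w   = runif(n2, 50, 150)  # design weights
)
s2$income <- 1000 + 20 * s2$age + 300 * (s2$sex == "M") + rnorm(n2, sd = 200)

# Hypothetical "Survey 1": auxiliaries only, plus a domain identifier
n1 <- 2000
s1 <- data.frame(
  age    = sample(18:65, n1, replace = TRUE),
  sex    = factor(sample(c("M", "F"), n1, replace = TRUE), levels = levels(s2$sex)),
  w      = runif(n1, 50, 150),
  domain = sample(c("A", "B", "C"), n1, replace = TRUE)
)

# Step 1: fit the working model on Survey 2 with its design weights
fit <- lm(income ~ age + sex, data = s2, weights = w)

# Step 2: predict the outcome for every Survey 1 unit
s1$y_hat <- predict(fit, newdata = s1)

# Step 3: weighted domain means of the predictions, using Survey 1 weights
num <- tapply(s1$w * s1$y_hat, s1$domain, sum)
den <- tapply(s1$w, s1$domain, sum)
domain_means <- num / den
print(domain_means)
```

Weighted least squares via lm(weights = ...) gives the same coefficient estimates as a design-weighted regression. Clusters and strata (cluster_ids and strata in ma_projection()) affect the variance of such estimates, not these point estimates, which is why the sketch can omit them.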
# Filter youth population (ages 15-24) for NEET classification
svy22_neet <- df_svy22 %>% filter(between(age, 15, 24))
svy23_neet <- df_svy23 %>% filter(between(age, 15, 24))

# Fit logistic regression model
lr_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = logistic_reg(),
  data_model = svy22_neet,
  data_proj = svy23_neet,
  nest = TRUE
)

# View results
head(lr_result$df_result)
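For a binary outcome such as neet, one natural reading of the Kim & Rao setup, assumed here rather than taken from the package source, is that each Survey 1 unit contributes its predicted probability instead of a hard 0/1 class, so the weighted domain mean of those probabilities estimates the domain NEET rate. A minimal base-R sketch on simulated, hypothetical data:

```r
# Sketch only: simulated data, hypothetical names (s1, s2, w, domain).
# Assumption: domain rates are weighted means of predicted probabilities.
set.seed(7)

# Hypothetical "Survey 2" with a binary outcome (1 = NEET)
n2 <- 500
s2 <- data.frame(
  sex = factor(sample(c("F", "M"), n2, replace = TRUE)),
  edu = factor(sample(c("low", "mid", "high"), n2, replace = TRUE)),
  w   = runif(n2, 50, 150)
)
p_true <- plogis(-1 + 0.5 * (s2$sex == "F") + 0.8 * (s2$edu == "low"))
s2$neet <- rbinom(n2, 1, p_true)

# Hypothetical "Survey 1": auxiliaries and domain only
n1 <- 3000
s1 <- data.frame(
  sex    = factor(sample(c("F", "M"), n1, replace = TRUE), levels = levels(s2$sex)),
  edu    = factor(sample(c("low", "mid", "high"), n1, replace = TRUE), levels = levels(s2$edu)),
  w      = runif(n1, 50, 150),
  domain = sample(c("A", "B"), n1, replace = TRUE)
)

# quasibinomial() gives the same point estimates as binomial() but does not
# warn about non-integer weights
fit <- glm(neet ~ sex + edu, family = quasibinomial(), data = s2, weights = w)

# Predicted probabilities (type = "response"), not 0/1 class labels
s1$p_hat <- predict(fit, newdata = s1, type = "response")

# Weighted mean probability per domain = estimated NEET rate
neet_rate <- tapply(s1$w * s1$p_hat, s1$domain, sum) / tapply(s1$w, s1$domain, sum)
print(neet_rate)
```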
# Define LightGBM model with tuning
lgbm_model <- boost_tree(
  mtry = tune(),
  trees = tune(),
  min_n = tune(),
  tree_depth = tune(),
  learn_rate = tune(),
  engine = "lightgbm"
)

# Fit with cross-validation
lgbm_result <- ma_projection(
  formula = neet ~ sex + edu + disability,
  cluster_ids = "PSU",
  weight = "WEIGHT",
  strata = "STRATA",
  domain = c("PROV", "REGENCY"),
  working_model = lgbm_model,
  data_model = svy22_neet,
  data_proj = svy23_neet,
  cv_folds = 3,
  tuning_grid = 5,
  nest = TRUE
)

# View results
head(lgbm_result$df_result)
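The cv_folds and tuning_grid arguments suggest a standard tidymodels tuning loop. The sketch below runs such a loop by hand on a small simulated data set. The mapping of cv_folds = 3 to vfold_cv(v = 3) and of tuning_grid = 5 to tune_grid(grid = 5), the roc_auc selection metric, and the toy data are all assumptions for illustration, not a description of what ma_projection() does internally.

```r
library(tidymodels)
library(bonsai)  # registers the "lightgbm" engine for boost_tree()

# Hypothetical toy data; column names mirror the vignette's NEET example
set.seed(123)
n <- 400
toy <- data.frame(
  neet       = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sex        = factor(sample(c("F", "M"), n, replace = TRUE)),
  edu        = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  disability = factor(sample(c("no", "yes"), n, replace = TRUE))
)

# Same tunable hyperparameters as lgbm_model above; mode set explicitly here
spec <- boost_tree(
  mtry = tune(), trees = tune(), min_n = tune(),
  tree_depth = tune(), learn_rate = tune()
) |>
  set_engine("lightgbm") |>
  set_mode("classification")

wf <- workflow() |>
  add_formula(neet ~ sex + edu + disability) |>
  add_model(spec)

folds <- vfold_cv(toy, v = 3)                        # assumed analogue of cv_folds = 3
tuned <- tune_grid(wf, resamples = folds, grid = 5)  # assumed analogue of tuning_grid = 5

# Pick the best candidate and refit it on all of the data
best <- select_best(tuned, metric = "roc_auc")
final_fit <- finalize_workflow(wf, best) |> fit(data = toy)
```

tune_grid() finalizes the data-dependent range of mtry from the predictors before building the five-candidate grid, so mtry = tune() needs no explicit range here.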
ma_projection() supports many working models using the parsnip interface, including:
- linear_reg(), logistic_reg() (also with the Stan engine)
- poisson_reg(), mlp(), naive_bayes(), nearest_neighbor()
- decision_tree(), bag_tree(), boost_tree() (LightGBM/XGBoost engines), rand_forest() (ranger, aorsf engines), bart()
- svm_linear(), svm_poly(), svm_rbf()

ma_projection() offers a flexible way to combine survey data using modern modeling tools: across the examples above, only the working_model argument (plus cv_folds and tuning_grid for tuned models) changes, while the survey design and domain arguments stay the same. Typical applications include domain-level socioeconomic indicators, health estimates, and similar small area statistics.

Kim, J. K., & Rao, J. N. K. (2012). Combining data from two independent surveys: a model-assisted approach. Biometrika, 99(1), 85–100. doi:10.1093/biomet/asr063