knitr::opts_chunk$set(
  collapse = TRUE,
  cache = TRUE,
  comment = "#>",
  fig.path = "README-"
)

MultCal2Sim

The MultCal2Sim package for R has tools to perform calibration with pre-computed estimates. MultCal2Sim will calibrate initial estimates from the first data source onto estimated reference control-totals from the second data source. There are two choices of computation tools available, post-stratification and raking. For methodological details see [@writeupmcs].

Installation

You can install MultCal2Sim from github with:

# install.packages('devtools')
# library(devtools)
devtools::install_github("statsccpr/MultCal2Sim")

Examples

We use the survey::api dataset that contains infomation about student performance in California Schools.

library(survey)
data(api,package = 'survey')


form_outcome = ~enroll
form_poststrat = ~sch.wide+both

We want an improved estimate of the total for the outcome enroll where sch.wide and both are post-strata variables.

The to-be-calibrated outcome is found in Data Source 1, apiclus1, which is a cluster sample of school districts. Acting as the target data source is Data Source 2, apistrat, which is a sample design (pre) stratified by type of school (Elementary/Middle/High School),

# stratified sample via ?apistrat
des_targ  = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# # one-stage cluster sample via ?apiclus1
des_2be_cal = svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

Poststratification

Step 0: Computing estimates for strata cell totals

If the user already has estimates for the post-strata cells, the user can skip to step 1. Otherwise, we have helper functions for the user to compute the required post-strata cell estimates.

est_tot_from_des_joint(form_additive,design_refer): estimate joint totals from a survey design object est_tot_from_des_1marg(form_1_term_only,design_refer): estimate marginal totals for a single variable from a survey design object

We'll first focus on the MCS poststratification method, thus requiring joint control totals. Later we'll demonstrate how marginal totals are used for the MCS raking method.

# helper function to compute estimated totals within joint strata
library(MultCal2Sim)
df_targ_tot_joint = est_tot_from_des_joint(form_additive=form_poststrat,
                                            design_refer=des_targ)

Going forward, we need the following objects

df_targ_tot_joint  # data.frame with estimates of the post-strata cell totals
form_poststrat  # formula specifying poststrata
form_outcome   # formula specifying the outcome
des_2be_cal  # survey.design object containing the outcome to be calibrated

Step 1: Simulating draws using the estimated strata control totals from the target data source

Simulate $m=1,2,3,...,M$ draws for each strata cell $s=1,2,3,...,S$ in the first step. The ?sim_tot_from_est() function will simulate totals from pre-existing estimates. It will return one simulation of either joint totals or marginal totals in a data frame.

# simulate one draw
sim_tot_from_est(df_or_list_est_tot=df_targ_tot_joint,type_strata='joint',lgl_rej_neg_sim=TRUE)

# simulate 10 draws for joint totals
list_sim_out_joint = lapply(1:10,FUN=sim_tot_from_est,df_or_list_est_tot=df_targ_tot_joint,type_strata='joint')

# list of 10 draws
str(list_sim_out_joint,1)

# first draw
list_sim_out_joint[[1]]

Step 2: Calibrating the estimates from the first data source onto the simulated control totals

The user can pick either post-stratification or raking to calibrate the initial estimate to each draw (simulated in step 1). The function ?cal_2_sim() will calibrate (poststratify or rake) an outcome onto simulted poststrata totals. It will return a list of multiple intermediate estimates. Here we use post-stratification.

# using mcsp to calibrate the estimate to the first draw
cal_2_sim(des_2be_cal=des_2be_cal,
         df_or_list_sim=list_sim_out_joint[[1]],
         form_outcome=form_outcome,form_poststrat=form_poststrat,
         type_cal='mcsp')

# repeat the task 10 times via ?lapply()
list_cal_out_joint = lapply(list_sim_out_joint,FUN=cal_2_sim,
                      des_2be_cal=des_2be_cal,
                      form_outcome=form_outcome,
                      form_poststrat=form_poststrat,
                      type_cal='mcsp')

str(list_cal_out_joint)

Step 3: Computing the final point estimate and the standard error

The ?combine_est() function will combine the list of intermediate estimates (from step 2). The final point estimate is the average of intermediate calibrations, and its standard error will be returned as well.

# combine estimates
combine_est(list_cal_out_joint)

Raking onto marginal totals (MCSR)

Alternative to poststratification onto joint totals, we demonstrate raking onto marginal totals. For raking, the control variable specification requires a list where each element is a one term formula (unlike poststratification requiring one formula of terms).

# marginal targets
args(est_tot_from_des_1marg)
form_poststrat

# expect error if more than 1 RHS term in formula
form_poststrat

# est_tot_from_des_1marg(form_1_term_only=~sch.wide + both,design_refer=des_targ)

is.list(form_poststrat)

# correct, list of one term formulae
list_form_marg = list(~sch.wide,~both)
is.list(list_form_marg)

list_targ_tot_marg = lapply(list_form_marg,FUN=est_tot_from_des_1marg, design_refer=des_targ)

args(sim_tot_from_est)

# step 1, simulate
# simulate one draw of margin(s)
sim_tot_from_est(df_or_list_est_tot=list_targ_tot_marg,type_strata='marginal',lgl_rej_neg_sim=TRUE)

# simulate 10 draws
list_sim_out_marg = lapply(1:10,FUN=sim_tot_from_est,df_or_list_est_tot=list_targ_tot_marg,type_strata='marginal')

# list of draws
str(list_sim_out_marg,2)

# 5th draw
list_sim_out_marg[[5]]

# step 2, calibrate via mcsr to compute intermediate estimates
list_cal_out_marg = lapply(list_sim_out_marg,FUN=cal_2_sim,
                            des_2be_cal=des_2be_cal,
                            form_outcome=form_outcome,
                            form_poststrat=list_form_marg,
                            type_cal='mcsr')

# step 3: combine the estimates and compute final uncertainty
combine_est(list_cal_out_marg)

Overall MCS helper that wraps together steps 1-3

For the data arguments, the ?mcs() wrapper needs a survey object with the outcome to be calibrated, the precomputed estimates of the control totals, a formula of the outcome, and a formula of the poststrata controls.

mcs(des_2be_cal=des_2be_cal,
    df_or_list_est_tot=list_targ_tot_marg,
    form_outcome=form_outcome,
    form_poststrat=list_form_marg,
    type_cal='mcsr',
    num_sim=50,
    lgl_rej_neg_sim=TRUE)

The ?mcs() wrapper has a parallel option

# parallel
mcs(des_2be_cal=des_2be_cal,
    df_or_list_est_tot=list_targ_tot_marg,
    form_outcome=form_outcome,
    form_poststrat=list_form_marg,
    type_cal='mcsr',
    num_sim=50,
    parallel = TRUE,
    num_core = 4,
    lgl_rej_neg_sim=TRUE)

# mcs(des_2be_cal=des_2be_cal,
#     df_or_list_est_tot=df_targ_tot_joint,
#     form_outcome=form_outcome,
#     form_poststrat=form_poststrat,
#     type_cal='mcsp',
#     num_sim=200,
#     parallel = TRUE,
#     num_core = 4,
#     lgl_rej_neg_sim=TRUE)
sessionInfo()


statsccpr/MultCal2Sim documentation built on Sept. 3, 2022, 12:39 p.m.