knitr::opts_chunk$set( collapse = TRUE, cache = TRUE, comment = "#>", fig.path = "README-" )
The MultCal2Sim
package for R has tools to perform calibration with pre-computed estimates. MultCal2Sim will calibrate initial estimates from the first data source onto estimated reference control-totals from the second data source. There are two choices of computation tools available, post-stratification and raking. For methodological details see [@writeupmcs].
You can install MultCal2Sim from github with:
# install.packages('devtools') # library(devtools) devtools::install_github("statsccpr/MultCal2Sim")
We use the survey::api
dataset that contains infomation about student performance in California Schools.
library(survey) data(api,package = 'survey') form_outcome = ~enroll form_poststrat = ~sch.wide+both
We want an improved estimate of the total for the outcome enroll
where sch.wide
and both
are post-strata variables.
The to-be-calibrated outcome is found in Data Source 1, apiclus1
, which is a cluster sample of school districts. Acting as the target data source is Data Source 2, apistrat
, which is a sample design (pre) stratified by type of school (Elementary/Middle/High School),
# stratified sample via ?apistrat des_targ = svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc) # # one-stage cluster sample via ?apiclus1 des_2be_cal = svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
If the user already has estimates for the post-strata cells, the user can skip to step 1. Otherwise, we have helper functions for the user to compute the required post-strata cell estimates.
est_tot_from_des_joint(form_additive,design_refer)
: estimate joint totals from a survey design object
est_tot_from_des_1marg(form_1_term_only,design_refer)
: estimate marginal totals for a single variable from a survey design object
We'll first focus on the MCS poststratification method, thus requiring joint control totals. Later we'll demonstrate how marginal totals are used for the MCS raking method.
# helper function to compute estimated totals within joint strata library(MultCal2Sim) df_targ_tot_joint = est_tot_from_des_joint(form_additive=form_poststrat, design_refer=des_targ)
Going forward, we need the following objects
df_targ_tot_joint # data.frame with estimates of the post-strata cell totals form_poststrat # formula specifying poststrata form_outcome # formula specifying the outcome des_2be_cal # survey.design object containing the outcome to be calibrated
Simulate $m=1,2,3,...,M$ draws for each strata cell $s=1,2,3,...,S$ in the first step. The ?sim_tot_from_est()
function will simulate totals from pre-existing estimates. It will return one simulation of either joint totals or marginal totals in a data frame.
# simulate one draw sim_tot_from_est(df_or_list_est_tot=df_targ_tot_joint,type_strata='joint',lgl_rej_neg_sim=TRUE) # simulate 10 draws for joint totals list_sim_out_joint = lapply(1:10,FUN=sim_tot_from_est,df_or_list_est_tot=df_targ_tot_joint,type_strata='joint') # list of 10 draws str(list_sim_out_joint,1) # first draw list_sim_out_joint[[1]]
The user can pick either post-stratification or raking to calibrate the initial estimate to each draw (simulated in step 1). The function ?cal_2_sim()
will calibrate (poststratify or rake) an outcome onto simulted poststrata totals. It will return a list of multiple intermediate estimates. Here we use post-stratification.
# using mcsp to calibrate the estimate to the first draw cal_2_sim(des_2be_cal=des_2be_cal, df_or_list_sim=list_sim_out_joint[[1]], form_outcome=form_outcome,form_poststrat=form_poststrat, type_cal='mcsp') # repeat the task 10 times via ?lapply() list_cal_out_joint = lapply(list_sim_out_joint,FUN=cal_2_sim, des_2be_cal=des_2be_cal, form_outcome=form_outcome, form_poststrat=form_poststrat, type_cal='mcsp') str(list_cal_out_joint)
The ?combine_est()
function will combine the list of intermediate estimates (from step 2). The final point estimate is the average of intermediate calibrations, and its standard error will be returned as well.
# combine estimates combine_est(list_cal_out_joint)
Alternative to poststratification onto joint totals, we demonstrate raking onto marginal totals. For raking, the control variable specification requires a list where each element is a one term formula (unlike poststratification requiring one formula of terms).
# marginal targets args(est_tot_from_des_1marg) form_poststrat # expect error if more than 1 RHS term in formula form_poststrat # est_tot_from_des_1marg(form_1_term_only=~sch.wide + both,design_refer=des_targ) is.list(form_poststrat) # correct, list of one term formulae list_form_marg = list(~sch.wide,~both) is.list(list_form_marg) list_targ_tot_marg = lapply(list_form_marg,FUN=est_tot_from_des_1marg, design_refer=des_targ) args(sim_tot_from_est) # step 1, simulate # simulate one draw of margin(s) sim_tot_from_est(df_or_list_est_tot=list_targ_tot_marg,type_strata='marginal',lgl_rej_neg_sim=TRUE) # simulate 10 draws list_sim_out_marg = lapply(1:10,FUN=sim_tot_from_est,df_or_list_est_tot=list_targ_tot_marg,type_strata='marginal') # list of draws str(list_sim_out_marg,2) # 5th draw list_sim_out_marg[[5]] # step 2, calibrate via mcsr to compute intermediate estimates list_cal_out_marg = lapply(list_sim_out_marg,FUN=cal_2_sim, des_2be_cal=des_2be_cal, form_outcome=form_outcome, form_poststrat=list_form_marg, type_cal='mcsr') # step 3: combine the estimates and compute final uncertainty combine_est(list_cal_out_marg)
For the data arguments, the ?mcs()
wrapper needs a survey object with the outcome to be calibrated, the precomputed estimates of the control totals, a formula of the outcome, and a formula of the poststrata controls.
mcs(des_2be_cal=des_2be_cal, df_or_list_est_tot=list_targ_tot_marg, form_outcome=form_outcome, form_poststrat=list_form_marg, type_cal='mcsr', num_sim=50, lgl_rej_neg_sim=TRUE)
The ?mcs()
wrapper has a parallel option
# parallel mcs(des_2be_cal=des_2be_cal, df_or_list_est_tot=list_targ_tot_marg, form_outcome=form_outcome, form_poststrat=list_form_marg, type_cal='mcsr', num_sim=50, parallel = TRUE, num_core = 4, lgl_rej_neg_sim=TRUE) # mcs(des_2be_cal=des_2be_cal, # df_or_list_est_tot=df_targ_tot_joint, # form_outcome=form_outcome, # form_poststrat=form_poststrat, # type_cal='mcsp', # num_sim=200, # parallel = TRUE, # num_core = 4, # lgl_rej_neg_sim=TRUE)
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.