cs_sampling | R Documentation |
cs_sampling
is a wrapper function. It takes in a svydesign
object and a stan_model
and inputs for sampling
.
It calls the sampling
to generate MCMC draws from the model. The constrained parameters are converted to unconstrained, adjusted, converted back to constrained and then output.
The adjustment process estimates a sandwich matrix adjustment to the posterior variance from two information matrices H and J.
J is estimated via resampling with withReplicates
. For each set of replicate weights, sampling
is called with no chains to instantiate a stanfit-class
object.
The stanfit-class
object has an associated method grad_log_prob
, which returns the gradient for a given input of unconstrained parameters.
The variance of this gradient is taken across the replicates and provides a estimate of the J matrix.
'cs_sampling
allows for different options for estimation of the Hessian matrix H.
By default H is estimated as the Monte Carlo mean via optimHess
using grad_log_prob
and each posterior draw as inputs. Using just the posterior mean is faster but less stable (previous default).
The asymptotic covariance for the posterior mean is then calculated as Hi V Hi, where Hi is the inverse of H.
The asymptotic covariance for the posterior sampling procedure (due to mis-specification) is Hi.
By default, cs_sampling
takes the "square" root of these matrices
via eigenvalue decomposition R1'R1 = Hi V Hi and R2'R2 = Hi. This is more stable but slower than using the Cholesky decomposition (previous default).
The final adjustment rescales/rotates the posterior sample by R2iR1 where R2i is the inverse of R2.
The final adjust can be interpreted as an asymptotic correction for model mis-specification due to using survey sampling weights as plug in values in the likelihood. This is often know as a "design effect" which is the "ratio" between the variance from simple random sample (Hi) and a complex survey sample (HiVHi).
cs_sampling( svydes, mod_stan, par_stan = NA, data_stan, ctrl_stan = list(chains = 1, iter = 2000, warmup = 1000, thin = 1), rep_design = FALSE, ctrl_rep = list(replicates = 100, type = "mrbbootstrap"), H_estimate = "MCMC", matrix_sqrt = "eigen", sampling_args = list() )
svydes |
|
mod_stan |
|
par_stan |
|
data_stan |
|
ctrl_stan |
|
rep_design |
|
ctrl_rep |
|
H_estimate |
|
matrix_sqrt |
|
sampling_args |
|
A list of the following:
stan_fit - the original stanfit-class
object returned by sampling
for the weighted model
sampled_parms - the array of parameters extracted from stan_fit corresponding to the parameter block in the stan model (specified by stan_pars)
adjusted_parms - the array of adjusted parameters, corresponding to sampled_parms which have been rescaled and rotated.
Matt Williams.
Williams, M. R., and Savitsky, T. D. (2020) Uncertainty Estimation for Pseudo-Bayesian Inference Under Complex Sampling. International Statistical Review, https://doi.org/10.1111/insr.12376.
# multinomial dependent variable library(survey) data(api) #make sure weights sum to n apiclus1$newpw <- apiclus1$pw/mean(apiclus1$pw) dclus1<-svydesign(id=~dnum, weights=~newpw, data=apiclus1, fpc=~fpc) M1 <- svymean(~stype, dclus1) #Use default replicate design rclus1<-as.svrepdesign(dclus1) M2 <- svymean(~stype, rclus1) #Construct a Stan model library(rstan) model_string <- csSampling::load_wt_multi_model() mod_dm <- stan_model(model_code = model_string) #Set the Data for Stan y <- as.factor(rclus1$variables$stype) yM <- model.matrix(~y -1) n <- dim(yM)[1] K <- dim(yM)[2] alpha<-rep(1,K) weights <- rclus1$pweights data_stan<-list("y"=yM,"alpha"=alpha,"K"=K, "n" = n, "weights" = weights) ctrl_stan<-list("chains"=1,"iter"=2000,"warmup"=1000,"thin"=1) #Run the Stan mode set.seed(12345) mod1 <- cs_sampling(svydes = rclus1, mod_stan = mod_dm, data_stan = data_stan, ctrl_stan = ctrl_stan, rep_design = TRUE) #Plot the results plot(mod1) plot(mod1, varnames = paste("theta",1:3, sep = "")) #Compare Summary Statistics #hybrid variance adjusted cbind(colMeans(mod1$adjusted_parms[,4:6]), sqrt(diag(cov(mod1$adjusted_parms[,4:6])))) #Taylor Linearization M1 #Replication M2 #Bernoulli-distributed dependent variable dat14 <- csSampling::dat14 #subset to adults dat14 <- dat14[as.numeric(dat14$CATAG6) > 1,] #normalize weights to sum to sample size dat14$WTS <- dat14$ANALWT_C/mean(dat14$ANALWT_C) svy14 <- svydesign(ids = ~VEREP, strata = ~VESTR, weights = ~WTS, data = dat14, nest = TRUE) #Construct a Stan model library(brms) model_formula <- formula("CIGMON|weights(WTS) ~ AMDEY2_U") stancode <- make_stancode(brmsformula(model_formula,center = FALSE), data = dat14, family = bernoulli(), save_model = "brms_wt_log.stan") mod_brms <- stan_model('brms_wt_log.stan') #Prepare data for Stan modelling data_brms <- make_standata(brmsformula(model_formula,center = FALSE), data = dat14, family = bernoulli()) #Run the Stan model set.seed(12345) #set seed to fix output for comparison mod.brms <- cs_sampling(svydes = svy14, mod_stan = mod_brms, data_stan = data_brms) #Plot the results plot(mod.brms, varnames = paste("b", 1:2, sep =""))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.