cs_sampling: cs_sampling
In RyanHornby/csSampling: Complex Survey Sampling

View source: R/cs_sampling.R

cs_sampling

R Documentation

cs_sampling

Description

cs_sampling is a wrapper function. It takes in a svydesign object and a stan_model and inputs for sampling. It calls the sampling to generate MCMC draws from the model. The constrained parameters are converted to unconstrained, adjusted, converted back to constrained and then output. The adjustment process estimates a sandwich matrix adjustment to the posterior variance from two information matrices H and J. J is estimated via resampling with withReplicates. For each set of replicate weights, sampling is called with no chains to instantiate a stanfit-class object. The stanfit-class object has an associated method grad_log_prob, which returns the gradient for a given input of unconstrained parameters. The variance of this gradient is taken across the replicates and provides a estimate of the J matrix. 'cs_sampling allows for different options for estimation of the Hessian matrix H. By default H is estimated as the Monte Carlo mean via optimHess using grad_log_prob and each posterior draw as inputs. Using just the posterior mean is faster but less stable (previous default). The asymptotic covariance for the posterior mean is then calculated as Hi V Hi, where Hi is the inverse of H. The asymptotic covariance for the posterior sampling procedure (due to mis-specification) is Hi. By default, cs_sampling takes the "square" root of these matrices via eigenvalue decomposition R1'R1 = Hi V Hi and R2'R2 = Hi. This is more stable but slower than using the Cholesky decomposition (previous default). The final adjustment rescales/rotates the posterior sample by R2iR1 where R2i is the inverse of R2. The final adjust can be interpreted as an asymptotic correction for model mis-specification due to using survey sampling weights as plug in values in the likelihood. This is often know as a "design effect" which is the "ratio" between the variance from simple random sample (Hi) and a complex survey sample (HiVHi).

Usage

cs_sampling(
  svydes,
  mod_stan,
  par_stan = NA,
  data_stan,
  ctrl_stan = list(chains = 1, iter = 2000, warmup = 1000, thin = 1),
  rep_design = FALSE,
  ctrl_rep = list(replicates = 100, type = "mrbbootstrap"),
  H_estimate = "MCMC",
  matrix_sqrt = "eigen",
  sampling_args = list()
)

Arguments

`svydes`	a `svydesign` object or a `svrepdesign` object. This contains cluster ID, strata, and weight information (`svydesign`) or replicate weight information (`svrepdesign`)
`mod_stan`	a compiled stan model to be called by `sampling`
`par_stan`	a list of a subset of parameters to output after adjustment. All parameters are adjusted including the derived parameters, so users may want to only compare subsets. The default, NA, will return all parameters.
`data_stan`	a list of data inputs for `sampling` associated with mod_stan
`ctrl_stan`	a list of control parameters to pass to `sampling`. Currently includes the number of chains, iter, warmup, and thin with defaults
`rep_design`	logical indicating if the svydes object is a `svrepdesign`. If FALSE, the design will be converted to a `svrepdesign` using ctrl_rep settings
`ctrl_rep`	a list of settings when converting svydes from a `svydesign` object to a `svrepdesign` object. replicates - number of replicate weights. type - the type of replicate method to use, the default is mrbbootstrap which sample half of the clusters in each strata to make each replicate (see `as.svrepdesign`).
`H_estimate`	a string indicating the method to use to estimate H. The default "MCMC" is Monte Carlo averaging over posterior draws. Otherwise, a plug-in using the posterior mean.
`matrix_sqrt`	a string indicating the method to use to take the "square root" of the R1 and R2 matrices. The default "eigen" uses the eigenvalue decomposition. Otherwise, the Cholesky decomposition is used.
`sampling_args`	a list of extra arguments that get passed to `sampling`.

Value

A list of the following:

stan_fit - the original stanfit-class object returned by sampling for the weighted model
sampled_parms - the array of parameters extracted from stan_fit corresponding to the parameter block in the stan model (specified by stan_pars)
adjusted_parms - the array of adjusted parameters, corresponding to sampled_parms which have been rescaled and rotated.

Author(s)

Matt Williams.

References

Williams, M. R., and Savitsky, T. D. (2020) Uncertainty Estimation for Pseudo-Bayesian Inference Under Complex Sampling. International Statistical Review, https://doi.org/10.1111/insr.12376.

Examples

# multinomial dependent variable

library(survey)
data(api)
#make sure weights sum to n
apiclus1$newpw <- apiclus1$pw/mean(apiclus1$pw)
dclus1<-svydesign(id=~dnum, weights=~newpw, data=apiclus1, fpc=~fpc)
M1 <- svymean(~stype, dclus1)

#Use default replicate design
rclus1<-as.svrepdesign(dclus1)
M2 <- svymean(~stype, rclus1)

#Construct a Stan model
library(rstan)
model_string <- csSampling::load_wt_multi_model()
mod_dm <- stan_model(model_code = model_string)

#Set the Data for Stan
y <- as.factor(rclus1$variables$stype)
yM <- model.matrix(~y -1)
n <- dim(yM)[1]
K <- dim(yM)[2]
alpha<-rep(1,K)
weights <- rclus1$pweights
data_stan<-list("y"=yM,"alpha"=alpha,"K"=K, "n" = n, "weights" = weights)
ctrl_stan<-list("chains"=1,"iter"=2000,"warmup"=1000,"thin"=1)

#Run the Stan mode
set.seed(12345)
mod1 <- cs_sampling(svydes = rclus1, mod_stan = mod_dm,
data_stan = data_stan, ctrl_stan = ctrl_stan, rep_design = TRUE)

#Plot the results

plot(mod1)
plot(mod1, varnames = paste("theta",1:3, sep = ""))

#Compare Summary Statistics

#hybrid variance adjusted
cbind(colMeans(mod1$adjusted_parms[,4:6]),
sqrt(diag(cov(mod1$adjusted_parms[,4:6]))))
#Taylor Linearization
M1
#Replication
M2

#Bernoulli-distributed dependent variable

dat14 <- csSampling::dat14
#subset to adults
dat14 <- dat14[as.numeric(dat14$CATAG6) > 1,]
#normalize weights to sum to sample size
dat14$WTS <- dat14$ANALWT_C/mean(dat14$ANALWT_C)
svy14 <- svydesign(ids = ~VEREP, strata = ~VESTR,
                   weights = ~WTS, data = dat14, nest = TRUE)

#Construct a Stan model
library(brms)
model_formula <- formula("CIGMON|weights(WTS) ~ AMDEY2_U")
stancode <- make_stancode(brmsformula(model_formula,center = FALSE),
                          data = dat14, family = bernoulli(), save_model = "brms_wt_log.stan")
mod_brms  <- stan_model('brms_wt_log.stan')

#Prepare data for Stan modelling

data_brms <- make_standata(brmsformula(model_formula,center = FALSE),
                          data = dat14, family = bernoulli())

#Run the Stan model

set.seed(12345) #set seed to fix output for comparison
mod.brms <- cs_sampling(svydes = svy14, mod_stan = mod_brms, data_stan = data_brms)


#Plot the results

plot(mod.brms, varnames = paste("b", 1:2, sep =""))

RyanHornby/csSampling documentation built on Jan. 5, 2023, 12:37 p.m.