In the final section of this presentation we will walk through a proposal that
pretty much all trial data can and should be turned into portable pseudo data
that is shared freely without restrictions. We has originally planned to show
two versions of this: one using {synthpop}
and the other using {mvProbit}
.
After reviewing the synthpop documentation more closely however it looks like
you can actually just save the model objects which means you can interact with
them using local data. This is much faster/more general than the multivariate
probit approach so we won't move that work forward any further. Note that this
repository also includes the code necessary to do a bayesian version of the
multivariate probit model which is primarily of interest if you'd like to do
something like specify treatment effects across binomial outcomes as
shared/exchangeable.
We will use vanilla default settings for synthpop since the data here is simulated
but in real applications there is a lot more fine tuning available. We can get
a nice simple default output compare synthetic to original data using the compare
function.
data("ex_dat") dat <- as.data.frame(ex_dat) %>% dplyr::select(trt, mort, sev_ivh, sepsis, cld, nec) set.seed(124) synth_dat <- xfun::cache_rds({ syn(dat, minnumlevels = 5, m = 10, models = FALSE, method = "parametric") }, dir = cache, file = "synth_dat.rds") compare(synth_dat, data = dat)
We can also take a quick look at correlations for a pooled version of the synthetic datasets vs the observed data.
synth_dat$syn %>% do.call(rbind, .) %>% mutate_all(as.numeric) %>% select(-trt) %>% cor()
dat[,-1] %>% cor()
For the rest of the presentation we will just use on example dataset. For the comparison at the end we will create a composite with sev_ivh + cld
pres_dat_syn <- synth_dat$syn[[10]] %>% mutate_all(~ as.numeric(as.character(.))) %>% mutate(cld_ivh = ifelse(sev_ivh == 1 & cld == 1, 1, 0), cld_no_ivh = ifelse(cld == 1 & sev_ivh == 0, 1, 0), ivh_no_cld = ifelse(cld == 0 & sev_ivh == 1, 1, 0)) pres_dat <- dat %>% mutate(cld_ivh = ifelse(sev_ivh == 1 & cld == 1, 1, 0), cld_no_ivh = ifelse(cld == 1 & sev_ivh == 0, 1, 0), ivh_no_cld = ifelse(cld == 0 & sev_ivh == 1, 1, 0)) colMeans(pres_dat) colMeans(pres_dat_syn)
Overall just taking even a single dataset we've done pretty good at replicating the original data.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.