knitr::opts_chunk$set(echo = TRUE)

Introduction

We have a Bayesian hierarchical model to estimate the extent of incorrect classification of maternal deaths (misclassification).

Misclassification is broken down into two parameters:

  1. Sensitivity: the true positive rate
  2. Specificity: the true negative rate

Our aim: compare bias across different data settings using simulation study results, and automate our simulation output/comparison using an R package.

Data settings implemented:

  1. Full multinomial data (alldata)
  2. True maternal (TP + FN) and observed maternal (TP + FP) (nobreakdown)
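To make the two data settings concrete, here is a small toy example (not part of the package; all counts are made up for illustration) showing how sensitivity and specificity relate to the four classification counts, and which quantities each setting observes.

#Toy counts for one country-year; values are made up so that sens = 0.5 and spec = 0.9
TP <- 25   #truly maternal, classified as maternal
FN <- 25   #truly maternal, missed
FP <- 495  #truly non-maternal, classified as maternal
TN <- 4455 #truly non-maternal, classified correctly

sens <- TP / (TP + FN) #true positive rate, here 0.5
spec <- TN / (TN + FP) #true negative rate, here 0.9

#Setting 1 (alldata): the full multinomial breakdown is observed
alldata <- c(TP = TP, FN = FN, FP = FP, TN = TN)

#Setting 2 (nobreakdown): only the sums are observed
mat_true <- TP + FN #true maternal deaths
mat_obs  <- TP + FP #observed (reported) maternal deaths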

For the purpose of this example analysis, we will not perform simulation exercises, but work with toy MCMC sample datasets to show the automated output functions.

Below we go through the process of producing and comparing simulation output for two run settings. First we specify file paths and load the required libraries.

#Directory containing the toy MCMC sample files
toy.dir <- paste(getwd(), "/toy_mcmc/", sep = "")
#Directory where simulation output will be saved
output.toy.dir <- paste(getwd(), "/toy_mcmc/", sep = "")

library(runjags)
library(R2jags)
library(coda)
library(truncnorm)
library(tidyverse)
library(xtable)
library(devtools)
library(testthat)
#Load the package functions from the current source directory
load_all()

Simulation Process (brief summary)

  1. In the simulation process we set the global parameter settings, which we call the truth.
#Set the truth for the global values. These are the values we fix.
global_settings <- c(sensworld = 0.5, #global sens value
                     specworld = 0.9, #global spec value
                     rho.alphabeta = -0.3, #correlation between sens and spec
                     rrho.alphabeta = -0.3, #correlation between sens and spec in reference year
                     rsigma.alpha = 0, #sd for sens in ref year
                     rsigma.beta = 0, #sd for spec in ref year
                     sigma.alpha = 0.2, #sd for sens in RW
                     sigma.beta = 0.15, #sd for spec in RW
                     C = 1, #number of countries to simulate
                     Nyears = 15, #number of years to simulate
                     refyear = 10, #set reference year
                     vrenv = 5000 #starting vr env, vr.ct is binomial given vrenv to generate year specific vrenv
                     )
globalvars <- c("sensworld", "specworld", "rho.alphabeta", "sigma.alpha", "sigma.beta")
nsim <- 2 #number of simulated data sets per setting
  2. Create one country's yearly values for Se and Sp based on the global fixed values and our full RW (random walk) model, giving a time series of length $Nyears = 15$.
  3. Using the fixed country-year values for Se and Sp and the proportion of deaths that are truly maternal, $prop^{(true)} = 0.01$, we can back-calculate $TP, TN, FP, FN$ (a simplified sketch of this back-calculation follows this list).
  4. We simulate $XXX$ data sets of "observed" multinomial data.
  5. We change what data we have by data setting, i.e. Setting 1 is complete, while Setting 2 has no breakdown of the multinomials but does have the sums $mat^{(true)}$ and $mat^{(obs)}$.
  6. We run the model on the $XXX$ simulated data sets for the 2 settings.
  7. For each run we save the MCMC posterior samples of the estimates of Se and Sp.
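To make the back-calculation in step 3 concrete, below is a simplified, deterministic sketch (expected counts only; the actual simulation uses country-year specific Se and Sp from the random walk and draws the counts stochastically). The input values are taken from the global settings above.

#Simplified sketch of the back-calculation, using expected counts
vrenv     <- 5000 #VR envelope (total deaths) for one country-year
prop_true <- 0.01 #proportion of deaths that are truly maternal
sens      <- 0.5  #country-year sensitivity
spec      <- 0.9  #country-year specificity

true_mat    <- prop_true * vrenv #truly maternal deaths
true_nonmat <- vrenv - true_mat  #truly non-maternal deaths

TP <- sens * true_mat          #maternal deaths classified as maternal
FN <- (1 - sens) * true_mat    #maternal deaths missed
TN <- spec * true_nonmat       #non-maternal deaths classified correctly
FP <- (1 - spec) * true_nonmat #non-maternal deaths classified as maternal

c(TP = TP, FN = FN, TN = TN, FP = FP)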
#Run the simulation and get MCMC samples and their posterior estimates.
#Samples are saved to the specified output folder.
#(runname should hold the name of the run setting, e.g. "setting1")
get_simu_output(runname = runname,
                global_settings = global_settings,
                output.dir = output.toy.dir,
                nsim = nsim)

\bigskip

Summarization Process

  1. We pull in the MCMC samples and, using summarize_simulation, get posterior summaries (median, lower, and upper bounds) and calculate bias for each country-year estimate, for each setting and simulation number (a minimal sketch of such a summary follows this list).

  2. We combine the results from all iterations with summarize_runsetting to produce a single tibble of summaries.
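As a rough, hypothetical sketch of what such a per-simulation summary might contain (the function and column names below are illustrative, not the package's internals):

#Hypothetical sketch: posterior summary for a single parameter
#'draws' is a vector of MCMC samples, 'truth' is the simulated true value
summarize_draws <- function(draws, truth) {
  tibble(median = median(draws),
         lower  = quantile(draws, 0.025), #lower bound of 95% interval
         upper  = quantile(draws, 0.975), #upper bound of 95% interval
         bias   = median(draws) - truth)  #posterior median minus the truth
}

#Example with fake draws centred near a true sensitivity of 0.5
summarize_draws(rnorm(1000, mean = 0.5, sd = 0.05), truth = 0.5)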

settings <- c("setting1", "setting2")
for (sett in settings) {
  for (i in 1:nsim) {

    #Summarizes estimates and bias for each simulation
    summarize_simulation(simsetting = sett,
                         runname = sett,
                         simnum = i,
                         globalvars = globalvars,
                         output.dir = paste(output.toy.dir, sett, "/", sep = ""))
  }

  #Combine the per-simulation summaries for this setting into one tibble
  summary_tib <- summarize_runsetting(output.dir = paste(output.toy.dir, sett, "/", sep = ""),
                                      simsetting = sett,
                                      nsim = nsim)
}

#Preview a few rows of the combined summary tibble
head(summary_tib[100:110, ])

Output

  1. From these summaries, we pull in simulation results from all run settings. We compare results with tables and graphical output.

Model performance is assessed using (1) bias and (2) coverage (shown in the output table).
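As a rough illustration of how these two metrics can be computed from the per-simulation summaries (the tibble below and its columns truth, median, lower, and upper are made up; the package output may be structured differently):

#Illustrative only: bias and coverage from a made-up summary tibble
fake_summary <- tibble(truth  = rep(0.5, 4),
                       median = c(0.48, 0.55, 0.51, 0.40),
                       lower  = c(0.40, 0.45, 0.42, 0.30),
                       upper  = c(0.56, 0.65, 0.60, 0.48))

fake_summary %>%
  summarise(mean_bias = mean(median - truth),                  #average (estimate - truth)
            coverage  = mean(truth >= lower & truth <= upper)) #share of intervals containing the truth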

#Specify the models you want to compare
runnames = c("setting1", "setting2")

#Select parameters you want to plot
plotpars <- c("sens.ct[1,10]", "spec.ct[1,10]")

#Get xtable with simulation output
output <- sum_across_sim(runsettings = runnames)
head(output[50:60,])

#Get comparison plots saved to the figure directory
produce_plots(runnames = runnames,
              plotpars = plotpars,
              fig.dir = output.toy.dir)
  1. We use the tibble created by \textit{summarize_runsetting.R} to compare the truth with the posterior estimates by simulation using the \textit{plot_nsim.R} function; this plot shows truth versus estimates across simulations.

  2. We plot the distribution of posterior median estimates against the truth using the \textit{plot_box.R} function.

We show example graphical output from our two toy settings.

outext <- output.toy.dir
BuildDF(outext = outext, runnames)
dftot <- readRDS(paste(outext, "DFtot.RDS", sep = ""))
dfsub <- readRDS(paste(outext, "DFsubset.RDS", sep = ""))

print(plot_nsim(df = dfsub))
print(plot_box(df = dfsub, pars = plotpars))
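For readers without the package installed, the sketch below shows the general kind of comparison such a truth-versus-estimate boxplot makes, using made-up posterior medians; it is not the actual \textit{plot_box.R} implementation.

#Illustrative only: boxplots of made-up posterior medians against the truth
fake_df <- tibble(setting = rep(c("setting1", "setting2"), each = 20),
                  median  = c(rnorm(20, 0.50, 0.05), rnorm(20, 0.45, 0.08)),
                  truth   = 0.5)

ggplot(fake_df, aes(x = setting, y = median)) +
  geom_boxplot() +                                           #spread of posterior medians per setting
  geom_hline(aes(yintercept = truth), linetype = "dashed") + #true value
  labs(x = "Run setting", y = "Posterior median")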

Testing

Lastly, \textit{summarize_simulation.R} is tested using \textit{summarize_simulation_error.R}, located in the tests folder. Using the testthat package, we add a check that stops the summarize function if parameters other than the ones the script is built to analyze are specified. The expected output from this "error" function is a table of 6 parameters (15 country-year estimates for each of 4 parameters, and 2 estimates for 2 global parameters) with their point estimates and uncertainty intervals when given "workingtib" (drawn from mcmctib_n), and a stop error when given "errortib" (generated with 4 variables). Using the \textit{expect_error} function, we confirm that an error is returned if the parameters of interest are not in the data passed to the summary function.

#Read in a tibble that is missing the required parameters
errortib <- readRDS(paste(getwd(), "/tests/", "errortib.RDS", sep = ""))

#The summary function should throw an error for this input
expect_error(summarize_simulation_error(errortib))

#The same expectation wrapped in a formal testthat unit test
test_that("function gives error when the data is missing parameters", {
  expect_error(summarize_simulation_error(errortib))
})

