knitr::opts_chunk$set(eval=TRUE,collapse=TRUE,warning=FALSE)
This vignette exhaustively steps through the code, model checks and fitting procedure to use virosolver
to estimate incidence curves from simulated data. This code can be adapted for your own datasets by replacing the Ct data frame, here called example_ct_data
. Note that for your own datasets, substantial calibration and checks are advised.
library(virosolver) library(tidyverse) library(patchwork) library(lazymcmc) library(foreach) library(doParallel) cl <- parallel::makeCluster(4, setup_strategy = "sequential") registerDoParallel(cl)
virosolver
takes an input data frame of cycle threshold (Ct) values with associated sample collection dates from quantitative reverse transcription PCR (RT-qPCR) testing, and reconstructs the incidence curve that gave rise to those measurements. The logic is as follows:
By capturing this logic in a mathematical model, we can obtain a probabilistic estimate of the underlying incidence curve having observed a set of Ct values at some point in time.
To summarize, the key idea is that you can estimate incidence based on cross-sections of observed Ct values. This case study explores the application of virosolver
to SARS-CoV-2 surveillance. A full explanation can be found in the accompanying paper.
virosolver
expects a data frame as input data in long format, where each row corresponds to one tested sample. There should be one column labeled t
, giving the time in days a sample was taken, and one column labeled ct
, giving the Ct value of that tested sample. Note that Ct values are semi-quantitative and their scale depends on the platform/instrument used. It is assumed that all Ct values within the data frame are on the same scale and therefore internally consistent across time points. For this simulation, we assume that 1000 random samples are obtained every 2 weeks, starting from day 55.
data(example_ct_data) print(head(example_ct_data %>% filter(ct < 40)))
## Plot only detectable Ct values p_ct_data <- ggplot(example_ct_data %>% filter(ct < 40)) + geom_violin(aes(x=t,group=t,y=ct),scale="width",fill="grey70",draw_quantiles=c(0.025,0.5,0.975)) + geom_jitter(aes(x=t,y=ct),size=0.1,width=2,height=0) + scale_y_continuous(trans="reverse") + theme_bw() + ylab("Ct value") + xlab("Observation time") + ggtitle("Observed Ct values < 40 (the limit of detection) over time") p_ct_data
p_detectable_data <- example_ct_data %>% mutate(detect=ct < 40) %>% group_by(t) %>% summarize(prev=sum(detect)/n()) %>% ggplot() + geom_point(aes(x=t,y=prev)) + theme_bw() + scale_y_continuous(limits=c(0,0.6)) + ylab("Proportion detectable") + ggtitle("Proportion of samples with Ct values < 40") + xlab("Observation time") p_detectable_data
A script to generate these data can be found in the extdata
folder, found here.
A key part of the model is the assumed viral kinetics curve. This describes the mode and variation of Ct values on each day post infection. This is the population-level distribution -- it does not track individual-level viral kinetics curve, so the variation about the mode captures all variation arising from sampling variation, individual-level heterogeneity etc. It is CRUCIAL to understand the parameters underpinning this model when applying virosolver
to a new dataset, as this calibration will be entirely dependent on the population being tested and the PCR instrument used. Consider questions like "what are the mean and range of Ct values I expect to see if I test someone on my platform x days post infection?", "what proportion of positive samples taken x days post infection do I expect to test positive?".
We assume the following Ct model for this simulation:
data(example_gp_partab) pars <- example_gp_partab$values names(pars) <- example_gp_partab$names ## Solve the Ct model over a range of times since infection (referred to as "ages") test_ages <- seq(1,50,by=1) ## This gives the modal Ct value cts <- viral_load_func(pars, test_ages) p_ct_model <- ggplot(data.frame(ct=c(40,cts),t=c(0,test_ages))) + geom_line(aes(x=t,y=ct)) + scale_y_continuous(trans="reverse", limits=c(40,10)) + theme_bw() + ylab("Modal Ct value") + xlab("Days since infection") ## Note that this model does not solve for t=0, ## as it is always assumed that no one is detectable 0 days post infection prop_detect <- prop_detectable(test_ages,pars, cts) p_ct_model_detectable <- ggplot(data.frame(p=c(0,prop_detect),t=c(0,test_ages))) + geom_line(aes(x=t,y=p)) + theme_bw() + ylab("Proportion of infections\n still detectable") + xlab("Days since infection") p_ct_model/p_ct_model_detectable
An intuitive way to look at this curve is to simulate observations from it and plot the simulated Ct values ordered by days since infection. From this, we can see the substantial amount of variation in measurements on each day post infection, but also that there is still some information in the Ct values. Note also that under this model, individuals become truly undetectable (ie. fully recovered) following some daily Bernoulli process. Those who remain detectable for a long time demonstrate a "narrowing" of observed Ct values, where the variation about the mode decreases. This model captures the observation that few individuals remain detectable for a long time, but for those who do, they do not necessarily have Ct values that tend towards the limit of detection.
sim_cts <- simulate_viral_loads_example(test_ages, pars,N=200) print(head(sim_cts)) p_sim_cts_age <- ggplot(sim_cts %>% filter(ct < 40)) + geom_point(aes(x=age,y=ct),alpha=0.25) + scale_y_continuous(trans="reverse",limits=c(40,10)) + theme_bw() + ylab("Ct value") + xlab("Days since infection") + ggtitle("Simulated detectable Ct values on each day post infection") p_sim_cts_age
Now that we have our Ct data and understand the assumed viral kinetics underpinning the model, we can get into the inference framework. We use a Markov chain Monte Carlo framework to estimate the posterior distributions of the free model parameters, conditional on the observed Ct data (the likelihood) and any priors we wish to place on the Ct model and incidence curve parameters. We'll step through the general principles of using the MCMC package, lazymcmc
, define priors for key model parameters, and then demonstrate how the model works using either a single cross section of data or multiple cross sections.
lazymcmc
uses a data frame (usually called parTab
or par_tab
) to track model parameters. The table allows users to fix/estimate certain parameters and also to specify upper and lower bounds. See ?example_gp_partab
for more information, and refer to the the lazymcmc
vignettes for more detail.
data(example_gp_partab) head(example_gp_partab) ## Illustration -- set the `viral_peak` parameter to be estimated during the procedure, and the `intercept` parameter to be fixed example_gp_partab <- example_gp_partab %>% filter(names == "viral_peak") %>% mutate(fixed=0) example_gp_partab <- example_gp_partab %>% filter(names == "intercept") %>% mutate(fixed=1)
We need to specify priors on all estimated model parameters. We use informative priors for the Ct model, as we need to constrain its shape based on our knowledge of viral kinetics to ensure identifiability of the incidence curve. We use less informative priors for the epidemic model, as that's what we're interested in estimating. Here, we use the example parameter table to find the prior means for each model parameter, and then we set the prior standard deviations. We then define a function to be used later, which takes a vector of parameters (with names corresponding to entries in the parameter table), which returns a single log prior probability.
## Read in the SEIR model parameter control table data(example_seir_partab) ## Pull out the current values for each parameter, and set these as the prior means means <- example_seir_partab$values names(means) <- example_seir_partab$names ## Set standard deviations of prior distribution sds_seir <- c("obs_sd"=0.5,"viral_peak"=2, "wane_rate2"=1,"t_switch"=3,"level_switch"=1, "prob_detect"=0.03, "incubation"=0.25, "infectious"=0.5) ## Define a function that returns the log prior probability for a given vector of parameter ## values in `pars`, given the prior means and standard deviations set above. prior_func_seir <- function(pars,...){ ## Ct model priors obs_sd_prior <- dnorm(pars["obs_sd"], means[which(names(means) == "obs_sd")], sds_seir["obs_sd"],log=TRUE) viral_peak_prior <- dnorm(pars["viral_peak"], means[which(names(means) == "viral_peak")], sds_seir["viral_peak"],log=TRUE) wane_2_prior <- dnorm(pars["wane_rate2"],means[which(names(means) == "wane_rate2")],sds_seir["wane_rate2"],log=TRUE) tswitch_prior <- dnorm(pars["t_switch"],means[which(names(means) == "t_switch")],sds_seir["t_switch"],log=TRUE) level_prior <- dnorm(pars["level_switch"],means[which(names(means) == "level_switch")],sds_seir["level_switch"],log=TRUE) ## Beta prior on the prob_detect parameter to ensure between 0 and 1 beta1_mean <- means[which(names(means) == "prob_detect")] beta1_sd <- sds_seir["prob_detect"] beta_alpha <- ((1-beta1_mean)/beta1_sd^2 - 1/beta1_mean)*beta1_mean^2 beta_beta <- beta_alpha*(1/beta1_mean - 1) beta_prior <- dbeta(pars["prob_detect"],beta_alpha,beta_beta,log=TRUE) ## SEIR model priors incu_prior <- dlnorm(pars["incubation"],log(means[which(names(means) == "incubation")]), sds_seir["incubation"], TRUE) infectious_prior <- dlnorm(pars["infectious"],log(means[which(names(means) == "infectious")]),sds_seir["infectious"],TRUE) ## Sum up obs_sd_prior + viral_peak_prior + wane_2_prior + tswitch_prior + level_prior + beta_prior + incu_prior + infectious_prior }
Next, we need a function to give the likelihood of observing a particular set of Ct values conditional on the underlying model parameters. Part of this is to determine the function for the probability of infection over time (the incidence curve). virosolver
has some prebuilt functions for this, and here we'll be using the SEIR model to describe incidence. We can then use create_posterior_func
to create a new function which calculates the posterior probability of a set of parameter values conditional on the Ct data. This function expects a vector of model parameters with names corresponding to the parameter control table and returns a single log posterior probability. Note that the parameter vector needs entries for each parameter needed to solve both the Ct model and the incidence model.
Note that the use_pos
argument is important. If you want to fit the model using ALL PCR results (ie. including samples testing negative), you should set this to FALSE
. If your data only includes positive Ct values (ie. only Ct values < the LOD), then you should set this to TRUE
.
## Point to a function that expects a vector of named parameters and returns a vector of daily infection probabilities/incidence incidence_function <- solveSEIRModel_lsoda_wrapper ## Use the example parameter table data(example_seir_partab) ## Create the posterior function used in the MCMC framework posterior_func <- create_posterior_func(parTab=example_seir_partab, data=example_ct_data, PRIOR_FUNC=prior_func_seir, INCIDENCE_FUNC=incidence_function, use_pos=FALSE) ## Important argument, see text ## Test with default parameters to find the log likelihood posterior_func(example_seir_partab$values)
We'll now demonstrate how to use virosolver
to estimate an SEIR-generated incidence curve from a single-cross section of data. We'll use Ct values from the fifth sampling time, and then run the MCMC framework. This timepoint was chosen as the posterior distribution is unimodal -- if you use timepoints with multi-modal posteriors, you'll need to use the parallel tempering branch of lazymcmc
. The majority of Ct values are undetectable, so we'll just look at the distribution of Ct values below the limit of detection.
## Filter to Ct values on day 111 (fifth timepoint) ct_data_use <- example_ct_data %>% filter(t == 111) p_ct_use <- ggplot(ct_data_use %>% filter(ct < 40)) + geom_histogram(aes(x=ct)) + theme_bw() p_ct_use
First, we need to set random starting values for the MCMC procedure.
## Function from the virosolver package to generate random starting parameter values that return a finite likelihood start_tab <- generate_viable_start_pars(example_seir_partab,ct_data_use, create_posterior_func, incidence_function, prior_func_seir)
Depending on the MCMC settings we wish to use (see the lazymcmc
documentation), we need to do some additional set up. covMat
,mvrPars
and mcmc_pars
specify needed objects for the Metropolis-Hastings algorithm, as well as the length of the MCMC chain.
covMat <- diag(nrow(start_tab)) mvrPars <- list(covMat,2.38/sqrt(nrow(start_tab[start_tab$fixed==0,])),w=0.8) mcmc_pars <- c("iterations"=200000,"popt"=0.234,"opt_freq"=2000, "thin"=1000,"adaptive_period"=100000,"save_block"=100)
This is where the magic happens: run_MCMC
takes in all of the objects we've set up and runs the full MCMC procedure.
dir.create("mcmc_chains/readme_single_cross_section",recursive=TRUE) ################################## ## RUN THE MCMC FRAMEWORK ## Run 3 MCMC chains. Note that it is possible to parallelize this loop with foreach and doPar ## Note the `use_pos` argument needs to be set here too nchains <- 3 res <- foreach(chain_no=1:nchains,.packages = c("virosolver","lazymcmc","extraDistr","tidyverse","patchwork")) %dopar% { outputs <- run_MCMC(parTab=start_tab, data=ct_data_use, INCIDENCE_FUNC=incidence_function, PRIOR_FUNC=prior_func_seir, mcmcPars=mcmc_pars, filename=paste0("mcmc_chains/readme_single_cross_section/readme_seir_",chain_no), CREATE_POSTERIOR_FUNC=create_posterior_func, mvrPars=mvrPars, use_pos=FALSE) ## Important argument }
After the MCMC sampling is finished, we use a function from lazymcmc
to load in all of the MCMC chains, and we then use ggplot2
to investigate the trace plots to check for convergence. You should check other convergence diagnostics like effective sample size and Rhat, but that is beyond the scope of this vignette.
## Read in the MCMC chains chains <- load_mcmc_chains(location="mcmc_chains/readme_single_cross_section", parTab=start_tab, burnin=mcmc_pars["adaptive_period"], chainNo=TRUE, unfixed=TRUE, multi=TRUE) ## Reshape for plotting chains_melted <- chains$chain %>% as_tibble %>% group_by(chain) %>% mutate(sampno=1:n()) %>% pivot_longer(-c(sampno,chain)) ## Look at trace plots p_trace <- ggplot(chains_melted) + geom_line(aes(x=sampno,y=value,col=as.factor(chain))) + facet_wrap(~name,scales="free_y") + scale_color_viridis_d(name="Chain") + theme_bw() + xlab("Iteration") + ylab("Value") p_trace
We also use the MCMC chains (loaded with unfixed=FALSE
here to include all fixed parameters) to solve the model using a number of posterior draws. This allows us to construct credible intervals on the model predictions and to visualize the inferred incidence curve. We compare this to the true (blue) incidence curve to see how accurate the estimates are. We can see that the posterior draws nicely capture the blue line, demonstrating that the recovered incidence curve is close to the true value.
## Load in MCMC chains again, but this time read in the fixed parameters too ## to ensure that the posterior draws are compatible with the model functions chains <- load_mcmc_chains(location="mcmc_chains/readme_single_cross_section", parTab=start_tab, burnin=mcmc_pars["adaptive_period"], chainNo=FALSE, unfixed=FALSE, multi=TRUE) ## Do some reshaping to allow correct subsampling (we need each sampno to correspond to one unique posterior draw) chain_comb <- chains$chain %>% as_tibble() %>% mutate(sampno=1:n()) %>% as.data.frame() ## Load in true incidence curve to compare to our prediction data(example_seir_incidence) predictions <- plot_prob_infection(chain_comb, nsamps=100, INCIDENCE_FUNC=incidence_function, solve_times=0:max(ct_data_use$t), obs_dat=ct_data_use, true_prob_infection=example_seir_incidence) p_incidence_prediction <- predictions$plot + scale_x_continuous(limits=c(0,200)) p_incidence_prediction
It's also a good idea to check things like the estimated daily growth rate and time-varying reproductive number. A bit more computation is involved here, but the key is to check our estimated predictions against the true values.
## Get smoothed growth rates samps <- sample(unique(chain_comb$sampno),100) dat_inc <- NULL solve_times <- 0:250 for(ii in seq_along(samps)){ pars <- get_index_pars(chain_comb, samps[ii]) inc_est <- incidence_function(pars,solve_times)[1:length(solve_times)] inc_est <- inc_est/sum(inc_est) tmp_inc <- tibble(t=solve_times,inc=inc_est) x <- tmp_inc$inc ## Get Rt seir_pars <- c("beta"=pars["R0"]*(1/pars["infectious"]),1/pars["incubation"],1/pars["infectious"]) names(seir_pars) <- c("beta","sigma","gamma") init <- c(1-pars["I0"],0,pars["I0"],0) sol <- solveSEIRModel_lsoda(solve_times, init, seir_pars) S_compartment <- c(rep(1, pars["t0"]),sol[,2])[1:length(solve_times)] tmp_inc$Rt <- S_compartment*unname(pars["R0"]) tmp_inc$gr <- c(NaN,log(x[2:length(x)]/x[1:(length(x)-1)])) dat_inc[[ii]] <- tmp_inc dat_inc[[ii]]$samp <- ii } trajs <- do.call("bind_rows",dat_inc) trajs1_quants <- trajs %>% group_by(t) %>% summarize(lower_95=quantile(gr, 0.025,na.rm=TRUE), lower_50=quantile(gr,0.25,na.rm=TRUE), median=median(gr,na.rm=TRUE), upper_50=quantile(gr,0.75,na.rm=TRUE), upper_95=quantile(gr, 0.975,na.rm=TRUE)) Rt_quants <- trajs %>% group_by(t) %>% summarize(lower_95=quantile(Rt, 0.025,na.rm=TRUE), lower_50=quantile(Rt,0.25,na.rm=TRUE), median=median(Rt,na.rm=TRUE), upper_50=quantile(Rt,0.75,na.rm=TRUE), upper_95=quantile(Rt, 0.975,na.rm=TRUE)) ## Growth rate plot ## Calculate true growth rate from simulated data true_incidence <- example_seir_incidence$prob_infection true_growth_rate <- tibble(gr=log(true_incidence[2:length(true_incidence)]/true_incidence[1:(length(true_incidence)-1)]),t=1:(length(true_incidence)-1)) p_gr <- ggplot(trajs1_quants) + geom_hline(yintercept=0,col="grey70") + geom_ribbon(aes(x=t,ymin=lower_95,ymax=upper_95,fill="95% CrI"),alpha=0.25) + geom_line(aes(x=t,y=median,col="Posterior median"),col="black") + geom_line(data=true_growth_rate,aes(x=t,y=gr,col="True value"),linetype="dashed") + geom_vline(data=tibble(x=unique(ct_data_use$t)),aes(col="Sample time",xintercept=x),linetype="dashed") + coord_cartesian(ylim=c(-0.5,0.5)) + scale_color_manual(name="",values=c("Posterior median"="black","True value"="blue","Sample time"="red")) + scale_fill_manual(name="",values=c("95% CrI"="gray40")) + scale_x_continuous(limits=c(0,200)) + ylab("Estimated growth rate") + xlab("Time") + theme_bw() ## Rt plot ## Calculate true Rt from true model parameters epidemic_process <- simulate_seir_process(pars,solve_times,N=1) ## Widen solution and extract key variables res <- epidemic_process$seir_outputs %>% pivot_wider(names_from="variable",values_from="value") p_rt <- ggplot(Rt_quants) + geom_hline(yintercept=1,linetype="dashed",col="grey70") + geom_ribbon(aes(x=t,ymin=lower_95,ymax=upper_95,fill="95% CrI"),alpha=0.25) + geom_line(aes(x=t,y=median,col="Posterior median")) + geom_line(data=res %>% filter(time <= max(trajs1_quants$t)), aes(x=time,y=Rt,col="True value"),linetype="dashed") + geom_vline(data=tibble(x=unique(ct_data_use$t)),aes(col="Sample time",xintercept=x),linetype="dashed") + scale_color_manual(name="",values=c("Posterior median"="black","True value"="blue","Sample time"="red")) + scale_fill_manual(name="",values=c("95% CrI"="gray40")) + scale_x_continuous(limits=c(0,200)) + ylab("Rt") + xlab("Time") + theme_bw() p_gr/p_rt
We also check the model-predicted Ct distribution against the used data -- this is a visual assessment of model fit.
## Use create_posterior_func to return the predicted Ct distribution rather than the posterior probability model_func <- create_posterior_func(example_seir_partab,ct_data_use,NULL,incidence_function,"model") ## Pass model_func to a plotting function to observe predicted Ct distribution against data p_distribution_fit <- plot_distribution_fits(chain_comb, ct_data_use, model_func,100,pos_only=FALSE) p_distribution_fit
The final part of this case study is to use a Gaussian Process prior to estimate the full incidence curve using all cross-sections of Ct values. The overall set up is identical, but we need to use a new incidence function, new prior function and new parameter table corresponding to the new incidence model.
Note that we set the incidence_function
pointer to gaussian_process_model
here, and load in the GP parameter table. We also need to reduce the size of par_tab
so that we only solve the model in the range of times given by the data. That is, we only want to find incidence for each day in the vector between t=0 and the final sampling time -- we don't do any forecasting here. We need one entry of prob
for each time we're solving the model for.
## MCMC chain options mcmc_pars <- c("iterations"=500000,"popt"=0.44,"opt_freq"=2000, "thin"=2500,"adaptive_period"=200000,"save_block"=100) ## Set pointer to the Gaussian Process model as the incidence function incidence_function <- gaussian_process_model ## Read in the GP model parameter control table data(example_gp_partab) ## This is for the GP version times <- 0:max(example_ct_data$t) mat <- matrix(rep(times, each=length(times)),ncol=length(times)) t_dist <- abs(apply(mat, 2, function(x) x-times)) par_tab <- example_gp_partab par_tab <- bind_rows(par_tab[par_tab$names != "prob",], par_tab[par_tab$names == "prob",][1:length(times),]) pars <- par_tab$values names(pars) <- par_tab$names ## Pull out the current values for each parameter, and set these as the prior means means <- par_tab$values names(means) <- par_tab$names ## Set standard deviations of prior distribution sds_gp <- c("obs_sd"=0.5,"viral_peak"=2, "wane_rate2"=1,"t_switch"=3,"level_switch"=1, "prob_detect"=0.03, "incubation"=0.25, "infectious"=0.5) ## Define a function that returns the log prior probability for a given vector of parameter ## values in `pars`, given the prior means and standard deviations set above. ## Prior for GP version prior_func_gp <- function(pars, ...){ par_names <- names(pars) ## Viral kinetics parameters obs_sd_prior <- dnorm(pars["obs_sd"], means[which(names(means) == "obs_sd")], sds_gp["obs_sd"],log=TRUE) viral_peak_prior <- dnorm(pars["viral_peak"], means[which(names(means) == "viral_peak")], sds_gp["viral_peak"],log=TRUE) wane_2_prior <- dnorm(pars["wane_rate2"],means[which(names(means) == "wane_rate2")],sds_gp["wane_rate2"],log=TRUE) tswitch_prior <- dnorm(pars["t_switch"],means[which(names(means) == "t_switch")],sds_gp["t_switch"],log=TRUE) level_prior <- dnorm(pars["level_switch"],means[which(names(means) == "level_switch")],sds_gp["level_switch"],log=TRUE) beta1_mean <- means[which(names(means) == "prob_detect")] beta1_sd <- sds_gp["prob_detect"] beta_alpha <- ((1-beta1_mean)/beta1_sd^2 - 1/beta1_mean)*beta1_mean^2 beta_beta <- beta_alpha*(1/beta1_mean - 1) beta_prior <- dbeta(pars["prob_detect"],beta_alpha,beta_beta,log=TRUE) ### VERY IMPORTANT ## Gaussian process prior, un-centered version k <- pars[which(par_names=="prob")] ## Leave this - correct for uncentered version as per Chapter 14 Statistical Rethinking prob_priors <- sum(dnorm(k, 0, 1, log=TRUE)) ######### nu_prior <- dexp(pars["nu"], 1/means[which(names(means) == "nu")],log=TRUE) rho_prior <- dexp(pars["rho"], 1/means[which(names(means) == "rho")],log=TRUE) obs_sd_prior + viral_peak_prior + wane_2_prior + tswitch_prior + level_prior + beta_prior + prob_priors + nu_prior + rho_prior }
## Check that posterior function solves and returns a finite likelihood posterior_function <- create_posterior_func(parTab=par_tab, data=example_ct_data, PRIOR_FUNC=prior_func_gp, INCIDENCE_FUNC=incidence_function, t_dist=t_dist) posterior_function(par_tab$values)
We're now ready to solve the full Gaussian Process prior model. This is going to take quite a lot longer than the single cross section SEIR model, as there is far more data and the model solve time is longer. This may take a couple hours to run sufficiently long chains.
dir.create("mcmc_chains/readme_multiple_cross_section",recursive=TRUE) ################################## ## RUN THE MCMC FRAMEWORK ## Run 3 MCMC chains. Note that it is possible to parallelize this loop with foreach and doPar ## Note the `use_pos` argument needs to be set here too nchains <- 3 res <- foreach(chain_no=1:nchains,.packages = c("virosolver","lazymcmc","extraDistr","tidyverse","patchwork")) %dopar% { ## Get random starting values start_tab <- generate_viable_start_pars(par_tab,example_ct_data, create_posterior_func, incidence_function, prior_func_gp) output <- run_MCMC(parTab=start_tab, data=example_ct_data, INCIDENCE_FUNC=incidence_function, PRIOR_FUNC=prior_func_gp, mcmcPars=mcmc_pars, filename=paste0("mcmc_chains/readme_multiple_cross_section/readme_gp_",chain_no), CREATE_POSTERIOR_FUNC=create_posterior_func, mvrPars=NULL, OPT_TUNING=0.2, use_pos=FALSE, t_dist=t_dist) }
The remaining steps are the same as the single cross section example. We read in the MCMC chains, check for convergence, and then use the posterior draws to check our estimates.
## Read in the MCMC chains chains <- load_mcmc_chains(location="mcmc_chains/readme_multiple_cross_section", parTab=par_tab, burnin=mcmc_pars["adaptive_period"], chainNo=TRUE, unfixed=TRUE, multi=FALSE) ## Reshape for plotting chains_melted <- chains$chain %>% as_tibble %>% group_by(chain) %>% mutate(sampno=1:n()) %>% pivot_longer(-c(sampno,chain)) ## Look at trace plots p_trace_gp <- chains_melted %>% filter(!(name %in% paste0("prob.",1:max(times)))) %>% ggplot() + geom_line(aes(x=sampno,y=value,col=as.factor(chain))) + facet_wrap(~name,scales="free_y") + scale_color_viridis_d(name="Chain") + theme_bw() + xlab("Iteration") + ylab("Value") p_trace_gp
Plot predicted incidence curves using posterior draws and compare to the true incidence curve (blue).
## Load in MCMC chains again, but this time read in the fixed parameters too ## to ensure that the posterior draws are compatible with the model functions chains <- load_mcmc_chains(location="mcmc_chains/readme_multiple_cross_section/", parTab=par_tab, burnin=mcmc_pars["adaptive_period"], chainNo=FALSE, unfixed=FALSE, multi=FALSE) ## Do some reshaping to allow correct subsampling (we need each sampno to correspond to one unique posterior draw) chain_comb <- as.data.frame(chains$chain) chain_comb$sampno <- 1:nrow(chain_comb) ## Load in true incidence curve to compare to our prediction data(example_seir_incidence) predictions <- plot_prob_infection(chain_comb,nsamps=100, INCIDENCE_FUNC=incidence_function, solve_times=0:max(example_ct_data$t),obs_dat=example_ct_data, true_prob_infection=example_seir_incidence, smooth=TRUE) ## Smooth the trajectories a bit p_incidence_prediction <- predictions$plot + scale_x_continuous(limits=c(0,200)) p_incidence_prediction
Check model-predicted Ct distributions against observed data at each cross section.
## Use create_posterior_func to return the predicted Ct distribution rather than the posterior probability model_func_gp <- create_posterior_func(par_tab,example_ct_data,NULL,incidence_function,"model") ## Pass model_func to a plotting function to observe predicted Ct distribution against data p_distribution_fit_gp <- plot_distribution_fits(chain_comb, example_ct_data, model_func_gp,100,pos_only=FALSE) p_distribution_fit_gp
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.