How do we go about modifying/refactoring the code to allow it to more flexibly/generally handle different models of time-varying parameters?
Candidate levels at which to make the change:

* `run_sim_range`, for every day? (`enum`-based switches if necessary) ... or `do.call()` (or stored info in objects)?
* `run_sim_break`, by allowing other functions to specify the time variation of parameters (see `OTHER_TIMEFUN` below)
* `run_sim`/`run_sim_range`/`do_step`? Calling `run_sim_range` for every step, as we would have to do with more continuously time-varying parameters, seems awkward. On the other hand, `run_sim_range` was created in the first place so that we didn't have to deal with date-processing at the lowest level. Maybe `run_sim` could process dates into a zero-based numeric format/find numeric offsets for parameters, so that `run_sim_range` was continuous but low-level (i.e. suitable for coding in TMB/Stan/etc.)? (See the sketch after the call-graph code below.)
* `forecast_sim`/`run_sim_break`? This made more sense before I started thinking about adding tracks parallel to `run_sim_break` ...

```r
library(mvbutils)
library(igraph)
library(McMasterPandemic)
pos <- which(search() == "package:McMasterPandemic")
## FIXME: include predict methods?
funs <- grep("mle_fun|forecast|calibrate$|run_|do_step", ls(pos = pos), value = TRUE)
funs <- c(funs, "predict.fit_pansim")
ff <- foodweb(where = pos, funs = funs,
              ## rprune="forecast|calibrate|run_",
              plotting = FALSE, ancestors = FALSE, descendents = FALSE)
## HACK: foodweb doesn't recognize do.call() or ... ?
M <- ff$funmat
M["predict.fit_pansim", c("forecast_sim", "forecast_ensemble")] <- 1
## HACK: set up another function parallel to run_sim_break
## M <- rbind(M, 0)
## M <- cbind(M, 0)
## rownames(M)[nrow(M)] <- "OTHER_TIMEFUN"
## colnames(M)[ncol(M)] <- "OTHER_TIMEFUN"
## M[,"OTHER_TIMEFUN"] <- M[,"run_sim_break"]
## M["OTHER_TIMEFUN",] <- M["run_sim_break",]
## HACK: calibrate effectively calls the other run_sim
run_sim_funs <- setdiff(grep("run_sim_", rownames(M), value = TRUE), "run_sim_range")
M["forecast_sim", run_sim_funs] <- 1
M[run_sim_funs, "run_sim"] <- 1
g <- igraph::graph_from_adjacency_matrix(M)
plot(g, layout = layout.graphopt)  ## fruchterman.reingold
```
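A minimal sketch of the date-processing idea above: `run_sim` would do the conversion once, so the low-level loop sees only zero-based numeric times (`date_to_offset` is a hypothetical helper, not part of the package).

```r
## Hypothetical helper: convert dates to zero-based numeric offsets so that
## run_sim_range()/do_step() never need to handle Date objects.
date_to_offset <- function(dates, start_date) {
    as.numeric(as.Date(dates) - as.Date(start_date))
}
## e.g. breakpoints relative to the simulation start date:
date_to_offset(c("2020-03-23", "2020-04-06"), start_date = "2020-03-01")
## [1] 22 36
```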
How do we deal efficiently and transparently with parameters that need to get passed/used in different places? At the moment the structure (and starting values) is stored in `opt_pars`, but different parts are used by different components. We want to save as much as necessary and pass the right pieces ... (see the re-listing sketch below)

* `params` gets used only (?) within `run_sim()` and `run_sim_range()` (and in various summary methods, Jacobian calcs, etc.)
* `(log_)nb_disp` gets used only in `mle_fun`; it NULLs any elements containing `nb_disp` before passing
* `time_args` includes components for the time-varying-parameter machinery (e.g. `break_dates` and `rel_beta0`; see `run_sim_break` below)
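As a concrete toy illustration of the "re-listing" problem, here is a minimal sketch using `utils::relist()`; the contents of `opt_pars` here are made up and much simpler than the real structure.

```r
## Toy opt_pars: starting values double as the "skeleton" for re-listing.
opt_pars <- list(params = c(log_beta0 = -0.5, log_E0 = 1.5),
                 log_nb_disp = 0)
## Flat vector as an optimizer would propose it:
p <- c(-0.7, 1.2, 0.1)
## Restore the original list structure (names and all):
p_list <- relist(p, skeleton = opt_pars)
str(p_list)
```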
```r
show_args <- function(f) {
    p <- paste(names(formals(f)), collapse = ", ")
    cat("**parameters:** ", strwrap(p), sep = "\n")
}
```

## calibrate

Takes data and a set of starting parameters/structure/etc. and uses `DEoptim` and `mle2` to estimate parameters by trajectory matching.

```r
show_args(calibrate)
```
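A heavily simplified sketch of the two-stage optimization; a toy one-parameter objective stands in for the real trajectory-matching log-likelihood.

```r
library(DEoptim)  ## global stochastic search
library(bbmle)    ## mle2: local refinement + likelihood machinery

## Toy objective standing in for the negative log-likelihood:
nll <- function(log_beta0) (log_beta0 - log(0.9))^2
## Stage 1: global search over a box
d1 <- DEoptim(nll, lower = -5, upper = 2,
              control = DEoptim.control(trace = FALSE))
## Stage 2: polish from the DEoptim solution
f1 <- mle2(nll, start = list(log_beta0 = unname(d1$optim$bestmem)),
           method = "BFGS")
coef(f1)
```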
`sim_args` is passed down, eventually to `run_sim`. Can we also use it for arguments to `sim_fun` (the one with time-varying stuff) and strip those arguments before passing it down, or use `...` to catch junk in `run_sim`? (A stripping sketch follows.)
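A toy sketch of the "strip before passing down" option; the contents of `sim_args` and the set of time-varying argument names are made up.

```r
sim_args <- list(start_date = "2020-03-01", end_date = "2020-06-01",
                 break_dates = "2020-03-23")     ## made-up contents
time_arg_names <- "break_dates"                  ## args meant for sim_fun only
lower_args <- sim_args[setdiff(names(sim_args), time_arg_names)]
str(lower_args)
## do.call(run_sim, c(list(params = params), lower_args))  ## pass the rest down
```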
## mle_fun

Log-likelihood machinery, kept separate from `calibrate` so that we can do other things with the log-likelihood (e.g. calculate importance weights for ensembles).

```r
show_args(mle_fun)
```
## forecast_sim

Inverse-link transforms parameters, re-lists them into the format of `opt_pars`, then calls `run_sim_break`; then condenses/aggregates/pivots results to match the data format.

```r
show_args(forecast_sim)
```
Key arguments: `p` (numeric, named parameter vector); `opt_pars` (list: starting values and structure for "re-listing").
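A minimal sketch of the inverse-link step, assuming (as in `opt_pars`) that log-scale parameters carry a `log_` prefix; this is an illustration, not the package's implementation.

```r
## Back-transform any "log_"-prefixed parameters to the natural scale.
invlink_params <- function(p) {
    is_log <- grepl("^log_", names(p))
    p[is_log] <- exp(p[is_log])
    names(p)[is_log] <- sub("^log_", "", names(p)[is_log])
    p
}
invlink_params(c(log_beta0 = -0.5, E0 = 5))
##     beta0        E0
## 0.6065307 5.0000000
```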
## forecast_ensemble

Calls `forecast_sim` repeatedly to generate an ensemble of trajectories, then computes (possibly weighted) quantiles (alternatively, it may return an array of replicate $\times$ time $\times$ variable). May also (indirectly) call `mle_fun` if importance weights are requested.

```r
show_args(forecast_ensemble)
```
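For the array form, computing (unweighted) quantiles across replicates is a one-liner; a sketch with fake simulation output:

```r
## Fake ensemble: 50 replicates x 10 time points x 2 variables
sims <- array(rexp(50 * 10 * 2), dim = c(50, 10, 2),
              dimnames = list(NULL, NULL, c("report", "death")))
## Quantiles over the replicate dimension:
q <- apply(sims, c(2, 3), quantile, probs = c(0.025, 0.5, 0.975))
dim(q)  ## 3 x 10 x 2
```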
## run_sim_break

Thin wrapper for `run_sim`: converts `break_dates` plus relative beta information (`rel_beta0`) into a data frame of points at which parameters change, then passes that info plus parameters to `run_sim`. `OTHER_TIMEFUN` would set up different time-dependence in parameters based on dates, parameters (and other covariates such as mobility?).

```r
show_args(run_sim_break)
```
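A sketch of the kind of change-point data frame the wrapper produces (column names are illustrative):

```r
## Break dates plus relative beta values -> parameter-change schedule
break_dates <- as.Date(c("2020-03-23", "2020-04-06"))
rel_beta0   <- c(0.6, 0.3)   ## transmission relative to baseline
params_timevar <- data.frame(Date = break_dates,
                             Symbol = "beta0",
                             Relative_value = rel_beta0)
params_timevar
```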
## run_sim

Constructs the rate matrix; processes information about when to turn on stochasticity. Loops over change points, calling `run_sim_range` repeatedly. Currently assumes that only the force of infection (foi) changes.

```r
show_args(run_sim)
```
## run_sim_range

Runs multiple simulation steps. Assumes constant per capita rates (except for foi, which is updated at each step).

```r
show_args(run_sim_range)
```
## do_step

Runs a single simulation step. Updates foi (redundantly?); generates total flows/derivatives from the rate matrix and applies them.

```r
show_args(do_step)
```
cumRep: what we want is to have observation error on individual reporting, then `cumsum()` the "report" variable; i.e., we need to cumulate *after* applying observation error etc.
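A sketch of that order of operations (made-up incident reports, negative binomial observation error):

```r
set.seed(42)
incident <- c(5, 8, 12, 20, 31)                            ## true daily reports
obs <- rnbinom(length(incident), mu = incident, size = 5)  ## obs error first...
cumRep <- cumsum(obs)                                      ## ...then cumulate
cumRep
```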
## Refactor in Template Model Builder [@kristensen_tmb_2016]
Speed. It currently takes at least an hour to run a calibration of the model, sometimes longer. While this is feasible within a daily workflow, it can be problematic at crunch times (e.g., when we have a PHAC request for a small change in the model to be recalibrated and returned on a short time scale); it also limits the number of scenarios and model variants we can explore. We are in the process of adding structure (vaccination, age structure) to the model that greatly increases the size of the state space and will exacerbate our computational limitations.
If the TMB component of the model can be expanded to include the full likelihood model, we will be able to take advantage of TMB's native automatic differentiation capability (its main design goal) to greatly improve the speed and stability of calibration.
In addition to raw speed, refactoring in TMB will allow the use of latent variables, i.e. vectors of parameters that are estimated with a penalization/integration step that allows them to be treated as random variables drawn from a distribution. This is a standard tool in statistical modeling of dynamical systems, but is only practical in Bayesian models and in frameworks that allow estimation involving integration via the Laplace approximation and related approaches. One of TMB's features is a built-in Laplace approximation engine, which allows any specified parameter vector to be modeled as a Gaussian latent variable. In particular, it will allow us to model the transmission rate as evolving according to an autocorrelated process, which is a key element of realism. (Similar approaches are used, for example, in @asher_forecasting_2018 [which was the most accurate model in the 2018 RAPIDD Ebola forecasting challenge] and in @davies_estimated_2021 [estimating relative spread rates of COVID variants of concern].)
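As a pure-R illustration (made-up numbers) of the kind of latent process this enables, here the log transmission rate evolves as a Gaussian random walk:

```r
set.seed(101)
n_days <- 60
sd_rw  <- 0.05   ## random-walk step SD: the parameter the Laplace step would estimate
log_beta <- cumsum(c(log(0.9), rnorm(n_days - 1, sd = sd_rw)))
plot(exp(log_beta), type = "l", xlab = "day", ylab = expression(beta(t)))
```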
(FIXME: how close could we get to what we want with penalization, without implementing Laplace?)
Finally, using TMB will in principle allow an extension to estimating the model using Hamiltonian Monte Carlo, with the `tmbstan` R package [@monnahan_no-u-turn_2018] as an interface to the Stan language [@li_fitting_2018; @chatzilena_contemporary_2019].
The `Rcpp` interface is more widely used than `TMB`, more mature, and allows use of a broader set of C++ standard library tools. However, it does not provide easy access to the Laplace approximation or the Stan HMC engine.
(Include flow diagram/call graph here.)
The goal is to expand the TMB component as far up the chain as possible, while leaving the complex matching steps in R, to be passed as data elements to TMB: rate matrix construction; construction of indices for condensing multiple state variables into single observed values; construction of data vectors for comparison with simulated dynamics; and construction of date-breakpoint models or model matrices for general time variation (see below).
The basic design of MacPan works by constructing a per capita flow rate matrix: at each step, per capita rates are scaled by the current state to give total flows between compartments, which are then applied to update the state (cf. `do_step` above).
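A toy three-compartment version of that update (numbers invented; compare `do_step` above):

```r
state <- c(S = 990, I = 10, R = 0)
M <- matrix(0, 3, 3, dimnames = list(names(state), names(state)))
M["S", "I"] <- 0.2 * state[["I"]] / sum(state)  ## foi entry, updated each step
M["I", "R"] <- 0.1                              ## constant per capita rate
flows <- state * M              ## flows[i, j]: total flow from i to j
state <- state - rowSums(flows) + colSums(flows)
state
##       S       I       R
##  988.02   10.98    1.00
```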
Steps toward porting, in increasing order of ambition:

* `do_step`/`run_sim_range` in TMB: simplest (pass a fixed rate matrix); will speed up all operations, but will not allow leveraging latent variables or automatic differentiation tools.
* `run_sim` in TMB: will require construction of an indexed sparse matrix (i.e. a way to map changes in rate variables to changes in specific elements of the rate matrix; see the sketch below). Again, the goal is to keep as much of the complexity in model parameterization and setup outside of the TMB component.
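A pure-R sketch of the indexing idea: precompute the (row, column) positions of rate-matrix elements that depend on each changing parameter, pass them as data, and update by index on the TMB side (positions and values here are invented):

```r
M <- matrix(0, 4, 4)
## Positions of entries that depend on beta, computed once during setup:
beta_idx <- cbind(row = c(1, 2), col = c(2, 3))
## At each change point, update only those entries, by index rather than name:
M[beta_idx] <- c(0.002, 0.100)
M
```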