How do we go about modifying/refactoring the code so that it can handle different models of time-varying parameters more flexibly and generally?

issues

possibilities

library(mvbutils)
library(igraph)
library(McMasterPandemic)
pos <- which(search()=="package:McMasterPandemic")
## FIXME: include predict methods?
funs <- grep("mle_fun|forecast|calibrate$|run_|do_step",ls(pos=pos),value=TRUE)
funs <- c(funs,"predict.fit_pansim")
ff <- foodweb(where=pos,
              funs=funs,
              ## rprune="forecast|calibrate|run_",
              plotting=FALSE, ancestors=FALSE,descendents=FALSE)
## HACK: foodweb doesn't recognize do.call() or ... ?
M <- ff$funmat
M["predict.fit_pansim",c("forecast_sim","forecast_ensemble")] <- 1
## HACK: set up another function parallel to run_sim_break
## M <- rbind(M,0)
## M <- cbind(M,0)
## rownames(M)[nrow(M)] <- "OTHER_TIMEFUN"
## colnames(M)[ncol(M)] <- "OTHER_TIMEFUN"
## M[,"OTHER_TIMEFUN"] <- M[,"run_sim_break"]
## M["OTHER_TIMEFUN",] <- M["run_sim_break",]
## HACK: calibrate effectively calls the other run_sim
run_sim_funs <- setdiff(grep("run_sim_",rownames(M),value=TRUE),"run_sim_range")
M["forecast_sim",run_sim_funs] <- 1
M[run_sim_funs,"run_sim"] <- 1
g <- igraph::graph_from_adjacency_matrix(M)
plot(g, layout = layout_with_graphopt) ## alternative: layout_with_fr (Fruchterman-Reingold)

parameter passing

How do we deal efficiently and transparently with parameters that need to get passed/used in different places? At the moment the structure (and starting values) is stored in opt_pars, but different parts are used by different components. We want to save as much as necessary and pass the right pieces ...

Functions

calibrate

## print the names of a function's formal arguments, wrapped to fit the output width
show_args <- function(f) {
    p <- paste(names(formals(f)), collapse=", ")
    cat("**parameters:** ",strwrap(p),sep="\n")
}

Takes data and a set of starting parameters/structure, and uses DEoptim and mle2 to estimate parameters by trajectory matching.

show_args(calibrate)
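
As a rough illustration of the two-stage approach described above (this is a generic sketch, not the package's own calibrate code), the example below uses DEoptim for a global search followed by bbmle::mle2 for refinement; nll is a hypothetical placeholder for a negative log-likelihood that would compare simulated and observed trajectories.

library(DEoptim)
library(bbmle)
## hypothetical negative log-likelihood: in the real workflow this would run
## the simulation and compare it to data; here it is just a placeholder surface
nll <- function(log_beta0, log_E0) {
    (log_beta0 + 1)^2 + (log_E0 - 2)^2
}
## stage 1: global search with DEoptim
de_fit <- DEoptim(function(p) nll(p[1], p[2]),
                  lower = c(-5, 0), upper = c(2, 6),
                  control = DEoptim.control(itermax = 50, trace = FALSE))
## stage 2: refine with mle2, starting from the DEoptim solution
start <- as.list(setNames(de_fit$optim$bestmem, c("log_beta0", "log_E0")))
fit <- mle2(nll, start = start)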

mle_fun

show_args(mle_fun)

forecast_sim

Inverse-link transforms parameters, re-lists them into the format of opt_pars, and calls run_sim_break; the results are then condensed/aggregated/pivoted to match the data format.

show_args(forecast_sim)

Key arguments: p (numeric, named parameter vector) and opt_pars (list: starting values and structure for "re-listing").
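
A minimal sketch of the re-listing step (hypothetical parameter names): the optimizer sees a flat named vector p, while downstream code expects the nested structure of opt_pars; parameters with a log_ prefix would be back-transformed (exp) before use.

opt_pars <- list(params = c(log_beta0 = -1, log_E0 = 2),
                 log_rel_beta0 = c(rel1 = -1, rel2 = -1.5))
p <- unlist(opt_pars)                     ## flat vector seen by the optimizer
p_list <- relist(p, skeleton = opt_pars)  ## restore the opt_pars structure
str(p_list)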

forecast_ensemble

Calls forecast_sim repeatedly to generate an ensemble of trajectories, then computes (possibly weighted) quantiles; alternatively it may return an array of replicate $\times$ time $\times$ variable. May also (indirectly) call mle_fun if importance weights are requested.

show_args(forecast_ensemble)
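
Schematically (with a hypothetical one-trajectory simulator standing in for forecast_sim), the ensemble step amounts to many replicate runs followed by pointwise quantiles across replicates:

sim_one <- function() cumsum(rnorm(30, mean = 1))  ## placeholder trajectory
ens <- replicate(200, sim_one())                   ## time x replicate matrix
t(apply(ens, 1, quantile, probs = c(0.025, 0.5, 0.975)))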

run_sim_break

Thin wrapper for run_sim: converts break_dates plus relative-beta information (rel_beta0) into a data frame of points at which parameters change, then passes that information plus the parameters to run_sim. OTHER_TIMEFUN would set up different time-dependence in the parameters, based on dates, parameters (and other covariates such as mobility?).

show_args(run_sim_break)
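
A hypothetical sketch of the kind of change-point data frame that run_sim_break constructs from break_dates and rel_beta0 (the column names follow the package's params_timevar convention, but are an assumption here, not a verbatim copy):

params_timevar <- data.frame(
    Date           = as.Date(c("2020-03-23", "2020-04-15")),
    Symbol         = "beta0",
    Relative_value = c(0.7, 0.3)
)
params_timevar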

run_sim

Constructs the rate matrix and processes information about when to turn on stochasticity; loops over change points, calling run_sim_range repeatedly. Currently assumes that only the force of infection (foi) changes.

show_args(run_sim)
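
The change-point loop can be sketched as follows (schematic only, not the package code; run_segment is a hypothetical stand-in for run_sim_range, with exponential growth as a placeholder dynamic):

run_segment <- function(state, params, times) {
    data.frame(time = times,
               I = state[["I"]] * exp(params[["beta0"]] * (times - times[1])))
}
state     <- c(S = 990, I = 10)
params    <- c(beta0 = 0.2)
times     <- 1:30
breaks    <- c(10, 20)          ## change points
rel_beta0 <- c(0.5, 0.25)       ## relative beta0 after each change point
segs <- split(times, findInterval(times, breaks))
out  <- vector("list", length(segs))
beta0_base <- params[["beta0"]]
for (i in seq_along(segs)) {
    if (i > 1) params[["beta0"]] <- beta0_base * rel_beta0[i - 1]  ## update at breakpoint
    out[[i]] <- run_segment(state, params, segs[[i]])
    state[["I"]] <- tail(out[[i]]$I, 1)                            ## carry final state forward
}
res <- do.call(rbind, out)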

run_sim_range

Run multiple simulation steps. Assumes constant per capita rates (except for the foi, which is updated at each step).

show_args(run_sim_range)

do_step

Run a single simulation step. Updates the foi (redundantly?); generates total flows/derivatives from the rate matrix and applies them.

show_args(do_step)

report variables/condensation

cumRep: what we want is to apply observation error to the individual (per-period) reports and then cumsum() the "report" variable; i.e., we need to cumulate after applying observation error, not before.
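
For example (hypothetical daily reports, placeholder negative binomial observation error):

report <- c(5, 8, 12, 20, 35)                                 ## per-period reports
obs_report <- rnbinom(length(report), mu = report, size = 5)  ## obs error first
cumRep <- cumsum(obs_report)                                  ## then cumulate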

MacPan advantages/features

MacPan disadvantages

Refactoring goals

Goal

Refactor in Template Model Builder [@kristensen_tmb_2016]

Why refactor in TMB

Speed. It currently takes at least an hour, and sometimes longer, to run a calibration of the model. While this is feasible within a daily workflow, it can be problematic at crunch times (e.g., when PHAC requests a small change in the model to be recalibrated and returned on a short time scale), and it limits how much exploration and experimentation we can do. We are in the process of adding structure (vaccination, age structure) to the model that greatly increases the size of the state space and will exacerbate these computational limitations.

If the TMB component of the model can be expanded to include the full likelihood model, we will be able to take advantage of TMB's native automatic differentiation capability (TMB's main design goal) to greatly improve the speed and stability of calibration.

In addition to raw speed, refactoring in TMB will allow the use of latent variables, i.e. vectors of parameters that are estimated with a penalization/integration step that allows them to be treated as random variables, i.e. estimated as being drawn from a distribution. This is a standard tool in statistical modeling of dynamical systems, but is only practical in Bayesian models and certain frameworks that allow estimation involving integration via the Laplace approximation and related approaches. One of TMB's features is a built-in Laplace approximation engine, which allows any specified parameter vector to be modeled as a Gaussian latent variable. In particular it will allow us to model the transmission rate as evolving according to an autocorrelated process, which is a key element of realism. (Similar approaches are used, for example, in @asher_forecasting_2018 [which was the most accurate model in the 2018 RAPIDD Ebola forecasting challenge] and in @davies_estimated_2021 [estimating relative spread rates of COVID variants of concern].)

(Fixme: how close could we get to what we want with penalization, without implementing Laplace?)

Finally, using TMB will in principle allow an extension to estimating the model using Hamiltonian Monte Carlo with the tmbstan R package [@monnahan_no-u-turn_2018] as an interface to the Stan language [@li_fitting_2018; @chatzilena_contemporary_2019].

Why not rewrite the model entirely in C++?

Why not use a different R/C++ interface?

The Rcpp interface is more widely used than TMB, more mature, and allows use of a broader set of C++ standard library tools. However, it does not provide easy access to the Laplace approximation or Stan HMC engine.

Basic design

(Include flow diagram/call graph here.)

The goal is to expand the TMB component as far up the chain as possible, while leaving the complex matching steps (rate matrix construction; construction of indices for condensing multiple state variables into single observed values; construction of data vectors for comparison with simulated dynamics; construction of date-breakpoint models or model matrices for general time variation [see below]) in R, to be passed as data elements to TMB.

The basic design of MacPan works by constructing a per capita flow rate matrix: at each step, total flows are obtained by scaling the per capita rates by the current state, and these flows are then applied to update the state (see do_step above).
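
A minimal sketch of one such update (not the package implementation; parameter values are arbitrary), with M[i, j] the per capita flow rate from compartment i to compartment j:

state <- c(S = 990, I = 10, R = 0)
M <- matrix(0, 3, 3, dimnames = list(names(state), names(state)))
M["S", "I"] <- 0.2 * state[["I"]] / sum(state)   ## foi-type term
M["I", "R"] <- 0.1                               ## recovery
flows <- state * M                    ## total flow from i (rows) to j (columns)
state + colSums(flows) - rowSums(flows)          ## updated state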


