We are very grateful to the reviewers for their positive feedback and helpful suggestions for minor improvements, which we have attempted to implement.
The manuscript "Fitting epidemic models to data -- a tutorial in memory of Fred Brauer" was a pleasure to read, and will make an excellent contribution to the literature. It is well written, appropriately balances breadth and depth, introduces a new computational tool, and it and fills a sizable information gap that exists right now for mathematicians (with limited or no background in applied statistics) -- and many statisticians -- who are interested in modern approaches to fitting nonlinear dynamic models to time series data. I commend the authors for this contribution, and for their remebrance of Fred.
I did not find any major faults with the paper, but have suggested some edits below that I feel should improve it. The first four comments are the most substantive.
I trust the authors' judgement in making the above modification, but just in case it's helpful, here are my suggestions for how to more explicitly introduce the "observation model" concept: First, introduce the idea briefly with one or two sentences early in section 3. Something as simple as the following might suffice: "Here we assume our data are direct observations of one of our state variables. However, this is frequently not the case. When a more nuanced relationship defines the link between model and data, we can specify an \textbf{\textit{observation model}} that describes how our data values relate to the ODE trajectories. We will revisit this concept in the next section." Then, in section 4, mention it again in the context of constructing the likelihood function: minor adjustments to the text preceding eq. (14) could reframe that derivation using the concept of an observation model more explicitly. With those two modifications in place, the text leading up to the "observation" option in fitode (see eq. (27)) could be modified to refer to the observation model as an explicit concept.
Done.
Rather than $x[]$, we now use $x_{\textrm{obs}}$ to denote observed values of the variable $x$.
Done.
I should add that, for all three comments above, there is value in using notational conventions and other formalisms familiar to statisticians, where possible. This would help mathematicians who read this paper to more easily see how it relates to existing statistics literature, and likewise would make the paper more accessible to statisticians who have limited exposure to dynamic models and these techniques.
The discussion could include a bit more guidance regarding identifiability issues. For example, line 533, the parenthetical description for "unidentifiable" might be reconsidered, and replaced with something that elaborates a bit more on the problem of (a) structural unidentifiability and (b) practical unidentifiability, each with their own description touching upon the fact that many ODE models (used in this context) are overparameterized and yield non-unique parameter estimates, even under ideal data assumptions. The second issue can arise even for structurally identifiable models, which may be practically unidentifiable due to, e.g., a lack of data from certain parts of state space. A simple verbal example to illustrate this last point (if you wanted to go into that much detail) is to ask the reader to consider fitting the logistic growth model $dx/dt=rx(1-x/K)$, with known initial value $x_0$, to time series data that ends while the trajectory is only in the exponential growth phase. In that case, the parameter $r$ will be confidently estimated, but $K$ will not, since all sufficiently large $K$ values will appear to give equally good fits, which reflects that the data contain no information about that steady-state value, $K$. Some or all of this might be better placed in the paragraph starting at line 600, where you discuss convergence issues. Consider also adding a brief mention of identifiability issues somewhere in section 4, and please consider adding some references for dealing with the different types of identifiability issues.
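As a concrete illustration of the logistic example above, a minimal base-R sketch along the following lines (with made-up parameter values and observation noise, purely for illustration) shows the practical non-identifiability of $K$: once the data end during the exponential phase, the profiled error is essentially flat for all sufficiently large $K$.

\begin{verbatim}
## Practical non-identifiability of K in logistic growth when the data
## stop during the exponential phase (all values below are illustrative).
logistic <- function(t, r, K, x0) {
    ## closed-form solution of dx/dt = r*x*(1 - x/K) with x(0) = x0
    K * x0 * exp(r * t) / (K + x0 * (exp(r * t) - 1))
}
set.seed(1)
r_true <- 0.5; K_true <- 1000; x0 <- 10
t_obs <- seq(0, 5, by = 0.5)   ## early times only: x stays well below K
x_obs <- logistic(t_obs, r_true, K_true, x0) * exp(rnorm(length(t_obs), sd = 0.05))
## profile over K: for each fixed K, optimize r and record the best fit
sse_given_K <- function(K) {
    optimize(function(r) sum((log(x_obs) - log(logistic(t_obs, r, K, x0)))^2),
             interval = c(0.01, 2))$objective
}
K_grid <- c(500, 1000, 5000, 1e4, 1e5, 1e6)
round(sapply(K_grid, sse_given_K), 4)   ## essentially constant for large K
\end{verbatim}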
Related to the above comment, on line 603, it might be useful to call this procedure the "multistart method" (as it is sometimes called) and to mention that it is useful not only for diagnosing convergence issues (where you would see dissimilar "best" parameter sets with differing likelihood values), but also for detecting identifiability issues (where differing best parameter sets would show very similar likelihood values).
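For concreteness, a multistart loop of this kind can be sketched in a few lines of base R; the objective nll() below is only a stand-in (in practice it would be the model's negative log-likelihood, with starting values drawn over sensible parameter ranges):

\begin{verbatim}
## Multistart sketch: rerun a local optimizer from many random starting values.
## nll() is a placeholder objective, purely for illustration.
nll <- function(p) sum((p - c(1, 2))^2)
set.seed(1)
n_starts <- 20
starts <- matrix(runif(2 * n_starts, min = 0, max = 5), ncol = 2)
fits <- lapply(seq_len(n_starts), function(i) optim(starts[i, ], nll))
best_vals <- sapply(fits, function(f) f$value)
best_pars <- t(sapply(fits, function(f) f$par))
## dissimilar "best" parameter sets with differing objective values suggest
## convergence problems; dissimilar parameter sets with nearly identical
## objective values suggest identifiability problems
cbind(round(best_pars, 3), nll = round(best_vals, 4))
\end{verbatim}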
Done.
Done.
We have now attempted to follow BMB style for everything.
We adjusted these slightly, but this seems best to deal with at the copy-editing stage.
We have added a few references.
This manuscript introduces the software package fitode, an R-based tool developed to aid the fitting of ODE models to observed time series data, particularly for epidemiological applications. The software serves as a practical response to the long-standing question of how dynamic models are fitted to empirical data, a question posed by the late mathematical biologist Fred Brauer. The manuscript not only shows the functionality of fitode through examples involving compartmental epidemic models but also provides a tutorial approach to guide readers, presumably those with a background in mathematics but less familiarity with statistical methods and optimization techniques. The discussion delves into the technical details of model fitting, parameter estimation, and the challenges inherent in this domain.
We have added a paragraph on identifiability, and expanded a comment about regularization.
We have added some comments about MATLAB and Berkeley Madonna.