In epiforecasts/eval-germany-sp-nowcasting: Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany

knitr::opts_chunk$set(
  echo = FALSE, eval = TRUE,
  fig.width = 9, fig.height = 9,
  dpi = 320,
  fig.path = "figures/"
)

Authors

S. Abbott (1), S. Funk (1)

Correspondence to: sam.abbott@lshtm.ac.uk

Affiliations

Center for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, United Kingdom

Abstract

TODO: Summarise the project in a structured abstract (background, methods, results and conclusions)

Introduction

TODO: What else has been done for nowcasting

In this work, we aim to evaluate a series of novel semi-parametric nowcasting model formulations in real-time and provide an example workflow to allow others to do similarly using German COVID-19 hospitalisations by date of positive test at the national level both overall and by age group, and at the state level. This project is part of a wider collaboration assessing a range of nowcasting methods whilst providing an ensemble nowcast of COVID-19 Hospital admissions in Germany by date of positive test. This ensemble should be used for any policy-related work rather than the nowcasts provided in this repository. See here for more on this nowcasting collaboration.

Methods

Data

TODO: Data source and processing details

Models

In the following sections we provide methodological and implementation details for the nowcasting framework implemented in epinowcast and applied here. Our approach is an extension of that proposed by Günther et al.[@gunther2021] which was itself an extension of the model proposed by Höhle and Heiden[@hohle] and implemented in the surveillance R package[@surveillance]. Compared to the model proposed in Günther et al.[@gunther2021], epinowcast adds an optional parametric assumption for the underlying delay from occurrence to report, support for jointly nowcasting multiple related datasets, a flexible formula interface allowing for the specification of a large range of models, and an efficient implementation in stan which makes use of sparse design matrices and within chain parallelisation to reduce runtimes [@stan; @cmdstanr].

Decomposition into expected final notifications and report delay components

We follow the approach of Höhle and Heiden[@hohle] and consider the distribution of notifications ($n_{gtd}$) by time of occurrence ($t$) and reporting delay ($d$) conditional on the final observed count $N_{gt}$ for each dataset ($g$) such that,

\begin{equation} N_{gt} = \sum_{d=0}^{D} n_{gtd} \end{equation}

where $D$ represents the maximum delay between time of occurrence and time of report which in theory could be infinite but in practice we set to a value in order to make the model identifiable and computationally feasible. This formulation means that $n_{gtd} \mid N_{gt}$ is multinomial with a probability vector ($p_{gtd}$) of length $D$ for each $t$ and $g$ that needs to be estimated at the same time as estimating the expected number of final notifications $\mathbb{E}[N_{gt}] = \lambda_{gt}$.

An alternative approach, not explored here, is to consider each $n_{gtd}$ independently at which point the model can be defined as a standard regression with the appropriate observation model and adjustment for reporting delay (i.e it becomes a Poisson or Negative Binomial regression)[@bastos]. An implementation of this approach is available in Bastos et al.[@bastos]. More work needs to be done to evaluate which of these approaches produces more accurate nowcasts for epidemiological count data.

Expected final notifications

Here we follow the approach of Günther et al.[@gunther2021] and specify the model for expected final notifications as a first order random walk. This model can in principle be any model such as a more complex time-series approach, a gaussian process, or a mechanistic or semi-mechanistic compartmental model. Extending the flexibility of this model is an area of further work as is evaluating the benefits and tradeoffs of more complex approaches.

\begin{align} \log (\lambda_{gt}) &\sim \text{Normal}\left(\log (\lambda_{gt-1}) , \sigma^{\lambda}{g} \right) \ \log (\lambda{g0}) &\sim \text{Normal}\left(\log (N_{g0}), 1 \right) \ \sigma^{\lambda}_{g} &\sim \text{Half-Normal}\left(0, 1\right) \end{align}

Delay distribution

Again following the approach of Günther et al.[@gunther2021] we define the delay distribution ($p_{gtd}$) as a discrete time hazard model ($h_{gtd} =\text{P} \left(\text{delay}=d|\text{delay} \geq d, W_{gtd}\right)$) but we extend this model to decompose $W_{gtd}$ into 3 components: hazard derived from a parametric delay distribution ($\gamma_{gtd}$) dependent on covariates at the date of occurrence, hazard not derived from a parametric distribution ($\delta_{gtd}$) dependent on covariates at the date of occurrence, and hazard dependent on covariates referenced to the date of report ($\epsilon_{gtd}$).

For first component ($\gamma_{gtd}$) we assume that the probability of reporting $p^{\prime}{gtd}$ on a given date given follow a parametric distribution (in the baseline case a discretised log normal distribution) with the log mean and log standard deviation being defined using an intercept and arbitrary shared, reference date indexed, covariates with fixed ($\alpha{i}$) and random ($\beta_{i}$) coefficients,

\begin{align} p^{\prime}{gtd} &\sim \text{LogNormal} \left(\mu{gt}, \upsilon_{gt} \right) \ \mu_{gt} &= \mu_0 + \alpha_{\mu} X_{\gamma} + \beta_{\mu} Z_{\gamma} \ \upsilon_{gt} &= \text{exp} \left( \upsilon_0 + \alpha_{\upsilon} X_{\gamma} + \beta_{\upsilon} Z_{\gamma} \right) \end{align}

The parametric logit hazard (probability of report on a given date conditional on not already having reported) for this component of the model is then,

\begin{equation} \gamma_{gtd} = \text{logit} \left(\frac{p^{\prime}{gtd}}{\left(1 -\sum^{d-1}{d^{\prime}=0} p^{\prime}_{gtd^{\prime}} \right)} \right) \end{equation}

The non-distributional logit hazarad components for the date of occurrence and report are then again defined using an intercept and arbitrary shared covariates with fixed ($\alpha_{i}$) and random ($\beta_{i}$) coefficients.

\begin{align} \delta_{gtd} &= \mu_0 + \alpha_{\delta} X_{\delta} + \beta_{\delta} Z_{\delta} \ \epsilon_{gtd} &= \epsilon_0 + \alpha_{\epsilon} X_{\epsilon} + \beta_{\epsilon} Z_{\epsilon} \end{align}

The overall hazard for each group, occurrence time, and delay is then,

\begin{equation} \text{logit} (h_{gtd}) = \gamma_{gtd} + \delta_{gtd} + \upsilon_{gtd},\ h_{gtD} = 1 \end{equation}

where the hazard on the final day has been assumed to be 1 in order to enforce the constraint that all reported observations are reported within the specified maximum delay. The probability of report for a given delay, occurrence date, and group is then as follows,

\begin{equation} p_{gt0} = h_{gt0},\ p_{gtd} = \left(1 -\sum^{d-1}{d^{\prime}=0} p{gtd^{\prime}} \right) \times h_{gtd} \end{equation}

All ($\alpha_{i}$) and random ($\beta_{i}$) coefficients have standard normal priors by default with standard half-normal priors for pooled standard deviations.

Observation model and nowcast

Expected notifications by time of occurrence ($t$) and reporting delay can now be found by multiplying expected final notifications for each $t$ with the probability of reporting for each day of delay ($p_{gtd}$). We then assume a negative binomial observation model with a joint overdispersion parameter (with a 1 over square root standard half normal prior) and produce a nowcast of final observed notifications at each occurrence time by summing posterior estimates for each observed notification for that occurrence time.

\begin{align} n_{gtd} \mid \lambda_{gt},p_{gtd} &\sim \text{NB}(\lambda_{gt} \times p_{gtd}, \phi),\ t=1,...,T. \ \frac{1}{\phi^2} &\sim \text{Half-Normal}(0, 1) \ N_{gt} &= \sum_{d=0}^{D} n_{gtd} \end{align}

Specific model formulations

TODO: Detail specific model formulations

We explore two primary models and submit nowcasts from these models to the nowcasting hub. The first of these is fit independently to each data set by age and location. Hospitalisations are modelled using a random walk on the log scale. Reporting delays are then modelled parametrically using a lognormal distribution with the log mean and log standard deviation each modelled using a weekly random walk with a pooled standard deviation and a random effect for the day of the week (introduced on the 6th of December 2021) with public holidays assumed to be reported like Sundays. Report date effects are again modelled using a random effect for day of the week with public holidays assumed to be reported like Sundays. The second model is fit jointly to age groups but is otherwise structured in the same way as the unpooled model except that report day of the week effects and the observation overdispersion are assumed to be joint across age groups, age groups are assumed to have a random intercept for both the log mean and the log standard deviation of the reporting delay distribution, and there is no random effect for reference day of the week. We also consider a series of pooled models which sequentially include the features of our most complex model. These are: age groups are fit jointly, day of the week reporting effects, a random intercept for each age group, and a random walk by positive test week shared across age groups.

Model | Formula Reference: Fixed, Report: Fixed | Reference: Fixed, Report: Day of week | Reference: Age, Report: Day of week | Reference: Age and week, Report: Day of week | Reference: Age and week by age, Report: Day of week | Independent by age, Reference: Week, Report: Day of week | Independent by age, Reference: Week and day of week, Report: Day of week |

Implementation

The model is implemented as part of the epinowcast R package [@epinowcast] using stan and cmdstanr[@stan; @cmdstanr]. Sparse design matrices have been used for all covariates to limit the number of probability mass functions that need to be calculated. epinowcast incorporates additional functionality written in R[@R] to enable plotting nowcasts and posterior predictions, summarising nowcasts, and scoring them using scoringutils[@scoringutils]. All functionality is modular allowing users to extend and alter the underlying model whilst continuing to use the package framework.

Evaluation

TODO: Model evaluation. What targets. What scoring measures. What aggregation TODO: Evaluation of performance for submitted models vs others submitted to forecasting hub. TODO: Add links to nowcasting hub and evaluation against those models.

We evaluate these models first visually across a range of nowcasting dates and then quantitatively using proper scoring rules [@scoringutils] on both the natural and log scales (corresponding to absolute and relative performance) aggregating scores first across all targets and then stratifying in turn by age group, nowcast horizon, date of postive test, and date of report. We also explore other aspects of our models performance by highlighting models that have problematic fitting diagnostics and summarising the estimation time for each model. We provide a report of this evaluation that is updated in real-time as new data and nowcasts become available.

Reproducibility

All analysis was carried out using R version 4.1.2 [@R]. The analysis pipeline described here is available as a targets workflow [@targets] along with an archive of all interim results generated using gittargets [@gittargets] and stored using piggyback [@piggyback]. A Dockerfile, and prebuilt archived image, has been made available with the code to enhance reproducibility [@Boettiger:2015dw].

Resullts

This project is still underway and so results are not finalised. Please see our real-time evaluation report for our initial findings.

Dicussion

TODO: Summarise project TODO: Discussion strengths and limitations of the evaluation TODO: Compare approach evaluated here to the literature TODO: Discuss further work TODO Summarise conclusions.

Software availability

Source code is available from: https://github.com/epiforecasts/eval-germany-sp-nowcasting/

License: MIT

Models are implemented in the epinowcast R package. This is available from: https://github.com/epiforecasts/epinowcast/

Data availability

Input data and summarised results are available from: https://github.com/epiforecasts/eval-germany-sp-nowcasting/tree/main/data

Data from all interim stages of the analysis is from: https://github.com/epiforecasts/eval-germany-sp-nowcasting/releases/tag/latest

License: MIT

Grant information

This work was supported by the Wellcome Trust through a Wellcome Senior Research Fellowship to SF [210758].

Competing interests

There are no competing interests.

References

epiforecasts/eval-germany-sp-nowcasting documentation built on July 7, 2022, 8:56 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

epiforecasts/eval-germany-sp-nowcasting
Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany

In epiforecasts/eval-germany-sp-nowcasting: Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany

Authors

Affiliations

Abstract

Introduction

Methods

Data

Models

Decomposition into expected final notifications and report delay components

Expected final notifications

Delay distribution

Observation model and nowcast

Specific model formulations

Implementation

Evaluation

Reproducibility

Resullts

Dicussion

References

R Package Documentation

Browse R Packages

We want your feedback!

epiforecasts/eval-germany-sp-nowcasting Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany

In epiforecasts/eval-germany-sp-nowcasting: Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany

Authors

Affiliations

Abstract

Introduction

Methods

Data

Models

Decomposition into expected final notifications and report delay components

Expected final notifications

Delay distribution

Observation model and nowcast

Specific model formulations

Implementation

Evaluation

Reproducibility

Resullts

Dicussion

References

R Package Documentation

Browse R Packages

We want your feedback!

epiforecasts/eval-germany-sp-nowcasting
Evaluating Semi-Parametric Nowcasts of COVID-19 Hospital Admissions in Germany