knitr::opts_chunk$set(echo = FALSE, message = FALSE, dpi = 300, warning = FALSE)
library(knitr)
library(here)
library(data.table)
library(DT)
library(janitor)
library(purrr)
library(ggplot2)

source(here("R", "utils.R"))
source(here("R", "plot.R"))
source(here("R", "get-hub-forecasts.R"))
source(here("R", "evaluation.R"))

This is a preliminary real-time comparison between the semi-parametric nowcasting method evaluated in this project and the other methods submitted to the German nowcasting hub. It is not the official prespecified evaluation the protocol for which can be found here. See our news for details of any changes to this evaluation.

Introduction

In this real time evaluation report we provide preliminary visualisations, evaluation, and exploration of our nowcasting methodology [@epinowcast] for COVID-19 hospitalisations in Germany by date of postive test compared to the other nowcasting methods submitted to the Germany nowcasting hub. For more details of our methodology and the specifics of the models shown in this report please see our project summary, our accompanying paper, and the German nowcasting hub. This report complements our real-time evaluation of models implemented using our methodology. It is updated each day (at roughly 8:00 GMT) as new data and nowcasts become available and may evolve over time. See our news section for a list of dated changes.

We first visualise current nowcasts across methods and age groups at the national level as well as overall at the subnational level. We also plot nowcasts at the date of estimation across sequential nowcasts allowing us to summarise the performance of multiple nowcasts on a single plot, again for age groups at the national level and overall at the subnational level.

To quantify comparative performance we make use of proper scoring rules [@scoringutils] on both the natural and log scales (corresponding to absolute and relative performance) to observed data reported at least 28 days ago aggregating scores first across all targets and then stratifying in turn by age group, nowcast horizon, date of postive test, date of report, and location. See Bosse et al. [@scoringutils] for information on how to interpret these scores. Unlike our real-time evaluation of our models here we evaluate performance using 7 day incidence (as this is the target of interest for the German nowcasting hub) across the complete 28 days of the nowcast and for all locations and age groups available.

The code for this report can be found here and the data that it uses can be found here (though you may find it easier to access using the data accessing code that supports this report.

Visualisation

In the following sections nowcasts are visualised for the latest estimation date and then by estimation date for all models considered. Visualising a nowcast from a single estimation date corresponds to plotting a real-time nowcast whilst plotting across estimation dates for that date is useful for understanding the performance of nowcasting models across a number of nowcasts in a concise way.

Nowcasts

Nowcasts based on the latest available data by age group on the national level and overall on the subnational level. Due to a technical issue resolved on the 5th of January 2022 the 10% quantile is excluded for all nowcasts so that they are evaluated on an even footing.

start_using_memoise(".cache")
nowcast_dates <- seq(as.Date("2021-11-22"), Sys.Date(), by = "day")
hub_nowcasts <- get_hub_forecasts(
  "KITmetricslab/hospitalization-nowcast-hub",
  dates = nowcast_dates
)
hub_nowcasts <- hub_nowcasts[quantile != 0.1]
hub_nowcasts <- format_hub_nowcasts(hub_nowcasts)

latest_seven_day_hosp <- load_obs(here("data", "observations", "seven_day.csv"))
complete_seven_day_hosp <- latest_seven_day_hosp[
  reference_date < (max(reference_date) - 28)
][]

National

plot_nowcast(
  hub_nowcasts[location == "DE"][
               nowcast_date == max(nowcast_date)] |>
  map_to_dummy_quantiles(),
  latest_seven_day_hosp[location == "DE"],
  max_delay = 28
) +
  facet_grid(vars(age_group), vars(model), scales = "free_y")

Sub-national

plot_nowcast(
  hub_nowcasts[!(location == "DE")][
                 nowcast_date == max(nowcast_date)] |>
  map_to_dummy_quantiles(),
  latest_seven_day_hosp[!(location == "DE")][age_group == "00+"],
  max_delay = 28
) +
  facet_grid(vars(location), vars(model), scales = "free_y")

Nowcasts at estimation date

Nowcast estimates at the date of estimation for sequential nowcasts by age group at the national level and overall at the subnational level.

National

plot_nowcast(
  hub_nowcasts[location == "DE"][
               horizon == 0] |>
  map_to_dummy_quantiles(),
  latest_seven_day_hosp[location == "DE"]
) +
  facet_grid(vars(age_group), vars(model), scales = "free_y")

Subnational

plot_nowcast(
  hub_nowcasts[!(location == "DE")][
               horizon == 0] |>
  map_to_dummy_quantiles(),
  latest_seven_day_hosp[!(location == "DE")][age_group == "00+"]
) +
  facet_grid(vars(location), vars(model), scales = "free_y")

Evaluation

In this section we evaluate and compare the performance of each nowcasting method using proper scoring rules [@scoringutils] on both the natural and the log scale to observed data reported at least 28 days ago. This corresponds to evaluating absolute and relative error. Unlike our real-time evaluation of our models here we evaluate performance using 7 day incidence (as this is the target of interest for the German nowcasting hub) across the complete 28 days of the nowcast and for all locations and age groups available. We explore overall scores as well as scores stratified by age group, by nowcast horizon, and by date of postive test, by report date, and by location. We drop the SZ-hosp_nowcast from all stratified evaluation due to its poor overall performance compared to other approaches and because this masks other meaningful between model differences. We also drop the first nowcast (from the 24th of November 2021) from the RKI-weekly_report model from all stratified evaluatioon as this was the only nowcast for which estimates were made for the day of nowcast and the subsequent two days from this model, performance was poor, and this masked other meaningful between model differences.

Overall nowcast model scores

scores <- score_nowcast(hub_nowcasts, complete_seven_day_hosp)
hub_nowcasts <- hub_nowcasts[!model %in% "SZ-hosp_nowcast"]
hub_nowcasts <- hub_nowcasts[
  !(model %in% "RKI-weekly_report" & nowcast_date == "2021-11-24")
]
plot_scores(
  scores, y = interval_score, x = model, col = model,
  fill = model, group = model
) +
  facet_wrap(vars(scale), scales = "free_x") +
  labs(x = "Model") +
  theme(legend.position = "none") +
  coord_flip()

Natural scale scores (absolute)

fancy_datatable(scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(scores[scale == "log"][, scale := NULL])

Nowcast model scores by horizon

horizon_scores <- score_nowcast(
  hub_nowcasts, complete_seven_day_hosp, by = c("horizon", "model")
)
plot_scores(
  horizon_scores, y = interval_score, x = horizon, col = model, fill = model,
  group = model
) +
  facet_wrap(vars(scale), scales = "free_y") +
  labs(x = "Nowcast horizon (with 0 being the date of nowcast)")

Natural scale scores (absolute)

fancy_datatable(horizon_scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(horizon_scores[scale == "log"][, scale := NULL])

Nowcast model scores by age group

age_group_scores <- score_nowcast(
  hub_nowcasts, complete_seven_day_hosp, by = c("age_group", "model")
)
plot_scores(
  age_group_scores, y = interval_score, x = age_group, col = model,
  fill = model, group = model
) +
  facet_wrap(vars(scale), scales = "free_y") +
  labs(x = "Age group")

Natural scale scores (absolute)

fancy_datatable(age_group_scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(age_group_scores[scale == "log"][, scale := NULL])

Nowcast model scores by date of positive test

reference_date_scores <- score_nowcast(
  hub_nowcasts, complete_seven_day_hosp, by = c("reference_date", "model")
)
plot_scores(
  reference_date_scores, y = interval_score, x = reference_date, col = model,
  fill = model, group = model
) +
  facet_wrap(vars(scale), scales = "free_y") +
  labs(x = "Reference date")

Natural scale scores (absolute)

fancy_datatable(reference_date_scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(reference_date_scores[scale == "log"][, scale := NULL])

Nowcast model scores by report date

report_date_scores <- score_nowcast(
  hub_nowcasts, complete_seven_day_hosp, by = c("nowcast_date", "model")
)
plot_scores(
  report_date_scores, y = interval_score, x = nowcast_date, col = model,
  fill = model, group = model
) +
  facet_wrap(vars(scale), scales = "free_y") +
  labs(x = "Report date")

Natural scale scores (absolute)

fancy_datatable(report_date_scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(report_date_scores[scale == "log"][, scale := NULL])

Nowcast model scores by location

location_scores <- score_nowcast(
  hub_nowcasts, complete_seven_day_hosp, by = c("location", "model")
)
plot_scores(
  location_scores, y = interval_score, x = location, col = model,
  fill = model, group = model
) +
  facet_wrap(vars(scale), scales = "free_x") +
  labs(x = "Location") +
  coord_flip()

Natural scale scores (absolute)

fancy_datatable(location_scores[scale == "natural"][, scale := NULL])

Log scale scores (relative)

fancy_datatable(location_scores[scale == "log"][, scale := NULL])

References



epiforecasts/eval-germany-sp-nowcasting documentation built on July 7, 2022, 8:56 p.m.