Calculating Weighted Interval Score for Nowcast Models"
In NobBS: Nowcasting by Bayesian Smoothing

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
if (utils::packageVersion("scoringutils") < "2.0.0") {
    stop("The 'scoringutils' package version 2.0.0 or higher is required. Please update it.")
}
if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop("The vignette requires the 'ggplot2' package. Please install it to build the vignette.")
}

Introduction

The Weighted Interval Score (WIS) is a probabilistic measure for evaluating forecast accuracy, balancing three key aspects of forecast quality:

Interval sharpness – how narrow the prediction intervals are (favoring more precise forecasts).
Calibration – how well the predicted intervals reflect the actual variability in the data.
Coverage – the proportion of observed outcomes that fall within the predicted intervals.

WIS penalizes wide intervals or those that fail to include the observed outcome, while rewarding narrow, well-calibrated intervals. This makes WIS an interpretable and robust metric for assessing both forecast accuracy and uncertainty.

As of version 1.1.0, the NobBS package can explicitly return quantile estimates when calling the NobBS and NobBS.strat functions, which can be used to calculate the WIS for the given nowcast. This vignette demonstrates how to calculate and compare the WIS for different nowcasting models using quantile estimates generated by the NobBS package, in combination with the scoringutils package (version 2.0.0 or later).

Nowcasting Scenarios

To illustrate this process, we run two nowcasting models — a Poisson model and a negative binomial model — for five nowcasting dates, using the dengue dataset provided in the NobBS package. To obtain quantile estimates we set the desired quantiles in the new quantiles parameter within the specs list given in the call to the NobBS and NobBS.strat functions. By default, NobBS returns estimates for five quantiles (0.025, 0.25, 0.5, 0.75, 0.975), representing the median along with the 50% and 95% prediction intervals. Here, we use a more detailed set of 23 quantile values to better capture the full predictive distribution, ensuring a more granular assessment of forecast uncertainty and improving the accuracy of the WIS calculation:

library(NobBS)
library(dplyr)
library(ggplot2)
library(scoringutils)

data(denguedat)

# Helper function to run nowcasting
run_nowcast <- function(data, now, win_size, specs) {
  NobBS(data, now, units = "1 week", onset_date = "onset_week",
        report_date = "report_week", moving_window = win_size, specs = specs)
}

quantiles <- c(0.025,seq(0.05,0.95,0.05),0.975)
q_len <- length(quantiles)
q_cols <- paste0('q_',quantiles)

# Poisson model specs
specs_poisson <- list(
  dist=c("Poisson"),
  quantiles=quantiles
)

#NB model specs
specs_nb = list(
  dist=c("NB"),
  quantiles=quantiles
)

# Initialize lists to store nowcasts
nowcasts_pois <- list()   # Nowcasts using poisson model
nowcasts_nb <- list()     # Nowcasts using negtive-binomial model

test_dates <- seq(as.Date("1994-08-29"),as.Date("1994-09-26"),by=7)

win_size <- 8

for (t in seq_along(test_dates)) {
  now <- test_dates[t]
  denguedat1 <- denguedat[denguedat$onset_week<=now,]
  nowcasts_pois[[t]] <- run_nowcast(denguedat1, now, win_size, specs_poisson)
  nowcasts_nb[[t]] <- run_nowcast(denguedat1, now, win_size, specs_nb)
}

Comparing the Nowcast Estimates of the Poisson and negative binomial Models

Now let us plot and compare the estimates obtained from each of these nowcasts:

plot_estimates <- function(nowcast1, nowcast2, cases_per_date, now) {
  # Ensure input data is valid
  if (is.null(nowcast1$estimates) || is.null(nowcast2$estimates)) {
    stop("Nowcast estimates are missing.")
  }

  # Extract estimates and credible intervals for nowcast without DoW effect
  onset_dates1 <- nowcast1$estimates$onset_date
  estimates1 <- nowcast1$estimates$estimate
  lower1 <- nowcast1$estimates$q_0.025  # 2.5% quantile (lower bound of 95% CI)
  upper1 <- nowcast1$estimates$q_0.975  # 97.5% quantile (upper bound of 95% CI)

  # Extract estimates and credible intervals for nowcast with DoW effect
  onset_dates2 <- nowcast2$estimates$onset_date
  estimates2 <- nowcast2$estimates$estimate
  lower2 <- nowcast2$estimates$q_0.025
  upper2 <- nowcast2$estimates$q_0.975

  # Extract eventual case counts
  case_dates <- cases_per_date$onset_week
  case_counts <- cases_per_date$count

  # Calculate plot range
  min_val <- min(c(lower1, lower2, case_counts), na.rm = TRUE)
  max_val <- max(c(upper1, upper2, case_counts), na.rm = TRUE) * 1.35

  # Create the plot
  plot(
    onset_dates1, estimates1, col = 'blue', type = 'l',
    xlab = 'Onset Date', ylab = 'Cases',
    ylim = c(min_val, max_val), lwd = 2,
    main = paste0('Incidence Estimates for ', now)
  )
  lines(onset_dates2, estimates2, col = 'red', lwd = 2)

  # Add 95% PI shaded regions for both nowcasts
  polygon(c(onset_dates1, rev(onset_dates1)), c(lower1, rev(upper1)), 
          col = rgb(0, 0, 1, 0.2), border = NA) 
  polygon(c(onset_dates2, rev(onset_dates2)), c(lower2, rev(upper2)), 
          col = rgb(1, 0, 0, 0.2), border = NA) 

  # Add true case counts as points
  points(case_dates, case_counts, col = 'black', pch = 20)

  # Add a legend
  legend(
    'topleft',
    legend = c('Estimates Poisson model', 
               'Estimates negative binomial model', 
               '95% PI (Poisson model)', 
               '95% PI (negative binomial model)',
               'Eventual cases'),
    col = c('blue', 'red', rgb(0, 0, 1, 0.2), rgb(1, 0, 0, 0.2), 'black'),
    lty = c(1, 1, NA, NA, NA), lwd = c(2, 2, NA, NA, NA), 
    pch = c(NA, NA, 15, 15, 20),
    pt.cex = c(NA, NA, 1.5, 1.5, 1), 
    cex = 0.9
  )
}

cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())

for (t in seq_along(test_dates)) {

  now <- test_dates[t]
  nowcast_pois <- nowcasts_pois[[t]]
  nowcast_nb <- nowcasts_nb[[t]]
  cases_per_week1 <- cases_per_week[cases_per_week$onset_week <= now, ]
  plot_estimates(nowcast_pois, nowcast_nb, cases_per_week1, now)
}

As seen in these plots, the mean estimates of the two models are very similar, but the negative binomial model has wider prediction intervals, which appear to better capture the actual number of eventual cases.

Now, let us calculate the WIS for these nowcasting estimates. To do this, we first need to extract the quantile estimates from the nowcast results and use them to construct a data frame compatible with the scoringutils package.

Extracting Quantile Estimates

The quantile estimates can be found in the estimates dataframe within the list returned by the NobBS and NobBS.strat functions. The following code iterates over the nowcasting results, extracts the quantile estimates, and combines them with the observed case counts into a unified data frame. From each nowcast we extract the estimates for three horizons: 0 (the current week), -1 (the previous week) and -2 (two weeks ago).

data <- data.frame(onset_week=as.Date(character()),
                   now=as.Date(character()),
                   horizon=numeric(),
                   quantile_level=numeric(),
                   predicted=numeric(),
                   observed=numeric(),
                   model=character())

cases_per_week <- denguedat %>% group_by(onset_week) %>% summarize(count=n())

horizons <- c(-2,-1,0)

for (t in seq_along(test_dates)) {
  now <- test_dates[t]
  nowcast_pois <- nowcasts_pois[[t]]
  nowcast_nb <- nowcasts_nb[[t]]
  for(h in horizons) {
    date <- now + h*7
    true_value <- cases_per_week[cases_per_week$onset_week==date,]$count
    q_est_pois <- unname(unlist(nowcast_pois$estimates[nowcast_pois$estimates$onset_date==date,q_cols]))
    q_est_nb <- unname(unlist(nowcast_nb$estimates[nowcast_nb$estimates$onset_date==date,q_cols]))
    data_est <- data.frame(onset_week=rep(date,q_len*2),
                           now=rep(now,q_len*2),
                           horizon=rep(h,q_len*2),
                           quantile_level=rep(quantiles,2),
                           predicted=c(q_est_pois,q_est_nb),
                           observed=rep(true_value,q_len*2),
                           model=c(rep('Pois_model',q_len),rep('NB_model',q_len)))
    data <- rbind(data,data_est)
  }
}

print(head(data))
print(tail(data))

To help interpret this table, consider the first row as an example. The 0.025 quantile level means that 131 cases was the lower bound of the 95% prediction interval for the August 15, 1994 onset week, forecasted on August 29, 1994 (horizon = -2).Since the actual observed value was 135 cases, this indicates that the forecasted interval appropriately captured the lower tail of the distribution.

Calculating and Plotting the Weighted Interval Score

Next, we utilize the scoringutils package to compute the WIS for the Poisson and negative binomial models using the constructed data frame.

nowcasts <- data %>%
          scoringutils::as_forecast_quantile()
scores <- scoringutils::score(nowcasts, 
             get_metrics(nowcasts,select=c("wis","overprediction","underprediction","dispersion")))

print(head(scores))

To help interpret this table, consider the first row as an example. For the onset week of August 15, 1994, using the Poisson model, the nowcast made on August 29, 1994 (horizon = -2) resulted in a WIS of 1.259. The decomposition of WIS shows that 0.476 of the score came from overprediction, 0 from underprediction, and 0.783 from dispersion. This suggests that the model slightly overestimated case counts but had a reasonable spread in its prediction intervals.

We can now summarize the results of the models according to the horizon:

scores_per_horizon <- scores %>%
          summarise_scores(by = c("horizon", "model"))

print(scores_per_horizon)
plot_wis(scores_per_horizon) + 
  facet_wrap(~ horizon, scales ='free') +
  ggtitle('WIS per Horizon')

As we have seen above, the negative binomial model gives wider estimates compared to the Poisson model, which reflects in higher dispersion scores. However, the negative binomial model has better coverage of the true data compared to the Poisson model which reflects in lower overprediction and underprediction scores. Overall, the WIS (which is the sum of these three components) of the negative binomial model is lower (better) than the WIS of the Poisson model, for all three horizons.

We can also compare the model results according to another parameter - like the nowcasting date (the date on which the nowcast was performed):

scores_per_now_date <- scores %>%
          summarise_scores(by = c("now", "model"))

print(scores_per_now_date)
plot_wis(scores_per_now_date) + 
  facet_wrap(~ now, scales ='free')  +
  ggtitle('WIS per Nowcasting Date')

We again see that in this instance, for each of these 5 dates, the WIS of the negative binomial model is lower (better) than the WIS of the Poisson model. This is a common finding because surveillance data typically are over-dispersed.

Conclusion

The WIS provides an effective framework for assessing the accuracy and uncertainty of probabilistic forecasts, making it an important tool for evaluating nowcasting models. This vignette demonstrated how version 1.1.0 of the NobBS package integrates with the scoringutils package to evaluate forecast accuracy using WIS.