hhh4_simulate_scores: Proper Scoring Rules for Simulations from 'hhh4' Models
In surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

hhh4_simulate_scores

R Documentation

Proper Scoring Rules for Simulations from `hhh4` Models

Description

Calculate proper scoring rules based on simulated predictive distributions.

Usage

## S3 method for class 'hhh4sims'
scores(x, which = "rps", units = NULL, ..., drop = TRUE)
## S3 method for class 'hhh4simslist'
scores(x, ...)

Arguments

`x`	an object of class `"hhh4sims"` (as resulting from the `simulate`-method for `"hhh4"` models if `simplify = TRUE` was set), or an `"hhh4simslist"`, i.e., a list of such simulations potentially obtained from different model fits (using the same simulation period).
`which`	a character vector indicating which proper scoring rules to compute. By default, only the ranked probability score (`"rps"`) is calculated. Other options include `"logs"` and `"dss"`.
`units`	if non-`NULL`, an integer or character vector indexing the columns of `x` for which to compute the scores.
`drop`	a logical indicating if univariate dimensions should be dropped (the default).
`...`	unused (argument of the generic).

Details

This implementation can only compute univariate scores, i.e., independently for each time point.

The logarithmic score is badly estimated if the domain is large and there are not enough samples to cover the underlying distribution in enough detail (the score becomes infinite when an observed value does not occur in the samples). An alternative is to use kernel density estimation as implemented in the R package scoringRules.

Author(s)

Sebastian Meyer

Examples

data("salmAllOnset")

## fit a hhh4 model to the first 13 years
salmModel <- list(end = list(f = addSeason2formula(~1 + t)),
                  ar = list(f = ~1), family = "NegBin1", subset = 2:678)
salmFit <- hhh4(salmAllOnset, salmModel)

## simulate the next 20 weeks ahead (with very small 'nsim' for speed)
salmSims <- simulate(salmFit, nsim = 500, seed = 3, subset = 678 + seq_len(20),
                     y.start = observed(salmAllOnset)[678,])
if (requireNamespace("fanplot"))
    plot(salmSims, "fan")


### calculate scores at each time point

## using empirical distribution of simulated counts as forecast distribution
scores(salmSims, which = c("rps", "logs", "dss"))
## observed count sometimes not covered by simulations -> infinite log-score
## => for a more detailed forecast, either considerably increase 'nsim', or:

## 1. use continuous density() of simulated counts as forecast distribution
fi <- apply(salmSims, 1, function (x) approxfun(density(x)))
logs_kde <- mapply(function (f, y) -log(f(y)),
                   f = fi, y = observed(attr(salmSims,"stsObserved")))
cbind("empirical" = scores(salmSims, "logs"), "density" = logs_kde)
## a similar KDE approach is implemented in scoringRules::logs_sample()

## 2. average conditional predictive NegBin's of simulated trajectories,
##    currently only implemented in HIDDA.forecasting::dhhh4sims()


### produce a PIT histogram

## using empirical distribution of simulated counts as forecast distribition
pit(x = observed(attr(salmSims, "stsObserved")),
    pdistr = apply(salmSims, 1:2, ecdf))
## long-term forecast is badly calibrated (lower tail is unused, see fan above)
## we also get a warning for the same reason as infinite log-scores

surveillance documentation built on June 25, 2025, 5:11 p.m.