backtest: Rolling-origin backtesting of lineage frequency models

View source: R/backtest.R

backtest    R Documentation

Rolling-origin backtesting of lineage frequency models

Description

Evaluates forecast accuracy by repeatedly fitting models on historical data and comparing predictions to held-out observations. This implements the evaluation framework described in Abousamra et al. (2024).

Usage

backtest(
  data,
  engines = "mlr",
  origins = "weekly",
  horizons = c(7L, 14L, 21L, 28L),
  min_train = 42L,
  ...
)

Arguments

data

An lfq_data object.

engines

Character vector of engine names to compare. Default "mlr".

origins

How to select forecast origins:

  • "weekly" (default): one origin per unique date in the data, starting min_train days after the first date.

  • An integer: use every Nth date as an origin.

  • A Date vector: use these specific dates as origins.
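The three forms of origins described above can be illustrated with a small base R sketch. The helper select_origins() is hypothetical and is not part of lineagefreq; it only mirrors the documented behaviour under the assumption that eligible origins are the unique dates on or after min(date) + min_train.

```r
# Hypothetical helper (not the package's implementation): choose forecast
# origins from the unique dates of a data set.
select_origins <- function(dates, origins = "weekly", min_train = 42L) {
  dates <- sort(unique(as.Date(dates)))
  # Skip dates earlier than min(date) + min_train.
  eligible <- dates[dates >= min(dates) + min_train]
  if (inherits(origins, "Date")) {
    # Date vector: keep only the requested dates that are eligible.
    return(eligible[eligible %in% origins])
  }
  if (is.numeric(origins)) {
    # Integer N: keep every Nth eligible date, starting with the first.
    n <- as.integer(origins)
    return(eligible[(seq_along(eligible) - 1L) %% n == 0L])
  }
  # "weekly": every eligible unique date.
  eligible
}

# Twenty weekly dates, so the first eligible origin is 42 days after the start.
dates <- as.Date("2024-01-01") + seq(0, 133, by = 7)
all_origins <- select_origins(dates)
every_second <- select_origins(dates, origins = 2L)
chosen <- select_origins(dates, origins = as.Date(c("2024-01-08", "2024-03-04")))
```

In this sketch a requested date that falls inside the minimum training window (2024-01-08 here) is dropped silently; whether backtest() warns in that case is not stated in this page.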

horizons

Integer vector of forecast horizons in days. Default c(7L, 14L, 21L, 28L).

min_train

Minimum training window in days. Origins earlier than min(date) + min_train days are skipped. Default 42L (6 weeks).

...

Additional arguments passed to fit_model() (e.g., generation_time for the Piantham engine).

Details

Implements the rolling-origin evaluation framework described in Abousamra et al. (2024), Section 2.4. At each origin date, the model is fit on data up to that date and forecasts are compared to held-out future observations. This avoids look-ahead bias and provides an honest assessment of real-time forecast accuracy.
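The rolling-origin procedure described above can be sketched in base R without the package. Everything in this example is an assumption made for illustration: the toy data are simulated, and a binomial glm() stands in for a lineagefreq engine. It is not how backtest() is implemented; it only shows the fit-up-to-origin, predict-ahead, compare-to-held-out pattern.

```r
# Toy data: weekly counts of one lineage out of a fixed total (simulated).
set.seed(1)
dates <- as.Date("2024-01-01") + seq(0, 133, by = 7)
t_days <- as.numeric(dates - min(dates))
total <- rep(200L, length(dates))
count <- rbinom(length(dates), size = total, prob = plogis(-3 + 0.04 * t_days))
toy <- data.frame(date = dates, t = t_days, count = count, total = total)

horizons <- c(7L, 14L)
min_train <- 42L
origins <- toy$date[toy$date >= min(toy$date) + min_train]

results <- do.call(rbind, lapply(seq_along(origins), function(i) {
  origin <- origins[i]
  # Train only on data up to the origin, so no future data leaks in.
  train <- toy[toy$date <= origin, ]
  fit <- glm(cbind(count, total - count) ~ t, family = binomial, data = train)
  targets <- origin + horizons
  new_t <- data.frame(t = as.numeric(targets - min(toy$date)))
  # Rows are NA where the target date was never observed.
  held_out <- toy[match(targets, toy$date), ]
  data.frame(
    origin_date = origin,
    target_date = targets,
    horizon = horizons,
    predicted = as.numeric(predict(fit, newdata = new_t, type = "response")),
    observed = held_out$count / held_out$total
  )
}))

# Drop forecasts whose target date lies beyond the observed data.
results <- results[!is.na(results$observed), ]
```

Each row of results pairs one origin with one horizon, which is the same long layout as the columns listed under Value, minus the engine, lineage, and interval columns.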

Value

An lfq_backtest object (tibble subclass) with columns:

origin_date

Date used as the training cutoff.

target_date

Date being predicted.

horizon

Forecast horizon in days.

engine

Engine name.

lineage

Lineage name.

predicted

Predicted frequency (median).

lower

Lower prediction bound.

upper

Upper prediction bound.

observed

Observed frequency at target_date.
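Because the result is a long table with one row per origin, horizon, engine, and lineage, simple accuracy summaries can be computed with base R. The sketch below uses made-up rows shaped like the documented columns, not real backtest() output, and is not a substitute for score_forecasts(); the names bt_toy and summary_toy are illustrative.

```r
# Made-up rows with the documented column names (not real output).
bt_toy <- data.frame(
  origin_date = as.Date(c("2024-02-12", "2024-02-12", "2024-02-19", "2024-02-19")),
  horizon = c(7L, 14L, 7L, 14L),
  engine = "mlr",
  lineage = "A",
  predicted = c(0.30, 0.40, 0.35, 0.45),
  lower = c(0.25, 0.30, 0.30, 0.35),
  upper = c(0.35, 0.50, 0.40, 0.55),
  observed = c(0.32, 0.52, 0.34, 0.50)
)
# target_date is the origin plus the horizon in days.
bt_toy$target_date <- bt_toy$origin_date + bt_toy$horizon

# Absolute error of the median prediction, and whether the interval
# [lower, upper] contains the observed frequency.
bt_toy$abs_error <- abs(bt_toy$predicted - bt_toy$observed)
bt_toy$covered <- bt_toy$observed >= bt_toy$lower & bt_toy$observed <= bt_toy$upper

# Mean absolute error and interval coverage per engine and horizon.
summary_toy <- aggregate(cbind(abs_error, covered) ~ engine + horizon,
                         data = bt_toy, FUN = mean)
summary_toy
```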

References

Abousamra E, Figgins M, Bedford T (2024). Fitness models provide accurate short-term forecasts of SARS-CoV-2 variant frequency. PLoS Computational Biology, 20(9):e1012443. doi:10.1371/journal.pcbi.1012443

See Also

score_forecasts() to compute accuracy metrics, compare_models() to rank engines.

Examples


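# Simulate frequency data for 3 lineages over 20 time points
sim <- simulate_dynamics(n_lineages = 3,
  advantages = c("A" = 1.2, "B" = 0.8),
  n_timepoints = 20, seed = 1)

# Backtest the "mlr" engine at 7- and 14-day horizons,
# requiring at least 42 days of training data before the first origin
bt <- backtest(sim, engines = "mlr",
  horizons = c(7, 14), min_train = 42)

# Print the backtest results
bt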



lineagefreq documentation built on April 3, 2026, 9:09 a.m.