backtest: Rolling-origin backtesting of lineage frequency models
In lineagefreq: Lineage Frequency Dynamics from Genomic Surveillance Counts

backtest

R Documentation

Rolling-origin backtesting of lineage frequency models

Description

Evaluates forecast accuracy by repeatedly fitting models on historical data and comparing predictions to held-out observations. This implements the evaluation framework described in Abousamra et al. (2024).

Usage

backtest(
  data,
  engines = "mlr",
  origins = "weekly",
  horizons = c(7L, 14L, 21L, 28L),
  min_train = 42L,
  ...
)

Arguments

`data`	An lfq_data object.
`engines`	Character vector of engine names to compare. Default `"mlr"`.
`origins`	How to select forecast origins. One of: `"weekly"` (default), which uses one origin per unique date, starting after `min_train` days; an integer N, which uses every Nth date as an origin; or a Date vector, which uses these specific dates as origins.
`horizons`	Integer vector of forecast horizons in days. Default `c(7L, 14L, 21L, 28L)`.
`min_train`	Minimum training window in days. Origins earlier than `min(date) + min_train` are skipped. Default 42 (6 weeks).
`...`	Additional arguments passed to `fit_model()` (e.g., `generation_time` for the Piantham engine).
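
The interaction between `origins` and `min_train` can be illustrated with a small base R sketch. `select_origins()` is a hypothetical helper written for this illustration and is not part of lineagefreq; in particular, applying the "every Nth date" rule after the `min_train` filter is an assumption, and the package may order these steps differently.

```r
# Hypothetical helper (not a lineagefreq function): mimics the documented
# meaning of `origins` and `min_train` on a plain vector of dates.
select_origins <- function(dates, origins = "weekly", min_train = 42L) {
  dates <- sort(unique(as.Date(dates)))
  # Origins earlier than min(date) + min_train are skipped.
  eligible <- dates[dates >= min(dates) + min_train]
  if (inherits(origins, "Date")) {
    # Date vector: keep the requested dates that are eligible.
    return(sort(origins[origins %in% eligible]))
  }
  if (is.numeric(origins)) {
    # Integer N: keep every Nth eligible date (assumed ordering of steps).
    keep <- (seq_along(eligible) - 1L) %% as.integer(origins) == 0L
    return(eligible[keep])
  }
  # "weekly": one origin per eligible unique date.
  eligible
}

# Twenty weekly sampling dates.
dates <- as.Date("2024-01-01") + 7 * (0:19)
weekly_origins <- select_origins(dates)
every_second   <- select_origins(dates, origins = 2L)
explicit       <- select_origins(dates, as.Date(c("2024-01-08", "2024-03-04")))
```

In this sketch the explicit date 2024-01-08 is dropped because it falls inside the 42-day minimum training window.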

Details

The procedure follows the rolling-origin evaluation framework of Abousamra et al. (2024), Section 2.4. At each origin date, the model is fit only on data up to that date, and its forecasts are compared to the held-out observations at each horizon. Because no observation after the origin enters the fit, the evaluation is free of look-ahead bias and reflects the accuracy a forecaster would have obtained in real time.
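
The fit-then-compare loop can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the counts are simulated for a single focal lineage, a binomial `glm()` stands in for a real engine, and the names `counts` and `bt_sketch` are invented for the example.

```r
set.seed(1)

# Simulated weekly counts: `a` sequences of a focal lineage out of `n` total.
dates <- as.Date("2024-01-01") + 7 * (0:19)
day   <- as.numeric(dates - min(dates))
n     <- rep(200L, length(dates))
counts <- data.frame(date = dates, day = day, n = n,
                     a = rbinom(length(dates), n, plogis(-3 + 0.04 * day)))

horizons <- c(7L, 14L)
origins  <- dates[dates >= min(dates) + 42L]  # respect a 42-day training window

results <- list()
for (i in seq_along(origins)) {
  origin <- origins[i]
  # Training data stops at the origin, so nothing from the future leaks in.
  train <- counts[counts$date <= origin, ]
  fit   <- glm(cbind(a, n - a) ~ day, family = binomial, data = train)
  # Held-out observations at origin + horizon, where such dates exist.
  test <- counts[counts$date %in% (origin + horizons), ]
  if (nrow(test) == 0L) next
  results[[length(results) + 1L]] <- data.frame(
    origin_date = origin,
    target_date = test$date,
    horizon     = as.integer(test$date - origin),
    predicted   = predict(fit, newdata = test, type = "response"),
    observed    = test$a / test$n
  )
}
bt_sketch <- do.call(rbind, results)
```

Origins near the end of the series contribute fewer rows, or none, because some target dates fall beyond the last observation.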

Value

An lfq_backtest object (tibble subclass) with columns:

origin_date: Date used as the training cutoff.
target_date: Date being predicted.
horizon: Forecast horizon in days.
engine: Engine name.
lineage: Lineage name.
predicted: Predicted frequency (median).
lower: Lower prediction bound.
upper: Upper prediction bound.
observed: Observed frequency at target_date.
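
Because the result is a tibble with one row per origin, horizon, engine and lineage, it can be summarised with ordinary data-frame tools. The sketch below uses a mock data frame with invented values in place of real `backtest()` output, and computes mean absolute error and interval coverage per engine and horizon; these two metrics are chosen for illustration only.

```r
# Mock rows using the documented column names (values are invented).
bt_mock <- data.frame(
  engine    = "mlr",
  lineage   = rep(c("A", "B"), times = 4),
  horizon   = rep(c(7L, 14L), each = 4),
  predicted = c(0.30, 0.70, 0.40, 0.60, 0.35, 0.65, 0.50, 0.50),
  lower     = c(0.25, 0.65, 0.33, 0.52, 0.25, 0.55, 0.38, 0.38),
  upper     = c(0.35, 0.75, 0.47, 0.67, 0.45, 0.75, 0.62, 0.62),
  observed  = c(0.32, 0.68, 0.45, 0.55, 0.30, 0.70, 0.65, 0.35)
)

# Per-row absolute error and whether the interval contains the observation.
bt_mock$abs_error <- abs(bt_mock$predicted - bt_mock$observed)
bt_mock$covered   <- bt_mock$observed >= bt_mock$lower &
                     bt_mock$observed <= bt_mock$upper

# Averages per engine and horizon: the mean of abs_error is the MAE, and the
# mean of covered is the empirical interval coverage.
scores <- aggregate(cbind(abs_error, covered) ~ engine + horizon,
                    data = bt_mock, FUN = mean)
scores
```

In this mock data the error grows and the coverage drops at the longer horizon, which is the kind of pattern such a summary is meant to expose.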

References

Abousamra E, Figgins M, Bedford T (2024). Fitness models provide accurate short-term forecasts of SARS-CoV-2 variant frequency. PLoS Computational Biology, 20(9):e1012443. doi:10.1371/journal.pcbi.1012443

Examples


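# Simulate counts for three lineages over 20 time points
sim <- simulate_dynamics(n_lineages = 3,
  advantages = c("A" = 1.2, "B" = 0.8),
  n_timepoints = 20, seed = 1)

# Backtest the "mlr" engine at 7- and 14-day horizons,
# requiring at least 42 days of training data before each origin
bt <- backtest(sim, engines = "mlr",
  horizons = c(7, 14), min_train = 42)
bt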

lineagefreq documentation built on April 3, 2026, 9:09 a.m.