asmodee: Automatic Selection of Models Outlier DEtection for Epidemics...
In reconhub/epichange: Detect Changes in Temporal Trends


This function implements an algorithm for epidemic time series analysis that aims to detect recent deviations from the trend followed by the data. The data are first partitioned into 'recent' data (the last k observations, treated as supplementary individuals) and older data, which are used to fit the trend. Trend fitting is done by fitting a series of user-specified models to the time series, with different methods available for selecting the best fit (see Details and the `method` argument). The prediction interval is then calculated for the best model, and every data point (in both the training set and the supplementary individuals) falling outside it is classified as an 'outlier'.
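The partition, fit and flag steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the data are simulated, a single Poisson GLM stands in for automatic model selection, and the prediction interval is taken directly from Poisson quantiles of the fitted mean, ignoring uncertainty in the fitted parameters. The spike added to the last observation is hypothetical.

```r
# Base-R sketch of the ASMODEE idea (not the package code):
# hold out the last k points, fit a trend to the rest, compute a
# prediction interval and flag every point falling outside it.
set.seed(1)
n <- 35
k <- 7
alpha <- 0.05
day <- seq_len(n)
count <- rpois(n, lambda = exp(2 + 0.03 * day))
count[n] <- count[n] + 60  # hypothetical spike in the recent data

dat <- data.frame(day = day, count = count)
train <- dat[seq_len(n - k), ]  # older data used to fit the trend

# Assumed single candidate model: Poisson GLM with a linear time trend
fit <- glm(count ~ day, family = poisson, data = train)

# Predicted means for all points, including the k held-out ones
mu <- predict(fit, newdata = dat, type = "response")

# Poisson prediction interval (ignores parameter uncertainty)
lower <- qpois(alpha / 2, mu)
upper <- qpois(1 - alpha / 2, mu)

# Outliers are flagged in both the training and the recent data
dat$outlier <- dat$count < lower | dat$count > upper
dat$recent <- dat$day > n - k
print(dat[dat$outlier, ])
```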

asmodee(data, models, ...)

## S3 method for class 'data.frame'
asmodee(
  data,
  models,
  date_index,
  alpha = 0.05,
  k = 7,
  method = evaluate_aic,
  method_args = list(),
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_fitting_warnings = FALSE,
  include_prediction_warnings = TRUE,
  force_positive = FALSE,
  keep_intermediate = FALSE,
  ...
)

## S3 method for class 'incidence2'
asmodee(
  data,
  models,
  alpha = 0.05,
  k = 7,
  method = evaluate_aic,
  method_args = list(),
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_fitting_warnings = FALSE,
  include_prediction_warnings = TRUE,
  force_positive = TRUE,
  keep_intermediate = FALSE,
  ...
)

`data`	A `data.frame` or a `tibble` (or, for the `incidence2` method, an `incidence2` object) containing the response and explanatory variables used in `models`.
`models`	A list of `trending_model()` objects, generated by `lm_model`, `glm_model`, `glm_nb_model`, `brms_model` and similar functions (see `?trending::trending_model` for details).
`...`	Not currently used.
`date_index`	The name of a variable corresponding to time, quoted or not.
`alpha`	The alpha threshold to be used for the prediction interval calculation; defaults to 0.05, i.e. 95% prediction intervals are calculated.
`k`	An `integer` indicating the number of recent data points to be excluded from the trend fitting procedure. Defaults to 7.
`method`	A function used to evaluate model fit. Current choices are `evaluate_aic` (default) and `evaluate_resampling`. `evaluate_aic` uses Akaike's Information Criterion, which is faster but may be less good at selecting models with the best predictive power. `evaluate_resampling` uses cross-validation and, by default, RMSE to assess model fit.
`method_args`	Optional named list of additional arguments to pass to `method`. Defaults to an empty list.
`simulate_pi`	A `logical` indicating if prediction intervals should be derived by bootstrap using the ciTools package (`TRUE`) or calculated analytically (`FALSE`). Defaults to `TRUE`.
`uncertain`	A `logical` indicating if uncertainty in the fitted parameters should be taken into account when deriving prediction intervals. Only used for GLMs and if `simulate_pi = FALSE`. Defaults to `FALSE`.
`include_fitting_warnings`	A `logical` indicating if results should include models that triggered warnings (but not errors) during the fitting procedure. Defaults to `FALSE`, as warnings can indicate a lack of convergence during parameter estimation.
`include_prediction_warnings`	A `logical` indicating if results should include models that triggered warnings (but not errors) during the prediction stage. Defaults to `TRUE`.
`force_positive`	A `logical` indicating if predictions should be forced to be positive (or zero); this can be useful when using Gaussian models for count data, to avoid negative predictions. Defaults to `FALSE` for general `data.frame` inputs, and to `TRUE` for `incidence2` objects.
`keep_intermediate`	A `logical` indicating if all output from the fitting and prediction stages should be returned. If `TRUE`, a tibble is returned in the `fitted_results` element of the resulting list. If `FALSE` (default), `fitted_results` is `NULL`.
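To illustrate what `simulate_pi` and `force_positive` control, the sketch below contrasts an analytical prediction interval with a simulation-based one for a Gaussian linear model fitted to simulated counts, then clamps negative bounds at zero. It is a simplified, hypothetical example: it does not use ciTools, and the simulated interval only resamples observation noise around the fitted means, so it should not be read as the procedure ciTools implements.

```r
# Hypothetical comparison of prediction-interval strategies (not package code)
set.seed(42)
day <- 1:28
# Simulated, slowly declining counts (assumed data)
count <- pmax(0, round(5 - 0.15 * day + rnorm(28, sd = 2)))
fit <- lm(count ~ day)

new <- data.frame(day = 29:35)
alpha <- 0.05

# Analytical interval from predict.lm (accounts for parameter uncertainty)
pi_analytic <- predict(fit, newdata = new, interval = "prediction",
                       level = 1 - alpha)

# Simulated interval: draw new observations around the fitted means using
# the residual SD (ignores parameter uncertainty)
mu <- predict(fit, newdata = new)
sigma <- summary(fit)$sigma
draws <- sapply(mu, function(m) rnorm(5000, mean = m, sd = sigma))
pi_sim <- t(apply(draws, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2)))

# force_positive-style clamping: negative values are replaced by zero
pi_clamped <- pmax(pi_analytic, 0)
print(cbind(pi_clamped, pi_sim))
```

With a declining Gaussian trend, the lower bounds of the unclamped interval fall below zero, which is the situation `force_positive` is meant to handle for count data.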

Automatic model selection is used to determine which of the user-provided models best fits the training data. First, all models are fitted to the data. Second, the best model is selected using the approach specified by the `method` argument. The default, `evaluate_aic()`, uses Akaike's Information Criterion to assess model fit penalised by model complexity. This approach is fast, but measures model fit rather than predictive ability. The alternative, `evaluate_resampling()`, uses cross-validation (10-fold by default) and root mean squared error (RMSE) to assess model fit. This approach is likely to select models with good predictive abilities, but is computationally intensive. Also, it does not attempt to maximise the explained deviance, so selected models may have good average predictions but underestimate uncertainty.
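The two selection strategies can be sketched in base R as follows. This assumes simulated Poisson counts and three hypothetical candidate formulas fitted as Poisson GLMs; it mirrors the logic of AIC-based and 10-fold cross-validated RMSE selection but does not call `evaluate_aic()` or `evaluate_resampling()`.

```r
# Base-R illustration of AIC vs cross-validated RMSE model selection
set.seed(123)
n <- 60
dat <- data.frame(day = 1:n)
dat$weekday <- factor(dat$day %% 7)
dat$count <- rpois(n, lambda = exp(1.5 + 0.02 * dat$day))

# Hypothetical candidate formulas, all fitted as Poisson GLMs
formulas <- list(
  constant = count ~ 1,
  time = count ~ day,
  time_weekday = count ~ day + weekday
)

# 1. AIC: fit every model on all data, keep the lowest AIC
aics <- sapply(formulas, function(f) AIC(glm(f, family = poisson, data = dat)))
best_aic <- names(which.min(aics))

# 2. 10-fold cross-validation: RMSE of predictions on held-out folds
folds <- sample(rep(1:10, length.out = n))
cv_rmse <- sapply(formulas, function(f) {
  sq_err <- numeric(0)
  for (i in 1:10) {
    train <- dat[folds != i, ]
    test <- dat[folds == i, ]
    fit <- glm(f, family = poisson, data = train)
    pred <- predict(fit, newdata = test, type = "response")
    sq_err <- c(sq_err, (test$count - pred)^2)
  }
  sqrt(mean(sq_err))
})
best_cv <- names(which.min(cv_rmse))
print(c(aic = best_aic, cv = best_cv))
```

The cross-validated loop refits each model ten times, which is why the resampling approach is noticeably slower than AIC. The two criteria can disagree when a more complex model fits the training data well but generalises poorly.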

A `trendbreaker` object (an S3 class inheriting from `list`) containing items that can be accessed with various accessors; see `?trendbreaker-accessors`.

Thibaut Jombart, Dirk Schumacher and Tim Taylor, with inputs from Michael Höhle, Mark Jit, John Edmunds, Andre Charlett, Stéphane Ghozzi

if (require(cowplot) && require(tidyverse) && require(trending)) {
  # load data
  data(nhs_pathways_covid19)

  # select last 28 days
  first_date <- max(nhs_pathways_covid19$date, na.rm = TRUE) - 28
  pathways_recent <- nhs_pathways_covid19 %>%
    filter(date >= first_date)

  # define candidate models
  models <- list(
    regression = lm_model(count ~ day),
    poisson_constant = glm_model(count ~ 1, family = "poisson"),
    negbin_time = glm_nb_model(count ~ day),
    negbin_time_weekday = glm_nb_model(count ~ day + weekday)
  )

  # analyses on all data
  counts_overall <- pathways_recent %>%
    group_by(date, day, weekday) %>%
    summarise(count = sum(count), .groups = "drop")

  # results with fixed value of 'k' (7 days)
  res_overall_k7 <- asmodee(counts_overall, models, date, k = 7)
  plot(res_overall_k7, "date")
}