asmodee: Automatic Selection of Models and Outlier DEtection for Epidemics...


View source: R/asmodee.R

Description

This function implements an algorithm for epidemic time series analysis that aims to detect recent deviations from the trend followed by the data. The data are first partitioned into 'recent' data, made of the last k observations and treated as supplementary individuals, and older data used to fit the trend. Trend fitting is done by fitting a series of user-specified models to the time series, with different methods available for selecting the best fit (see Details, and the argument method). The prediction interval is then calculated for the best model, and every data point (in both the training set and the supplementary individuals) falling outside this interval is classified as an 'outlier'.
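
The core of this procedure can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: a single Gaussian linear model (lm) stands in for automatic selection among candidate models, prediction intervals are computed analytically with predict(), and detect_outliers_sketch is a hypothetical name.

```r
# Minimal sketch of the outlier-detection idea; NOT the package implementation.
# Assumption: one lm() model stands in for automatic selection among candidates.
detect_outliers_sketch <- function(data, formula, k = 7, alpha = 0.05) {
  n <- nrow(data)
  stopifnot(k >= 1, k < n)
  recent <- seq_len(n) > n - k                     # last k points: supplementary individuals
  fit <- lm(formula, data = data[!recent, , drop = FALSE])  # trend fitted on older data only
  # (1 - alpha) prediction interval for every point, training set included
  pred <- predict(fit, newdata = data, interval = "prediction", level = 1 - alpha)
  observed <- data[[all.vars(formula)[1]]]
  data.frame(
    data,
    lower = pred[, "lwr"],
    upper = pred[, "upr"],
    recent = recent,
    outlier = observed < pred[, "lwr"] | observed > pred[, "upr"]
  )
}

# Usage: a linear trend with an artificial jump over the last 3 days
set.seed(1)
dat <- data.frame(day = 1:40)
dat$count <- 50 + 2 * dat$day + rnorm(40, sd = 3)
dat$count[38:40] <- dat$count[38:40] + 40
res <- detect_outliers_sketch(dat, count ~ day, k = 7)
res[res$outlier, c("day", "count", "recent")]
```

Points from the training set can also be flagged, since the interval is computed for every observation, as described above.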

Usage

asmodee(data, models, ...)

## S3 method for class 'data.frame'
asmodee(
  data,
  models,
  date_index,
  alpha = 0.05,
  k = 7,
  method = evaluate_aic,
  method_args = list(),
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_fitting_warnings = FALSE,
  include_prediction_warnings = TRUE,
  force_positive = FALSE,
  keep_intermediate = FALSE,
  ...
)

## S3 method for class 'incidence2'
asmodee(
  data,
  models,
  alpha = 0.05,
  k = 7,
  method = evaluate_aic,
  method_args = list(),
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_fitting_warnings = FALSE,
  include_prediction_warnings = TRUE,
  force_positive = TRUE,
  keep_intermediate = FALSE,
  ...
)

Arguments

data

A data.frame or a tibble containing the response and explanatory variables used in the models.

models

A list of trending_model() objects, generated by lm_model, glm_model, glm_nb_model, brms_model and similar functions (see ?trending::trending_model() for details).

...

Not currently used.

date_index

The name of a variable corresponding to time, quoted or not.

alpha

The alpha threshold to be used for the prediction interval calculation; defaults to 0.05, i.e. 95% prediction intervals are calculated.

k

An integer indicating the number of recent data points to be excluded from the trend fitting procedure. Defaults to 7.

method

A function used to evaluate model fit. Current choices are evaluate_aic (default) and evaluate_resampling. evaluate_aic uses Akaike's Information Criterion (AIC), which is faster but may be less effective at selecting the models with the best predictive power. evaluate_resampling uses cross-validation and, by default, root mean squared error (RMSE) to assess model fit.

method_args

Optional named list of additional arguments to pass to method. Defaults to an empty list.

simulate_pi

A logical indicating if prediction intervals should be derived by bootstrap using the ciTools package, or calculated analytically. Defaults to TRUE.

uncertain

A logical indicating if uncertainty in the fitted parameters should be taken into account when deriving prediction intervals. Only used for glm models and if simulate_pi = FALSE. Defaults to FALSE.

include_fitting_warnings

A logical indicating if results should include models that triggered warnings (but not errors) during the fitting procedure. Defaults to FALSE, as warnings often indicate a lack of convergence during parameter estimation.

include_prediction_warnings

A logical indicating if results should include models that triggered warnings (but not errors) during the prediction stage. Defaults to TRUE.

force_positive

A logical indicating if predictions should be forced to be positive (or zero); this can be useful when using Gaussian models for count data, to avoid negative predictions. Defaults to FALSE for general data.frame inputs, and to TRUE for incidence2 objects.

keep_intermediate

A logical indicating if all output from the fitting and prediction stages should be returned. If TRUE, a tibble will be returned in the fitted_results position of the resulting list output. If FALSE (default) fitted_results will be NULL.
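
The effect of simulate_pi and force_positive can be illustrated with a base-R sketch. It rests on assumptions: for a Poisson GLM it compares quantiles of counts simulated at the fitted means with exact Poisson quantiles, ignoring parameter uncertainty (the package itself relies on ciTools, which is not used here); and it treats forcing predictions to be positive as truncating a Gaussian model's prediction interval at zero.

```r
# Illustrative data: 30 days of counts (simulated, not from the package)
set.seed(3)
dat <- data.frame(day = 1:30)
dat$count <- rpois(30, lambda = exp(2.5 + 0.02 * dat$day))
alpha <- 0.05
new_days <- data.frame(day = 31:37)

# Poisson GLM: simulated vs analytic prediction intervals at the fitted means
fit_pois <- glm(count ~ day, family = poisson, data = dat)
mu <- predict(fit_pois, newdata = new_days, type = "response")
n_sim <- 5000
sims <- vapply(mu, function(m) as.numeric(rpois(n_sim, m)), numeric(n_sim))  # n_sim x 7
pi_simulated <- apply(sims, 2, quantile, probs = c(alpha / 2, 1 - alpha / 2))
pi_analytic <- rbind(qpois(alpha / 2, mu), qpois(1 - alpha / 2, mu))

# Gaussian model on decreasing counts: interval bounds can fall below zero
dat$falling <- pmax(0, round(30 - 1.8 * dat$day + rnorm(30, sd = 2)))
fit_lm <- lm(falling ~ day, data = dat)
pi_gaussian <- predict(fit_lm, newdata = new_days, interval = "prediction",
                       level = 1 - alpha)
pi_forced <- pmax(pi_gaussian, 0)  # assumed force_positive-style truncation at zero
```

The simulated and analytic Poisson bounds should agree closely; the Gaussian bounds show why forcing positivity matters when a linear trend is extrapolated on falling counts.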
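
Keeping or discarding models that raised warnings (include_fitting_warnings, include_prediction_warnings) can be sketched with base R condition handling. fit_with_warnings is a hypothetical helper: it records warnings without interrupting the fit and returns a NULL value on error, so that the model can be dropped. This is an assumption about the general approach, not the package's code.

```r
# Hypothetical helper: evaluate a model-fitting expression, recording warnings
# (which do not stop the fit) and returning NULL as the value on error.
fit_with_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    tryCatch(expr, error = function(e) NULL),      # errors: model is dropped
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))        # record the warning text
      invokeRestart("muffleWarning")               # then continue evaluation
    }
  )
  list(value = value, warnings = msgs)
}

# Usage: a perfectly separated logistic regression triggers glm.fit warnings
x <- 1:10
y <- as.numeric(x > 5)
fits <- list(
  separated = fit_with_warnings(glm(y ~ x, family = binomial)),
  linear = fit_with_warnings(lm(y ~ x))
)

# Mimic include_fitting_warnings = FALSE: keep only models without warnings
include_fitting_warnings <- FALSE
usable <- Filter(function(f) {
  !is.null(f$value) && (include_fitting_warnings || length(f$warnings) == 0)
}, fits)
names(usable)
```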

Details

Automatic model selection is used to determine which model, from a list of user-provided models, best fits the training data. First, all models are fitted to the data. Second, a model is selected using the approach specified by the method argument. The default, evaluate_aic(), uses Akaike's Information Criterion to assess model fit penalised by model complexity. This approach is fast, but measures model fit rather than predictive ability. The alternative, evaluate_resampling(), uses cross-validation (10-fold by default) and root mean squared error (RMSE) to assess model fit. This approach is likely to select models with good predictive abilities, but is computationally intensive. Also, it does not attempt to maximise the explained deviance, so selected models may have good average predictions but underestimate uncertainty.
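
The two selection strategies can be compared on simulated counts with a base-R sketch. AIC() comes from the stats package; cv_rmse is a hypothetical helper implementing v-fold cross-validation scored by RMSE. It illustrates the idea behind evaluate_resampling() but is not its implementation: fold assignment and the restriction to Poisson candidates are assumptions made for this example.

```r
# Hypothetical helper: v-fold cross-validated RMSE for one candidate model.
cv_rmse <- function(fit_fun, formula, data, v = 10) {
  folds <- sample(rep(seq_len(v), length.out = nrow(data)))  # random fold labels
  sq_err <- numeric(nrow(data))
  for (i in seq_len(v)) {
    held_out <- folds == i
    fit <- fit_fun(formula, data[!held_out, , drop = FALSE])  # refit without fold i
    pred <- predict(fit, newdata = data[held_out, , drop = FALSE], type = "response")
    observed <- data[held_out, all.vars(formula)[1]]
    sq_err[held_out] <- (observed - pred)^2
  }
  sqrt(mean(sq_err))
}

fit_poisson <- function(formula, data) glm(formula, family = poisson, data = data)

# Simulated daily counts with an exponential trend (illustrative data only)
set.seed(5)
dat <- data.frame(day = 1:40)
dat$count <- rpois(40, lambda = exp(2 + 0.04 * dat$day))

candidates <- list(
  poisson_constant = count ~ 1,
  poisson_time = count ~ day,
  poisson_quadratic = count ~ day + I(day^2)
)

# AIC: each model fitted once to all the data; lower is better
aic <- vapply(candidates, function(f) AIC(fit_poisson(f, dat)), numeric(1))
# Cross-validation: v refits per model, scored on held-out points; lower is better
rmse <- vapply(candidates, function(f) cv_rmse(fit_poisson, f, dat), numeric(1))

data.frame(aic = aic, rmse = rmse)
c(aic = names(which.min(aic)), rmse = names(which.min(rmse)))
```

The AIC column needs one fit per model, whereas cross-validation needs v fits per model, which is why evaluate_resampling() is the more computationally intensive option.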

Value

A trendbreaker object (an S3 class inheriting from list), containing items that can be accessed with various accessors; see ?trendbreaker-accessors.

Author(s)

Thibaut Jombart, Dirk Schumacher and Tim Taylor, with inputs from Michael Höhle, Mark Jit, John Edmunds, Andre Charlett, Stéphane Ghozzi

Examples

if (require(cowplot) && require(tidyverse) && require(trending)) {
  # load data
  data(nhs_pathways_covid19)

  # select the most recent date and the 28 days before it
  first_date <- max(nhs_pathways_covid19$date, na.rm = TRUE) - 28
  pathways_recent <- nhs_pathways_covid19 %>%
    filter(date >= first_date)

  # define candidate models
  models <- list(
    regression = lm_model(count ~ day),
    poisson_constant = glm_model(count ~ 1, family = "poisson"),
    negbin_time = glm_nb_model(count ~ day),
    negbin_time_weekday = glm_nb_model(count ~ day + weekday)
  )

  # analyses on all data
  counts_overall <- pathways_recent %>%
    group_by(date, day, weekday) %>%
    summarise(count = sum(count))

  # results with fixed value of 'k' (7 days)
  res_overall_k7 <- asmodee(counts_overall, models, date, k = 7)
  plot(res_overall_k7, "date")
}

reconhub/epichange documentation built on April 28, 2021, 2 p.m.