asmodee: Automatic Selection of Models and Outlier DEtection for Epidemics...

View source: R/asmodee.R

Description

This function implements an algorithm for epidemic time series analysis that aims to detect recent deviations from the trend followed by the data. Data are first partitioned into 'recent' data, using the last k observations as supplementary individuals, and older data used to fit the trend. Trend-fitting is done by fitting a series of user-specified models for the time series, with different methods for selecting the best fit (see Details, and the argument method). The prediction interval is then calculated for the best model, and every data point (including the training set and supplementary individuals) falling outside it is classified as an 'outlier'. The value of k can be fixed by the user, or automatically selected to minimise outliers in the training period and maximise the detection of outliers in the recent period.
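The partition-and-flag logic described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: it assumes simulated data, a single linear model instead of a list of candidate models, and a fixed k. The variable names (dat, recent, pred) are hypothetical.

```r
## Simulate 30 days of counts with a linear trend, then add a jump
## to the last k days so that they deviate from the earlier trend.
set.seed(1)
n <- 30
k <- 7          # number of recent observations held out of the fit
alpha <- 0.05   # gives 95% prediction intervals

dat <- data.frame(day = seq_len(n))
dat$count <- 100 + 2 * dat$day + rnorm(n, sd = 5)
recent <- dat$day > (n - k)
dat$count[recent] <- dat$count[recent] + 60

## Fit the trend on the older (training) data only.
fit <- lm(count ~ day, data = dat[!recent, ])

## Prediction interval for every point, training and recent alike.
pred <- predict(fit, newdata = dat, interval = "prediction",
                level = 1 - alpha)

## Points falling outside the interval are classified as outliers.
dat$outlier <- dat$count < pred[, "lwr"] | dat$count > pred[, "upr"]
dat$period <- ifelse(recent, "recent", "training")
table(period = dat$period, outlier = dat$outlier)
```

In this sketch the recent points are all flagged because the simulated jump is large relative to the noise; real data will rarely be this clear-cut.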

Usage

asmodee(data, models, ...)

## S3 method for class 'data.frame'
asmodee(
  data,
  models,
  date_index,
  alpha = 0.05,
  max_k = 7,
  fixed_k = NULL,
  method = trendeval::evaluate_aic,
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_warnings = FALSE,
  quiet = FALSE,
  force_positive = FALSE,
  ...
)

## S3 method for class 'incidence2'
asmodee(
  data,
  models,
  alpha = 0.05,
  max_k = 7,
  fixed_k = NULL,
  method = trendeval::evaluate_aic,
  simulate_pi = TRUE,
  uncertain = FALSE,
  include_warnings = FALSE,
  force_positive = TRUE,
  n_cores = 1,
  ...
)

Arguments

data

A data.frame or a tibble containing the response and explanatory variables used in the models.

models

A list of trending_model() objects, generated by lm_model, glm_model, glm_nb_model, brms_model and similar functions (see ?trending::trending_model() for details).

...

Further arguments passed to method.

date_index

The name of a variable corresponding to time, quoted or not.

alpha

The alpha threshold to be used for the prediction interval calculation; defaults to 0.05, i.e. 95% prediction intervals are calculated.

max_k

An integer indicating the maximum number of recent data points to be excluded from the trend fitting procedure. By default, ASMODEE will look for a changepoint within this recent time period, after which data no longer fit the previous trend. Larger values will require more computation from the method. Only used if fixed_k is NULL.

fixed_k

An optional integer indicating the number of recent data points to be excluded from the trend fitting procedure. Defaults to NULL, in which case ASMODEE detects k automatically, at the expense of computational time.

method

A function used to evaluate model fit. Current choices are evaluate_aic (default) and evaluate_resampling. evaluate_aic uses Akaike's Information Criterion, which is faster but possibly less good at selecting models with the best predictive power. evaluate_resampling uses cross-validation and, by default, RMSE to assess model fit.

simulate_pi

A logical indicating if the ciTools package should be used to simulate prediction intervals for glm models. Defaults to TRUE.

uncertain

Only used for glm models. If FALSE, uncertainty in the fitted parameters is ignored when generating the prediction intervals. Defaults to FALSE.

include_warnings

A logical indicating if results that triggered warnings (but not errors) should be included in the output. Defaults to FALSE.

quiet

A logical indicating if warnings and messages should be suppressed (TRUE) or shown (FALSE, default).

force_positive

A logical indicating if predictions should be forced to be positive (or zero); this can be useful when using Gaussian models for count data, to censor confidence or prediction intervals and avoid negative predictions. Defaults to FALSE for general data.frame inputs, and to TRUE for incidence2 objects.

n_cores

An integer indicating the number of cores to be used; if greater than 1, then asmodee will be run in parallel across the number of requested cores; defaults to 1.
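As an illustration of how alpha and force_positive affect the intervals, the base-R sketch below fits a Gaussian model to small counts, computes prediction intervals at two alpha levels, and censors negative bounds at zero. This is only an assumption about what such censoring looks like, not the package's code; the data and object names are hypothetical.

```r
## Small counts, for which a Gaussian model yields negative lower bounds.
dat <- data.frame(
  day = 1:20,
  count = c(0, 2, 1, 3, 0, 1, 4, 0, 2, 1, 0, 3, 1, 0, 2, 1, 0, 1, 2, 0)
)
fit <- lm(count ~ day, data = dat)

## alpha = 0.05 gives 95% intervals; alpha = 0.2 gives narrower 80% ones.
pred_95 <- predict(fit, newdata = dat, interval = "prediction",
                   level = 1 - 0.05)
pred_80 <- predict(fit, newdata = dat, interval = "prediction",
                   level = 1 - 0.2)
width_95 <- mean(pred_95[, "upr"] - pred_95[, "lwr"])
width_80 <- mean(pred_80[, "upr"] - pred_80[, "lwr"])

## Censoring at zero, in the spirit of force_positive = TRUE:
## negative fitted values and bounds are replaced by 0.
pred_pos <- pred_95
pred_pos[pred_pos < 0] <- 0

range(pred_95[, "lwr"])   # negative before censoring
range(pred_pos[, "lwr"])  # never below zero after censoring
```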

Details

Automatic model selection is used to determine the model best fitting the training data from a list of user-provided models. First, all models are fitted to the data. Second, models are selected using the approach specified by the method argument. The default, evaluate_aic, uses Akaike's Information Criterion to assess model fit penalised by model complexity. This approach is fast, but only measures model fit rather than predictive ability. The alternative, evaluate_resampling, uses cross-validation (10-fold by default) and root mean squared error (RMSE) to assess model fit. This approach is likely to select models with good predictive abilities, but is computationally intensive.
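The two selection strategies can be compared with a minimal base-R sketch. It does not call trendeval; the cv_rmse helper, the candidate list, and the simulated data are hypothetical stand-ins that only mimic the idea of scoring each candidate by AIC or by cross-validated RMSE.

```r
## Simulated counts with an exponential trend.
set.seed(3)
n <- 40
dat <- data.frame(day = seq_len(n))
dat$count <- rpois(n, lambda = exp(3 + 0.03 * dat$day))

## Candidate models, stored as functions that fit a model to a data frame.
candidates <- list(
  poisson_constant = function(d) glm(count ~ 1, family = poisson, data = d),
  poisson_time     = function(d) glm(count ~ day, family = poisson, data = d)
)

## Strategy 1: AIC of each model fitted to all the data (fast).
aic_scores <- vapply(candidates, function(f) AIC(f(dat)), numeric(1))

## Strategy 2: k-fold cross-validated RMSE (slower; each model is
## refitted once per fold and scored on the held-out observations).
cv_rmse <- function(fit_fun, d, folds = 10) {
  fold_id <- sample(rep(seq_len(folds), length.out = nrow(d)))
  sq_err <- numeric(nrow(d))
  for (i in seq_len(folds)) {
    test <- fold_id == i
    fit <- fit_fun(d[!test, , drop = FALSE])
    pred <- predict(fit, newdata = d[test, , drop = FALSE],
                    type = "response")
    sq_err[test] <- (d$count[test] - pred)^2
  }
  sqrt(mean(sq_err))
}
rmse_scores <- vapply(candidates, cv_rmse, numeric(1), d = dat)

## The best model has the lowest score under either criterion.
best_aic <- names(which.min(aic_scores))
best_rmse <- names(which.min(rmse_scores))
```

Both candidates here are Poisson models, so their AIC values are directly comparable; with a strong trend in the data, both criteria pick the time-dependent model.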

Value

A trendbreaker object (S3 class inheriting list), containing items which can be accessed by various accessors; see ?trendbreaker-accessors.

Author(s)

Thibaut Jombart, Dirk Schumacher and Tim Taylor, with inputs from Michael Höhle, Mark Jit, John Edmunds, Andre Charlett, Stéphane Ghozzi

Examples

if (require(cowplot) && require(tidyverse) && require(trending)) {
  ## load data
  data(nhs_pathways_covid19)

  ## select last 28 days
  first_date <- max(nhs_pathways_covid19$date, na.rm = TRUE) - 28
  pathways_recent <- nhs_pathways_covid19 %>%
    filter(date >= first_date)

  ## define candidate models
  models <- list(
    regression = lm_model(count ~ day),
    poisson_constant = glm_model(count ~ 1, family = "poisson"),
    negbin_time = glm_nb_model(count ~ day),
    negbin_time_weekday = glm_nb_model(count ~ day + weekday)
  )

  ## analyses on all data
  counts_overall <- pathways_recent %>%
    group_by(date, day, weekday) %>%
    summarise(count = sum(count))

  ## results with automated detection of 'k'
  ## (evaluate_aic is namespaced, as trendeval is not attached above)
  res_overall <- asmodee(counts_overall,
                         models,
                         "date",
                         method = trendeval::evaluate_aic)
  res_overall
  plot(res_overall, "date")

  ## results with fixed value of 'k' (7 days)
  res_overall_k7 <- asmodee(counts_overall, models, date, fixed_k = 7)
  plot(res_overall_k7, "date")

}

reconhub/epichange documentation built on April 8, 2021, 3:45 a.m.