knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(aedseo)
The aedseo
package performs automated and early detection of seasonal epidemic onsets and estimates
the breakpoints for burden levels from time series data stratified by season.
The seasonal onset (seasonal_onset()
) estimates growth rates for consecutive time intervals and calculates the sum of cases.
The burden levels (seasonal_burden_levels()
) use the previous seasons to estimate the burden levels of the current season.
The algorithm allows for surveillance of pathogens, by alarming when the observations have significant growth in
the selected time interval and based on the disease-specific threshold, while also evaluating the burden of current
observations based on previous seasons.
To apply the aedseo
algorithm, data needs to be transformed into a tsd
object.
If you have your own data, the to_time_series()
function can be used with the arguments: observation
, time
, time_interval
.
In the following section, the application of the algorithm is shown with simulated data created with the generate_seasonal_data()
function.
More information about the function can be found in the vignette("generate_seasonal_wave")
withr::local_seed(222) # Construct an 'tsd' object with time series data tsd_data <- generate_seasonal_data( years = 5, start_date = as.Date("2020-10-18"), trend_rate = 1.002, noise_overdispersion = 5, relative_epidemic_concentration = 3, time_interval = "week" ) tsd_data <- tsd_data |> dplyr::filter(time <= "2025-02-01")
In the following figure simulated data (solid circles) are visualised as individual observations. The solid line connects these points, representing the underlying mean trend over three years of weekly data.
plot(tsd_data)
Respiratory viruses can circulate in different seasons based on the location. In the nordic hemisphere they mostly circulate in the fall and winter seasons, hence surveillance is intensified from week 40 to week 20 in the following year. To include all data, the season in the example is set from week 21 to week 20 in the following year. In this example burden levels and seasonal onset will be estimated for season 2024/2025.
When observations are low there is a risk that randomness will result in significant growth estimates in isolated periods. To increase the robustness of the method a disease-specific threshold is introduced. It should be set such that subsequent estimates of growth are likely to be significant as well. The disease-specific threshold can be determined by examining continuous periods with sustained significant growth, and determine at what number of observations these events occur.
In this example the disease-specific threshold is determined based on consecutive significant observations from all available previous seasons. Significant observations are defined as those with a significant positive growth rate.
To capture short-term changes and fluctuations in the data, a rolling window of size $k = 5$ is used to create subsets of the data for model fitting,
and the quasipoisson
family is used to account for overdispersion.
The disease-specific threshold will be estimated with the four available previous seasons.
The seasonal_onset()
function can be used for this purpose, without providing the disease-specific threshold.
Then the consecutive_growth_warnings()
function can be used to create groups with subsequent significant observations.
The data can then be analysed, else you can use plot/autoplot to visualise the sequences of significant observations for each season.
The sum_of_cases
variable is divided by five to represent an average over a five-week window, defining the
disease-specific threshold for each time-step.
tsd_onset <- seasonal_onset( tsd = tsd_data, k = 5, family = "quasipoisson", na_fraction_allowed = 0.4, season_start = 21, # Season starts in week 21 season_end = 20, # Season ends in week 20 the following year only_current_season = FALSE ) consecutive_gr_warn <- consecutive_growth_warnings( onset_output = tsd_onset ) autoplot( consecutive_gr_warn, k = 5, skip_current_season = TRUE ) + ggplot2::geom_vline( ggplot2::aes(xintercept = 25, linetype = "Threshold"), color = "black", linewidth = 0.6 ) + ggplot2::scale_linetype_manual( name = "", values = c("Threshold" = "dashed") )
From the plot above, we observe the length of periods with subsequent significant growth rates (y-axis). The season with the longest consecutive period of growth is 2021/2022, lasting 14 weeks. However, since we are determining a threshold specifically for the 2024/2025 season, it's important to prioritize the most recent seasons. The 2023/2024 season shows two periods of significant growth, with the first being the longest and coinciding closely in timing with the consecutive growth period observed in 2022/2023. We select a disease-specific threshold of 25 to ensure early detection of the seasonal onset while minimizing false positives.
In other words, a season onset is declared when the average observation count over five weeks surpasses 25 and is accompanied by a significantly positive growth rate.
Inspect the exact conditions around each detected season start
consecutive_gr_warn |> dplyr::filter(!is.na(significant_counter)) |> dplyr::filter(season != max(consecutive_gr_warn$season)) |> dplyr::group_by(season) |> dplyr::filter(significant_counter == max(significant_counter)) |> dplyr::mutate(disease_threshold = sum_of_cases / 5, week = ISOweek::ISOweek(reference_time)) |> dplyr::select(season, week, disease_threshold)
By inspecting the output from the above code, the disease-specific threshold is established at 25
observations.
The primary function of the aedseo
package is the combined_seasonal_output()
which integrates the seasonal_onset()
and seasonal_burden_levels()
functions to deliver a comprehensive seasonal analysis.
Detailed information about each function and their respective arguments can be found in the vignette("seasonal_onset")
and vignette("burden_levels")
.
seasonal_output <- combined_seasonal_output( tsd = tsd_data, disease_threshold = 25, method = "intensity_levels", family = "quasipoisson" )
The default function estimates onset and burden levels for the current season. If it is desired to see calculations for all previous seasons, the only_current_season
argument should be set to FALSE
.
Note: Burden levels can not be estimated for the first season and needs at least two seasons of data as the estimations are based on data from previous seasons.\
The aedseo
package implements S3 methods including the plot()
, predict()
and summary()
functions specifically designed for objects of the aedseo
package.
predict()
is only relevant for tsd_onset
objects.
An example of using the summary()
S3 method with tsd_onset
and tsd_burden_level
objects is shown here.
Seasonal onset output can be extracted by:
summary(seasonal_output$onset_output)
Seasonal burden output can be extracted by:
summary(seasonal_output$burden_output)
The plot()
S3 method for tsd_combined_seasonal_output
objects allows you to get a complete visualisation of the combined_seasonal_output()
analysis of the current season.
# Adjust y_lower_bound dynamically to remove noisy small values disease_threshold <- 25 y_lower_bound <- ifelse(disease_threshold < 10, 1, 5) plot( x = seasonal_output, y_lower_bound = y_lower_bound, time_interval = "3 weeks" )
Using the intensity_levels
method to define burden levels, the seasonal onset is likely to fall within the low
or medium
category. This is because the very low
breakpoint is the disease-specific threshold, and season onset is only identified if
the five-week average of the observations exceed this threshold along with a significant positive growth rate.
The historical_summary()
function for tsd_onset
objects provides historical estimates from all previous seasons.
By utilising this function, it is easy to assess whether current estimates align with previously observed patterns for
a specific pathogen, or if significant changes have occurred. Such changes might result from altered testing practices,
pathogen mutations, or other factors.
If the analysis indicates notable deviations from past patterns, it is advisable to revisit the method used to define the disease-specific threshold, as it might need some adjustment.
# Get `tsd_onset` object tsd_onset <- seasonal_onset( tsd = tsd_data, disease_threshold = 25, family = "quasipoisson", season_start = 21, season_end = 20, only_current_season = FALSE ) historical_summary(tsd_onset)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.