knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(aedseo)
To provide a concise overview of how the seasonal_burden_levels()
algorithm operates, we utilize the same example data presented
in the vignette("aedseo")
. The plot below illustrates the two methods available for estimating burden levels in the
combined_seasonal_output()
function:
intensity_levels
: Intended for within-season classification of observations.peak_levels
: Intended for comparing the height of peaks between seasons.The disease-specific threshold is the very low
breakpoint for both methods. Breakpoints are named as the upper bounds for the burden levels,
and are visualised on following plot.
withr::local_seed(222) # Construct an 'tsd' object with time series data tsd_data_clean <- generate_seasonal_data( years = 3, start_date = as.Date("2021-10-18"), noise_overdispersion = 1, relative_epidemic_concentration = 3, time_interval = "week" ) # Run models intensity_levels <- seasonal_burden_levels( tsd = tsd_data_clean, disease_threshold = 10, method = "intensity_levels", conf_levels = 0.95 ) peak_levels <- seasonal_burden_levels( tsd = tsd_data_clean, disease_threshold = 10, method = "peak_levels", conf_levels = c(0.4, 0.9, 0.975), n_peak = 8 ) # Create data frame burden_levels_df <- data.frame( Level = names( c(intensity_levels$values, peak_levels$values ) ), Threshold = c( intensity_levels$values, peak_levels$values ), Method = c( rep("Intensity Levels", 4), rep("Peak Levels", 4) ) ) burden_levels_df$Level <- factor( burden_levels_df$Level, levels = c("high", "medium", "low", "very low") ) # Calculate y_tics y_tics_log10 <- pretty(c(log10(burden_levels_df$Threshold))) y_tics_levels <- 10^(y_tics_log10) # For each tic, find the closest magnitude to round correctly round_to_nearest <- function(x) { magnitude <- 10^floor(log10(x)) plyr::round_any(x, accuracy = magnitude) } y_tics <- sapply(y_tics_levels, round_to_nearest) # Round max value rounded_max_threshold <- (round(max(burden_levels_df$Threshold) / 100) * 100) + 100 y_tics <- c(y_tics[-5], rounded_max_threshold) # Create the plot burden_levels_df |> ggplot2::ggplot(ggplot2::aes(x = 0, y = Threshold, linetype = Level, color = Level)) + ggplot2::geom_hline( ggplot2::aes(yintercept = Threshold, linetype = Level, color = Level), linewidth = 1 ) + ggplot2::labs( x = NULL, y = "Observations", linetype = "Aedseo levels", color = "Aedseo levels" ) + ggplot2::scale_linetype_manual( values = c( "very low" = "dotted", "low" = "dashed", "medium" = "longdash", "high" = "solid" ) ) + ggplot2::scale_color_manual( values = c( "very low" = "#44ce1b", "low" = "#bbdb44", "medium" = "#f2a134", "high" = "#e51f1f" ) ) + ggplot2::facet_wrap(~ Method, ncol = 2) + ggplot2::theme_minimal() + ggplot2::theme( legend.position = "right", legend.key.width = grid::unit(2, "cm"), legend.text = ggplot2::element_text(size = 11, color = "black"), strip.text = ggplot2::element_text(size = 11, color = "black"), axis.text = ggplot2::element_text(size = 9, color = "black"), axis.title.y = ggplot2::element_text(size = 11, color = "black") ) + ggplot2::scale_y_log10( breaks = y_tics, limits = range(y_tics), labels = scales::label_comma(big.mark = ".", decimal.mark = ",") ) + ggplot2::guides(linetype = "none")
The methodology used to define the burden levels of seasonal epidemics is based on observations from previous seasons. Historical data from all available seasons is used to establish the levels for the current season. This is done by:
n
highest (peak) observations from each season.n
peak observations. The selected distribution with the fitted parameters is used to calculate percentiles to be used as breakpoints.peak_levels
which models the height of the seasonal peaks. Using the log-normal distribution without weights is similar
to the default in mem.intensity_levels
which models the within season levels. The highest breakpoint is identical with the peak_levels
method.
Intermediate breakpoints are evenly distributed on a logaritmic scale, between the very low
and high
breakpoints,
to give the same relative difference between the breakpoints.The model is implemented in the seasonal_burden_levels()
function of the aedseo
package.
In the following sections we will describe the arguments for the function and how the model is build.
The n_peak
argument defines the number of highest observations to be included from each season.
The default of n_peak
is 6
- corresponding with the mem
defaults of using 30 observations across the latest five seasons.
The decay_factor
argument is implemented to give more weight to recent seasons, as they are often more indicative of current and future trends.
As time progresses, the relevance of older seasons may decrease due to changes in factors like testing recommendations, population immunity,
virus mutations, or intervention strategies. Weighting older seasons less reflects this reduced relevance.
From time-series analysis, $\frac{1}{1-\text{decay_factor}}$ is often used as an approximate “effective memory”.
Hence, with the default decay_factor
= 0.8 the effective memory is five seasons.
(See mentioned by Hyndman & Athanasopoulos for an introduction to simple exponential smoothing)
The default decay_factor
allows the model to be responsive to recent changes without being overly sensitive to short-term fluctuations.
The optimal decay_factor
can vary depending on the variability and trends within the data.
For datasets where seasonal patterns are highly stable, a higher decay_factor
(i.e. longer memory) may be appropriate.
Conversely, if the data exhibit dramatic shifts from one season to the next, a lower decay_factor
may improve predictions.
The family
argument is used to select which distribution the n_peak
observations should be fitted to, users can
choose between lnorm
, weibull
and exp
distributions. The log-normal distribution theoretically
aligns well with the nature of epidemic data, which often exhibits multiplicative growth patterns.
In our optimisation process, we evaluated the distributions to determine their performance in fitting Danish non-sentinel
cases and hospitalisation data for RSV, SARS-CoV-2 and Influenza (A and B). All three distributions had comparable
weighted likelihood values during optimisation, hence we did not see any statistical significant difference in their performance.
The model uses the fit_percentiles()
function which employs the stats::optim
for estimating the parameters that maximizes the weighted likelihood.
The optim_method
argument can be passed to seasonal_burden_levels()
, default is Nelder-Mead
but other methods can be selected,
see ?fit_percentiles
.
The method
argument is used to select one of the two methods intensity_levels
(default) and peak_levels
.
Both methods return percentile(s) from the fitted distribution which are used to define the breakpoins for the burden levels.
Breakpoints are named very low
, low
, medium
and high
and define the upper bound of the corresponding burden level.
intensity_levels
takes one percentile as argument, representing the highest breakpoint.
The default is set at a 95% percentile. The disease-specific threshold
determines the very low
breakpoint. The low
and medium
breakpoints are calculated to give identical relative
increases between the very low
and high
breakpoints.
peak_levels
takes three percentiles as argument, representing the low
, medium
and high
breakpoints.
The default percentiles are 40%, 90%, and 97.5% to align with the parameters used in mem
.
The disease-specific threshold defines the very low
breakpoint.
seasonal_burden_levels()
algorithmThe same data is created with following combinations:
These combinations are selected as it is realistic for real world data to have noise, and differentiation between trend can occur declining or inclining between seasons. Breakpoints for season 2024/2025 are calculated based on the three previous seasons.
To apply the seasonal_burden_levels()
algorithm, data needs to be transformed into a tsd
object.
The disease-specific threshold is determined for all data combinations with use of the method described in vignette("aedseo")
. The disease-specific threshold should be revised before a new season starts, especially
if the data has a trend.
withr::local_seed(222) # Construct 'tsd' objects with time series data tsd_data_noise <- generate_seasonal_data( years = 5, start_date = as.Date("2020-10-18"), amplitude = 600, mean = 600, phase = 0, noise_overdispersion = 10, relative_epidemic_concentration = 3, time_interval = "week" ) tsd_data_noise_and_pos_trend <- generate_seasonal_data( years = 5, start_date = as.Date("2020-10-18"), amplitude = 600, mean = 600, phase = 0, noise_overdispersion = 10, trend_rate = 1.002, relative_epidemic_concentration = 3, time_interval = "week" ) tsd_data_noise_and_neg_trend <- generate_seasonal_data( years = 5, start_date = as.Date("2020-10-18"), amplitude = 600, mean = 600, phase = 0, noise_overdispersion = 10, trend_rate = 0.99, relative_epidemic_concentration = 3, time_interval = "week" ) # Remove days after week 20 in last season to get 5 seasons data tsd_data_all <- rbind( tsd_data_noise |> dplyr::mutate(Data = "Noise"), tsd_data_noise_and_pos_trend |> dplyr::mutate(Data = "Noise and positive trend"), tsd_data_noise_and_neg_trend |> dplyr::mutate(Data = "Noise and negative trend") ) |> dplyr::filter(time <= "2025-05-12") |> dplyr::mutate( Data = factor(Data, levels = c("Noise", "Noise and positive trend", "Noise and negative trend")) ) start_date <- min(tsd_data_all$time) end_date <- max(tsd_data_all$time) tsd_data_all |> ggplot2::ggplot(ggplot2::aes( x = time, y = observation, color = Data, group = Data )) + ggplot2::geom_line(linewidth = 0.7) + ggplot2::geom_point() + ggplot2::scale_color_manual( name = "Seasonal data", values = c( "Noise" = "blue", "Noise and positive trend" = "green", "Noise and negative trend" = "red" ) ) + aedseo:::time_interval_x_axis( start_date = start_date, end_date = end_date, time_interval_step = "6 weeks" ) + ggplot2::labs(y = "Weekly observations") + ggplot2::theme_bw() + ggplot2::theme( axis.text = ggplot2::element_text(size = 9, color = "black", family = "sans"), axis.title.x = ggplot2::element_text(size = 11, color = "black", family = "sans"), axis.title.y = ggplot2::element_text(size = 11, color = "black", family = "sans") )
The seasonal_burden_levels()
function provides a tsd_burden_level
object, which can be used in the summary()
S3 method.
This provides a concise summary of your comprehensive seasonal burden level analysis, including breakpoints for the current season.
intensity_levels_n <- seasonal_burden_levels( tsd = tsd_data_noise, disease_threshold = 10, method = "intensity_levels", conf_levels = 0.975 ) summary(intensity_levels_n)
intensity_levels_n_pos_t <- seasonal_burden_levels( tsd = tsd_data_noise_and_pos_trend, disease_threshold = 20, method = "intensity_levels", conf_levels = 0.975 ) intensity_levels_n_neg_t <- seasonal_burden_levels( tsd = tsd_data_noise_and_neg_trend, disease_threshold = 5, method = "intensity_levels", conf_levels = 0.975 )
mem
uses the n
highest observations from each previous epidemic period to fit the parameters of the distribution,
where n = 30/seasons
. The data has four previous seasons, to align with mem, we use n_peak = 8
peak_levels_n <- seasonal_burden_levels( tsd = tsd_data_noise, disease_threshold = 10, method = "peak_levels", conf_levels = c(0.4, 0.9, 0.975), n_peak = 8 ) summary(peak_levels_n)
peak_levels_n_pos_t <- seasonal_burden_levels( tsd = tsd_data_noise_and_pos_trend, disease_threshold = 20, method = "peak_levels", conf_levels = c(0.4, 0.9, 0.975), n_peak = 8 ) peak_levels_n_neg_t <- seasonal_burden_levels( tsd = tsd_data_noise_and_neg_trend, disease_threshold = 5, method = "peak_levels", conf_levels = c(0.4, 0.9, 0.975), n_peak = 8 )
mem is run with default arguments.
# Remove current season such as previous seasons predict for newest season previous_seasons <- tsd_data_all |> dplyr::mutate(season = epi_calendar(time)) |> dplyr::filter(season != "2024/2025") |> dplyr::select(-season) # Run mem algorithm mem_thresholds <- previous_seasons |> dplyr::group_by(Data) |> dplyr::group_modify(~ { mem_data <- .x |> dplyr::mutate(season = aedseo::epi_calendar(time), week = lubridate::isoweek(time)) |> dplyr::select(-time) |> tidyr::pivot_wider(names_from = season, values_from = observation) |> dplyr::select(-week) # Run mem mem_result <- mem::memmodel(mem_data) # Extract thresholds mem_thresholds <- tibble::tibble( `epidemic threshold \n (mem)` = mem_result$epidemic.thresholds[1], `medium` = mem_result$intensity.thresholds[1], `high` = mem_result$intensity.thresholds[2], `very high` = mem_result$intensity.thresholds[3] ) })
#### Create data frame burden_levels_df <- tibble::tibble( Level = names( c(intensity_levels_n$values, intensity_levels_n_pos_t$values, intensity_levels_n_neg_t$values, peak_levels_n$values, peak_levels_n_pos_t$values, peak_levels_n_neg_t$values ) ), Threshold = c( intensity_levels_n$values, intensity_levels_n_pos_t$values, intensity_levels_n_neg_t$values, peak_levels_n$values, peak_levels_n_pos_t$values, peak_levels_n_neg_t$values ), Method = c( rep("Intensity levels", 4), rep("Intensity levels", 4), rep("Intensity levels", 4), rep("Peak levels", 4), rep("Peak levels", 4), rep("Peak levels", 4) ), Data = c( rep("Noise", 4), rep("Noise and positive trend", 4), rep("Noise and negative trend", 4), rep("Noise", 4), rep("Noise and positive trend", 4), rep("Noise and negative trend", 4) ) ) mem_levels_df <- mem_thresholds |> tidyr::pivot_longer(cols = `epidemic threshold \n (mem)`:`very high`, names_to = "Level", values_to = "Threshold") |> dplyr::mutate(Method = "mem levels") # Combine the threshold data frames from the two methods levels_all <- dplyr::bind_rows(burden_levels_df, mem_levels_df) |> dplyr::mutate( Level = factor(Level, levels = c("very high", "high", "medium", "low", "very low", "epidemic threshold \n (mem)")), Method = factor(Method, levels = c("Intensity levels", "Peak levels", "mem levels")), Data = factor(Data, levels = c("Noise", "Noise and positive trend", "Noise and negative trend")) ) # Merge observations all_observations <- tsd_data_all |> dplyr::filter(time <= "2025-04-01") |> dplyr::filter(time >= "2024-05-20") |> dplyr::filter(!is.na(observation), observation > 3) # Calculate y_tics y_tics_log10 <- pretty(c(log10(levels_all$Threshold))) y_tics_levels <- 10^(y_tics_log10) # For each tic, find the closest magnitude to round correctly round_to_nearest <- function(x) { magnitude <- 10^floor(log10(x)) plyr::round_any(x, accuracy = magnitude) } y_tics <- sapply(y_tics_levels, round_to_nearest) # Create a combined plot all_observations |> ggplot2::ggplot(ggplot2::aes(x = time, y = observation)) + ggplot2::geom_point(ggplot2::aes(color = "observations")) + ggplot2::geom_hline( data = levels_all, ggplot2::aes(yintercept = Threshold, linetype = Level, color = Level), linewidth = 1 ) + ggplot2::scale_linetype_manual( values = c( "very low" = "dotted", "low" = "dashed", "medium" = "longdash", "high" = "solid", "very high" = "dotdash", "epidemic threshold \n (mem)" = "dotted" ) ) + ggplot2::scale_color_manual( name = "", values = c( "observations" = "black", "very low" = "#44ce1b", "low" = "#bbdb44", "medium" = "#f2a134", "high" = "#e51f1f", "very high" = "#891212", "epidemic threshold \n (mem)" = "#44ce1b" ) ) + ggplot2::facet_grid(Data ~ Method) + ggplot2::theme_minimal() + ggplot2::theme( legend.position = "right", legend.key.width = grid::unit(2, "cm"), legend.text = ggplot2::element_text(size = 11, color = "black"), strip.text = ggplot2::element_text(size = 11, color = "black"), axis.text = ggplot2::element_text(size = 9, color = "black"), axis.title.x = ggplot2::element_text(size = 11, color = "black"), axis.title.y = ggplot2::element_text(size = 11, color = "black") ) + ggplot2::scale_y_log10( breaks = y_tics, limits = range(y_tics), labels = scales::label_comma(big.mark = ".", decimal.mark = ",") ) + aedseo:::time_interval_x_axis( start_date = min(all_observations$time), end_date = as.Date("2025-05-12"), time_interval_step = "5 weeks" ) + ggplot2::guides(linetype = "none")
In the plots:
Upon examining all methods and data combinations, it becomes clear that the intensity_levels
approach establishes
levels covering the entire set of observations from previous seasons. In contrast, the peak_levels
and mem
methods
define levels solely based on the highest observations within each season, and are thus only relevant for comparing
the height of peaks between seasons.
The highest observations for the 2024/2025 season for each data set are:
| Data | Observation | |---------------------------|-------------| | Noise | 1466 | | Noise and positive trend | 1967 | | Noise and negative trend | 171 |
In relation to these highest observations and upon further examination, we observe the following:
Plots with Noise and Noise with Positive Trend:
peak_levels
and mem
estimate very high breakpoints. This occurs because observations
remain consistently elevated across all three seasons, causing these methods to overlook the remaining
observations.Data with Noise and Positive Trend:
Data with Noise and Negative Trend:
As observations exponentially decrease between seasons (with the highest observation this season being 171),
we expect the breakpoints to be the lowest of these examples. This expectation is met across all three methods.
However, the weighting of seasons in intensity_levels
and peak_levels
leads to older seasons having less
impact on the breakpoints, as we progress forward in time. On the other hand, mem
includes all high observations
from the previous 10 seasons without diminishing the importance of older seasons, which results in sustained
very high breakpoints.
Notably, in the mem
method, the epidemic threshold is positioned slightly above the medium
burden level.
This means that the epidemic period begins only when observations reach the height of the seasonal peaks
observed in previous seasons.
In conclusion, the peak_levels
and mem
methods allows us to compare the height of peaks between seasons,
whereas the intensity_levels
method supports continuous monitoring of observations in the current season.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.