knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(aedseo)
The methodology used to detect the seasonal onset of epidemics, can be divided into two essential criteria:
Here, $k$ denotes the window size employed to obtain the local estimate of the exponential growth rate and SoC. When both of these criteria are met, an alarm is triggered and the onset of the seasonal epidemic is detected.
The model is implemented in the seasonal_onset()
function of the aedseo
package.
Criteria one is fulfilled if the growth_warning
in the output is TRUE
.
Criteria two is fulfilled if the sum_of_cases_warning
in the output is TRUE
.
The exponential growth rate, denoted as $r$, represents the per capita change in the number of new cases per unit of time. Given that incidence data are integer-valued, the proposed method relies on generalized linear models (GLM). For count data, the Poisson distribution is a suitable choice as a model. Hence, the count observations denoted as $Y$ are assumed to follow a Poisson distribution
\begin{equation} Y \sim \mathrm{Pois}(\lambda) \end{equation}
Here, the link function, $\log()$, connects the linear predictor to the expected value of the data point, expressed as $\log(\lambda)=\mu$. Given a single continuous covariate $t$, the mean $\mu$ can be expressed as
\begin{equation} \mu = \alpha + r t \end{equation}
This is equivalent to a multiplicative model for $\lambda$, i.e.
\begin{equation} \lambda = \exp(\alpha + r t) = \exp(\alpha) \exp(r t) \end{equation}
Intuitively, negative values of $r$ result in a decline in the number of observed cases, while $r=0$ represents stability, and positive values of $r$ indicate an increase.
It is important to note that the Poisson distribution assumes that the mean and variance are equal. In reality, real data often deviate from this assumption, with the variance ($v$) being significantly larger than the mean. This biological phenomenon, known as overdispersion, can be addressed within a model in various ways. One approach is to employ quasi-Poisson regression, which assumes $v=\sigma\lambda$, or to use negative binomial regression (not implemented yet), which assumes $v=\lambda+\lambda^2/\theta$, where both $\sigma$ and $\theta$ are overdispersion parameters.
First we generate some data as an tsd
object, with the generate_seasonal_data()
function.
# Construct an 'tsd' object with time series data set.seed(222) tsd_data <- generate_seasonal_data( years = 3, start_date = as.Date("2020-10-18"), trend_rate = 1.002, noise_overdispersion = 3, relative_epidemic_concentration = 2, time_interval = "week" )
Next, the tsd
object is passed to the seasonal_onset()
function. Here, a window size of k=5
is specified,
meaning that a total of 5 weeks is used in the local estimate of the exponential growth rate.
na_fraction_allowed = 0.4
defines how large a fraction of observations in the k
window that are allowed to be NA
, here 0.4*5 = 2
observations.
Additionally, a 95\% confidence interval is specified. Finally, the exponential growth rate is estimated using quasi-Poisson regression to account for overdispersion in the data.
A disease-specific threshold can additionally be passed to the function, but is not necessary if only the growth rate estimations are wanted.
season_start
and season_end
can be used to specify the season to stratify the observations by.
This algorithm runs across seasons, such that the first observation in a new season will use the last k-1
observations from the previous season.
The seasonal_onset()
function provides a tsd_onset
object with a comprehensive seasonal onset analysis.
seasonal_onset_results <- seasonal_onset( tsd = tsd_data, k = 5, level = 0.95, disease_threshold = 20, family = "quasipoisson", season_start = 21, season_end = 20, only_current_season = FALSE )
In the first figure, observations over time are shown with a legend for the seasonal onset alarm.
In the second figure, the local estimates of the growth rates are presented along with their corresponding 95% confidence interval
with a legend for the growth warning.
This visualisation can be generated by utilizing the plot()
S3 method with objects of the tsd_onset
class.
plot(seasonal_onset_results)
The predict()
S3 method for tsd_onset
objects allows you to make predictions for future time steps based on the estimated growth rates.
Following is an example of predict observations for the next 5 weekly time steps.
prediction <- predict(seasonal_onset_results, n_step = 5)
prediction <- prediction |> dplyr::select(-t) |> dplyr::rename(observation = estimate) # Extract two seasons for better visualisation: seasonal_onset_short <- seasonal_onset_results |> dplyr::filter(season %in% c("2023/2024", "2024/2025")) # Plot observations and predictions autoplot(seasonal_onset_short)$observed + # Add the prediction ribbon ggplot2::geom_ribbon( data = prediction, ggplot2::aes( x = reference_time, ymin = lower, ymax = upper, fill = "Observations with \n Confidence intervals" ), alpha = 0.2, inherit.aes = FALSE ) + ggplot2::geom_line( data = prediction, ggplot2::aes( x = reference_time, y = observation, color = "Observations with \n Confidence intervals" ), linewidth = 1, inherit.aes = FALSE ) + ggplot2::scale_color_manual( name = "Predictions", values = c("Observations with \n Confidence intervals" = "red") ) + ggplot2::scale_fill_manual( name = "Predictions", values = c("Observations with \n Confidence intervals" = "red") ) + ggplot2::coord_cartesian( xlim = c( min(seasonal_onset_short$reference_time), max(prediction$reference_time) ) )
In the example above, we use the predict method to predict growth rates for the next 5 time steps, according to the
time_interval = "week"
in the tsd_onset
object.
The n_step
argument specifies the number of steps into the future you want to forecast.
The resulting tsd_predict
object contains observation estimates, lower bounds, and upper bounds for each time step.
The summary()
S3 method for tsd_onset
objects provides a concise summary of your automated early detection of seasonal_onset
analysis.
You can use it to retrieve important information about your analysis, including the first seasonal onset alarm (reference time point),
SoC at reference time point (here over a 5 week window), growth rate estimates at reference time point, total number of growth warnings
in the series and latest warnings (growth and SoC). It helps you quickly assess the key findings of your analysis.
summary(seasonal_onset_results)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.