This vignette demonstrates the functionality provided by the sarimaTD
package using influenza data available from the cdcfluview package. The goal of
this package is to provide a set of functions for estimating and generating
predictions from a SARIMA model for approximately seasonal infectious disease
time series data that comes as close as possible to working "out of the box"
with minimal adjustments required.
We will work with the san_juan_dengue
data provided in this package. Note
that these data are the "Training Data" available for San Juan, Puerto Rico from
NOAA here: http://dengueforecasting.noaa.gov/
library(sarimaTD) library(dplyr) library(ggplot2) ggplot(data = san_juan_dengue, mapping = aes(x = week_start_date, y = total_cases)) + geom_line()
Let's fit models using the data up through the 2007/2008 season and make predictions for the 2008/2009 season.
Let's fit a few variations on models using different transformations, with and without an initial seasonal differencing. Cite Hyndman, who recommends a preliminary seasonal differencing somewhere...
Note that stationary = TRUE for auto.arima forces all differencing = 0 because resulting model is non-stationary if differencing is used.
train_indices <- san_juan_dengue$season %in% paste0(1990:2004, "/", 1991:2005) sarima_fit_bc_transform <- fit_sarima( y = san_juan_dengue$total_cases[train_indices], ts_frequency = 52, transformation = "box-cox", sarimaTD_d = 0, sarimaTD_D = 1)
eval_indices <- san_juan_dengue$season %in% paste0(1990:2005, "/", 1991:2006) sampled_trajectories_bc_transform <- simulate( object = sarima_fit_bc_transform, nsim = 1000, seed = 1, newdata = san_juan_dengue$total_cases[eval_indices], h = 52 ) plot_indices <- san_juan_dengue$season %in% paste0(2003:2006, "/", 2004:2007) preds_df <- san_juan_dengue %>% filter(season == "2006/2007") %>% mutate( pred_total_cases = apply(sampled_trajectories_bc_transform, 2, mean), pred_95_lb = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.025), pred_95_ub = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.975), pred_80_lb = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.05), pred_80_ub = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.95), pred_50_lb = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.25), pred_50_ub = apply(sampled_trajectories_bc_transform, 2, quantile, probs = 0.75) ) ggplot() + geom_ribbon( mapping = aes(x = week_start_date, ymin = pred_95_lb, ymax = pred_95_ub), fill = "cornflowerblue", alpha = 0.2, data = preds_df) + geom_ribbon( mapping = aes(x = week_start_date, ymin = pred_80_lb, ymax = pred_80_ub), fill = "cornflowerblue", alpha = 0.2, data = preds_df) + geom_ribbon( mapping = aes(x = week_start_date, ymin = pred_50_lb, ymax = pred_50_ub), fill = "cornflowerblue", alpha = 0.2, data = preds_df) + geom_line( mapping = aes(x = week_start_date, y = pred_total_cases), color = "cornflowerblue", data = preds_df) + geom_line(mapping = aes(x = week_start_date, y = total_cases), data = san_juan_dengue[plot_indices, , drop = FALSE])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.