knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(tibble.print_min = 4L, tibble.print_max = 4L) library(flumodelr) library(lubridate) library(scales) library(prophet)
Basic workflow for creating a 'simulation' influenza season.
1) Use CDC reported data to inform general shape / curve.
1a) Integrate the CDC P&I deaths into a density plot.
1b) Fit a two-mixture beta regression to determine initial values.
2) Use beta regression parameters as a guide for simulation
2a) Simulate season density using two-mixture beta distribution.
2b) Convert density back to raw counts using arbitrary value.
3) Refine as a function program
EpiWeekToDates <- function(year, weeknum) { #Compute 4th day in January jan4 <- ymd(paste(year, 1, 4, sep="-")) #Compute day of week DofW <- wday(jan4, week_start = 7)-1 #Sunday start #The -1 is correction for how wday() counts days #If Jan 4th was Sun, starting week date is 01/04/xxxx startweek <- if_else(DofW == 7, jan4, (jan4 - (DofW))) #First Date of Week d0 = startweek + (weeknum - 1) * 7 #Last Date of Week d1 = startweek + (weeknum - 1) * 7 + 6 return(list(d0 = ymd(d0), d1 = ymd(d1))) }
cdc_122 <- flumodelr::cdc122city #some formatting cdc_122 <- cdc_122 %>% group_by(year, week) %>% summarize(fludeaths = sum(deaths_pnaflu, na.rm=T)) %>% mutate(week_dt = EpiWeekToDates(year, week)[[1]])
g1 <- ggplot(cdc_122, aes(x=week_dt, y=fludeaths)) + geom_line(size=0.8, colour="#CC0000") + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + xlab("Year") + ylab("No. of Deaths from P&I") + theme_light(base_size=16) + theme(plot.title = element_text(size=14)) + labs(title="Figure 1. Pneumonia and Influenza Mortality") + theme(axis.text.x=element_text(angle=90, hjust=1)) g1
Because there is a systematic shift in the 1990s, will start modeling after that point.
cdc_122_1990 <- dplyr::filter(cdc_122, year>=1990)
g1 <- ggplot(cdc_122_1990, aes(x=week_dt, y=fludeaths)) + geom_line(size=0.8, colour="#CC0000") + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + xlab("Year") + ylab("No. of Deaths from P&I") + theme_light(base_size=16) + theme(plot.title = element_text(size=14)) + labs(title="Figure 1. Pneumonia and Influenza Mortality") + theme(axis.text.x=element_text(angle=90, hjust=1)) g1
Much more stable.
fludta_phet <- cdc_122_1990 %>% dplyr::select(week_dt, fludeaths) %>% dplyr::rename(ds = week_dt, y=fludeaths) p_fit <- prophet(fludta_phet, seasonality.mode = 'multiplicative', mcmc.samples = 2000, interval.width = 0.95)
future <- make_future_dataframe(p_fit, periods = 53, freq='week') tail(future)
forecast <- predict(p_fit, future) tail(forecast[c('ds', 'yhat', 'yhat_lower', 'yhat_upper')])
plot(p_fit, forecast)
Prophet does not fit the outlier seasons well (as expected they were unusually severe season).
Total deaths obtained from actual CDC reported data for 2010-2015. It is assumed this value is accurate, but what is unknown are the proportion of deaths due to influenza.
flu_real <- fludta %>% dplyr::filter(week<=52)
ggplot(flu_real, aes(x=week_in_order, y=alldeaths)) + geom_line()
We add the real all-cause deaths to the dataset.
flu_sim$deaths_all <- flu_real$alldeaths
The objective is to simulate deaths attributable to influenza under different assumptions.
A seasonal influenza data generating process.
Features of influenza deaths:
1) Mild or severe season, qualitatively increases number of deaths.
2) Peak month, which month has peak
3) Peakedness, whether a sharp rise/fall, or gentle slope.
$$Eq \ 1. \ y = a(x-h)^2 + k$$
Where k
is the highest observed influenza death, h
is the peak week, and a
determines peakedness.
#y = a(x-h)^2 + k
#peak of february, week 4 flu_sim_1 <- flu_sim %>% mutate(peak_week = row_number() - 4, peak = dgamma(week, shape = 4.2, scale=1.2), deaths_flu_t = round(deaths_all*peak * season))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.