knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options(tibble.print_min = 4L, tibble.print_max = 4L) library(flumodelr) library(lubridate) library(scales)
The cyclical regression model published by Serfling [@Serlfing1963] is not commonly implemented in its original form.
In current methods, most researchers incorporate data on the percent of isolates positive with influenza viral types. This is publically available in many cases, see ?nrevss
. The data describes the proportion of isolates tested, which may be obtained from participating hospital and outpatient settings. Using this surrogate time-periods of low flu activity can be used to construct a counterfactual for what the rate of the reported influenza outcome would be in the absence of high influenza activity. Whereas the original model relied on "off-season" data and the Fourier term, virology data gives you a more direct measure. This ultimately provides increased confidence that deaths are influenza related and estimated influenza morbidity and mortality.
The example dataset for this package, flumodelr::fludta
is a combination of the CDC-122 cities dataset, ?cdc122city
and the public virology data ?nrevss
. The virology is matched by week, with a lag of 2 weeks to allow for reporting of deaths (i.e. Virology(week_t-2)=Mortality(week_t)).
df_cdc <- flumodelr::fludta %>% dplyr::filter(year>=2010 & year<2016)
ggplot(df_cdc, aes(x=yrweek_dt)) + geom_line(aes(y=perc_fludeaths, colour="% Deaths from P&I", linetype="% Deaths from P&I"), size=0.8) + geom_line(aes(y=prop_flupos*10, colour="No. Positive per 10 isolates", linetype="No. Positive per 10 isolates"), size=0.8) + scale_colour_manual("Line", breaks=c("% Deaths from P&I", "No. Positive per 10 isolates"), values = c("% Deaths from P&I"="#CC0000", "No. Positive per 10 isolates"="black")) + scale_linetype_manual("Line", breaks=c("% Deaths from P&I", "No. Positive per 10 isolates"), values = c("% Deaths from P&I"=1, "No. Positive per 10 isolates"=2)) + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + xlab("Year") + ylab("%") + theme_light(base_size=16) + theme(plot.title = element_text(size=14)) + labs(title="Figure 1. Influenza (+) isolates over time")
The original model can be modified like so:
$$Eq \ 1. \ y = \alpha_0 + \beta_1t + \beta_2Flu_t + sin(\frac{2 \pi t}{period}) + cos(\frac{2 \pi t}{period}) + u$$ Where Flu = no. of isolates positive for influenza / total isolates tested in a given timepoint t. period is a fixed parameter equal to the cycle of the time unit (e.g. 52 weeks in a year).
Many published examples of this exist [@Wang2012, @Matias2016]. Some authors include data on the % RSV (+), as well as breakdown influenza by subtype and even include data on weather, humidity etc. Including individual terms for each viral subtype may have advantages as certain types of influenza are associated with more severe outcomes such as hospitalization and death.
A general wrapper, flum()
, for modeling time-series data is written for this purpose. This will compute a traditional serfling model or other generalized linear models, a smoothed model (adds polynomial terms), a model which allows for virology data, and other options explored below.
head(df_cdc)
flum()
)df_cdc_2 <- df_cdc %>% mutate(t = row_number(), #set origin to october theta = 2*t / 52, sin_2 = sinpi(theta), cos_2 = cospi(theta))
See Serfling model background for discussion on Fourier terms and cyclical regression.
df_cdc_2 <- df_cdc_2 %>% mutate(week_2 = t^2, week_3 = t^3, week_4 = t^4, week_5 = t^5)
base_fit <- df_cdc_2 %>% lm(perc_fludeaths ~ t + week_2 + week_3 + week_4 + week_5 + prop_flupos + sin_2 + cos_2, data=., na.action = na.exclude) summary(base_fit)
Here we predict the outcome for each observation, given the fitted model base_fit
above. We tell R to compute 95% prediction intervals. This is conventional for most modern approaches, see [insert cites]. The original Serfling paper estimated a threshold 1.64 standard deviations above the trend line.
base_pred <- df_cdc_2 %>% mutate(prop_flupos = 0) %>% #Note setting to zero predict(base_fit, newdata=., se.fit=TRUE, interval="prediction", level=0.90)
Critically, before predicting observations, we set the measure of our influenza activity, prop_flupos
to zero. This is to estimate the influenza activity at each timepoint, assuming that the proportion of influenza isolates was equal to zero.
fludta_fitted <- df_cdc_2 %>% add_column(., y0=base_pred$fit[,1], y0_ul=base_pred$fit[,3]) head(fludta_fitted)
#Set up graph labels, line specs line_names <- c("% Deaths From P&I", "Expected %", "Epidemic Threshold") line_cols <- c("#CC0000", "black", "black") line_types <- c(1, 1, 2) names(line_cols) <- line_names names(line_types) <- line_names ggplot(fludta_fitted, aes(x=yrweek_dt)) + geom_line(aes(y=perc_fludeaths, colour=line_names[[1]], linetype=line_names[[1]]), size=0.8) + geom_line(aes(y=y0, colour=line_names[[2]], linetype=line_names[[2]]), size=0.8) + geom_line(aes(y=y0_ul, colour=line_names[[3]], linetype=line_names[[3]]), size=0.8) + scale_colour_manual("Line", breaks=line_names, values = line_cols) + scale_linetype_manual("Line", breaks=line_names, values = line_types) + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + xlab("Year") + ylab("% of Deaths from P&I") + theme_light(base_size=14) + theme(legend.text=element_text(size=10), plot.title = element_text(size=14)) + labs(title="Figure 2. Pneumonia and Influenza Mortality", caption="Modified Serfling Model") + guides(colour = guide_legend("Line"), linetype = guide_legend("Line"))
% of Deaths from P&I - No. of influenza / pneumonia deaths per 100,000 people.
"Expected % - A fit of the cyclical regression model, (one Fourier term).
Epidemic Threshold - 90% Upper Prediction Interval (1.64 SD)
This model differs from the original Serfling regressions because of it's use of influenza virology to generate a counterfactual vs. using an off-season secular trend.
You can either report the deaths in excess of the one-sided 95% prediction interval, which are sometimes described as epidemics or severe periods of influenza morbidity and mortality.
df_excess <- fludiff(fludta_fitted, obsvar=perc_fludeaths, fitvar=y0_ul) df_excess %>% head(.)
ggplot(df_excess, aes(x=yrweek_dt)) + geom_line(aes(y=y_diff, colour="Epidemic mortality"), size=0.8, linetype=2) + geom_line(aes(y=perc_fludeaths, colour="Reported mortality"), size=0.8, linetype=1) + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + scale_colour_manual("Line", values = c("Epidemic mortality"="#CC0000", "Reported mortality"="black")) + xlab("Year") + ylab("Deaths per 100,000") + theme_light(base_size=16) + theme(plot.title = element_text(size=14)) + labs(title="Figure 3. Periods of influenza epidemics over time")
Alternatively, you can simply take the excess deaths above the predicted line and report these as deaths attributable to influenza. Since they are in excess of the predicted deaths in the absence of influenza (i.e. no influenza positive isolates).
df_excess <- fludiff(fludta_fitted, obsvar=perc_fludeaths, fitvar=y0) df_excess %>% head(.)
ggplot(df_excess, aes(x=yrweek_dt)) + geom_line(aes(y=y_diff, colour="Excess mortality"), size=0.8, linetype=2) + geom_line(aes(y=perc_fludeaths, colour="Reported mortality"), size=0.8, linetype=1) + scale_x_date(labels = date_format("%Y"), date_breaks="1 year", expand=c(0, .9)) + scale_colour_manual("Line", values = c("Excess mortality"="#CC0000", "Reported mortality"="black")) + xlab("Year") + ylab("% Deaths from P&I") + theme_light(base_size=16) + theme(plot.title = element_text(size=14)) + labs(title="Figure 4. Periods of excess mortality over time")
fludta_mod <- flum(df_cdc_2, model="ird", outc=perc_fludeaths, time=yrweek_dt, viral=prop_flupos) head(fludta_mod)
fludta_mod <- flum(df_cdc_2, model="fluserf", outc=perc_fludeaths, time=yrweek_dt) fludta_mod %>% select(year, week, perc_fludeaths, y0, y0_ul) %>% head()
You can specify virology data using viral option, which accepts a string vector of length any. The function will search for named columns in the data using that string and build a model formula which includes them.
Any viral data called here should be a proportional value, i.e. float between 0-1.
fludta_mod %>% dplyr::filter(!is.na(prop_flupos)) %>% flum(., model="fluglm", outc=fludeaths, time=t, viral = "prop_flupos") %>% head(.)
If excluding the polynomial terms is desired:
## Without polynomial terms flum(df_cdc_2, model="fluglm", outc=fludeaths, time=week_in_order, viral = "prop_flupos", poly=F)
Perform a seasonal baseline model using Sept - May as season-period.
## Epidemic period, non-specified flum(df_cdc_2, model="fluglm", outc=fludeaths, time=t, season=T)
Identify your own epidemic period, then call flum() to use it. Season is default NULL, but if a named variable in fludta_mod, the model will search for and use it as the baseline.
## Epidemic period specified fludta_mod <- ird(data=df_cdc_2, outc = perc_fludeaths, viral=prop_flupos, time=t) flum(data=fludta_mod, model="fluglm", outc=fludeaths, time=t, season=high) %>% head(.)
You can pass arguments to the glm() call, like specifying a Poisson model.
## Poisson model with offset term flum(df_cdc_2, model="fluglm", outc = fludeaths, time = t, season=T, family=poisson, offset=log(alldeaths)) %>% head(.)
A negative binomial model can be called from the MASS::glm.nb function using
glmnb=T
option. The output is the same.
## Negative binomial model with offset term flum(df_cdc_2, model="fluglm", outc = fludeaths, time = t, viral='prop_flupos', glmnb = T) %>% head(.)
sessioninfo::session_info()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.