knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(tibble.print_min = 4L, tibble.print_max = 4L)
library(flumodelr)
library(lubridate)
library(scales)

Modern Applications of Serfling Models

The cyclical regression model published by Serfling [@Serlfing1963] is not commonly implemented in its original form.

Virology data

In current methods, most researchers incorporate data on the percent of isolates positive with influenza viral types. This is publically available in many cases, see ?nrevss. The data describes the proportion of isolates tested, which may be obtained from participating hospital and outpatient settings. Using this surrogate time-periods of low flu activity can be used to construct a counterfactual for what the rate of the reported influenza outcome would be in the absence of high influenza activity. Whereas the original model relied on "off-season" data and the Fourier term, virology data gives you a more direct measure. This ultimately provides increased confidence that deaths are influenza related and estimated influenza morbidity and mortality.

The example dataset for this package, flumodelr::fludta is a combination of the CDC-122 cities dataset, ?cdc122city and the public virology data ?nrevss. The virology is matched by week, with a lag of 2 weeks to allow for reporting of deaths (i.e. Virology(week_t-2)=Mortality(week_t)).

df_cdc <- flumodelr::fludta %>%
  dplyr::filter(year>=2010 & year<2016) 
ggplot(df_cdc, aes(x=yrweek_dt)) + 
  geom_line(aes(y=perc_fludeaths, colour="% Deaths from P&I", 
                linetype="% Deaths from P&I"), size=0.8) +
  geom_line(aes(y=prop_flupos*10, colour="No. Positive per 10 isolates", 
                linetype="No. Positive per 10 isolates"), size=0.8) +
  scale_colour_manual("Line",
                      breaks=c("% Deaths from P&I", 
                                 "No. Positive per 10 isolates"),
                      values = c("% Deaths from P&I"="#CC0000", 
                                 "No. Positive per 10 isolates"="black")) +
  scale_linetype_manual("Line", 
                        breaks=c("% Deaths from P&I", 
                                 "No. Positive per 10 isolates"),
                        values = c("% Deaths from P&I"=1,
                                   "No. Positive per 10 isolates"=2)) +
  scale_x_date(labels = date_format("%Y"), date_breaks="1 year",
               expand=c(0, .9)) +
  xlab("Year") + ylab("%") +
  theme_light(base_size=16) +
  theme(plot.title = element_text(size=14)) +
  labs(title="Figure 1. Influenza (+) isolates over time")

The original model can be modified like so:

$$Eq \ 1. \ y = \alpha_0 + \beta_1t + \beta_2Flu_t + sin(\frac{2 \pi t}{period}) + cos(\frac{2 \pi t}{period}) + u$$ Where Flu = no. of isolates positive for influenza / total isolates tested in a given timepoint t. period is a fixed parameter equal to the cycle of the time unit (e.g. 52 weeks in a year).

Many published examples of this exist [@Wang2012, @Matias2016]. Some authors include data on the % RSV (+), as well as breakdown influenza by subtype and even include data on weather, humidity etc. Including individual terms for each viral subtype may have advantages as certain types of influenza are associated with more severe outcomes such as hospitalization and death.

flum()

A general wrapper, flum(), for modeling time-series data is written for this purpose. This will compute a traditional serfling model or other generalized linear models, a smoothed model (adds polynomial terms), a model which allows for virology data, and other options explored below.

Influenza data

head(df_cdc)

General Estimation Procedure (performed by flum())

add Fourier term

df_cdc_2 <- df_cdc %>%
  mutate(t = row_number(), #set origin to october
         theta = 2*t / 52,
         sin_2 = sinpi(theta),
         cos_2 = cospi(theta))

See Serfling model background for discussion on Fourier terms and cyclical regression.

add polynomials

df_cdc_2 <- df_cdc_2 %>%
  mutate(week_2 = t^2,
         week_3 = t^3,
         week_4 = t^4,
         week_5 = t^5)

Estimate coefficients for baseline model

base_fit <- df_cdc_2 %>%
  lm(perc_fludeaths ~ t + week_2 + week_3 + week_4 + week_5 +
       prop_flupos + sin_2 + cos_2, data=., na.action = na.exclude)
summary(base_fit)

Predict outcome values, assuming no influenza activity

Here we predict the outcome for each observation, given the fitted model base_fit above. We tell R to compute 95% prediction intervals. This is conventional for most modern approaches, see [insert cites]. The original Serfling paper estimated a threshold 1.64 standard deviations above the trend line.

base_pred <- df_cdc_2 %>%
  mutate(prop_flupos = 0) %>% #Note setting to zero
  predict(base_fit, newdata=., se.fit=TRUE, 
          interval="prediction", level=0.90)

Critically, before predicting observations, we set the measure of our influenza activity, prop_flupos to zero. This is to estimate the influenza activity at each timepoint, assuming that the proportion of influenza isolates was equal to zero.

Add fitted values to dataset

fludta_fitted <- df_cdc_2 %>%
  add_column(., y0=base_pred$fit[,1], y0_ul=base_pred$fit[,3]) 
head(fludta_fitted)
#Set up graph labels, line specs
line_names <- c("% Deaths From P&I", "Expected %", "Epidemic Threshold")
line_cols <- c("#CC0000", "black", "black")
line_types <- c(1, 1, 2)
names(line_cols) <- line_names
names(line_types) <- line_names

ggplot(fludta_fitted, aes(x=yrweek_dt)) + 
  geom_line(aes(y=perc_fludeaths, colour=line_names[[1]], 
                linetype=line_names[[1]]), size=0.8) +
  geom_line(aes(y=y0, colour=line_names[[2]], 
                linetype=line_names[[2]]), size=0.8) +
  geom_line(aes(y=y0_ul, colour=line_names[[3]], 
                linetype=line_names[[3]]), size=0.8) +
  scale_colour_manual("Line", breaks=line_names, values = line_cols) +
  scale_linetype_manual("Line",  breaks=line_names, values = line_types) +
  scale_x_date(labels = date_format("%Y"), date_breaks="1 year",
               expand=c(0, .9)) + 
  xlab("Year") + 
  ylab("% of Deaths from P&I") + 
  theme_light(base_size=14) +
  theme(legend.text=element_text(size=10), 
        plot.title = element_text(size=14)) +
  labs(title="Figure 2. Pneumonia and Influenza Mortality",
       caption="Modified Serfling Model") +
  guides(colour = guide_legend("Line"), linetype = guide_legend("Line"))

% of Deaths from P&I - No. of influenza / pneumonia deaths per 100,000 people.
"Expected % - A fit of the cyclical regression model, (one Fourier term).
Epidemic Threshold - 90% Upper Prediction Interval (1.64 SD)

This model differs from the original Serfling regressions because of it's use of influenza virology to generate a counterfactual vs. using an off-season secular trend.

Compute attributable mortality

Excess mortality

You can either report the deaths in excess of the one-sided 95% prediction interval, which are sometimes described as epidemics or severe periods of influenza morbidity and mortality.

df_excess <- fludiff(fludta_fitted, obsvar=perc_fludeaths, fitvar=y0_ul)
df_excess %>%
  head(.)
ggplot(df_excess, aes(x=yrweek_dt)) + 
  geom_line(aes(y=y_diff, colour="Epidemic mortality"), size=0.8, linetype=2) +
  geom_line(aes(y=perc_fludeaths, colour="Reported mortality"), size=0.8, linetype=1) +
  scale_x_date(labels = date_format("%Y"), date_breaks="1 year",
               expand=c(0, .9)) + 
  scale_colour_manual("Line",
                      values = c("Epidemic mortality"="#CC0000", 
                                 "Reported mortality"="black")) +
  xlab("Year") + 
  ylab("Deaths per 100,000") + 
  theme_light(base_size=16) +
  theme(plot.title = element_text(size=14)) +
  labs(title="Figure 3. Periods of influenza epidemics over time")

Attributable influenza mortality

Alternatively, you can simply take the excess deaths above the predicted line and report these as deaths attributable to influenza. Since they are in excess of the predicted deaths in the absence of influenza (i.e. no influenza positive isolates).

df_excess <- fludiff(fludta_fitted, obsvar=perc_fludeaths, fitvar=y0)
df_excess %>%
  head(.)
ggplot(df_excess, aes(x=yrweek_dt)) + 
  geom_line(aes(y=y_diff, colour="Excess mortality"), size=0.8, linetype=2) +
  geom_line(aes(y=perc_fludeaths, colour="Reported mortality"), size=0.8, linetype=1) +
  scale_x_date(labels = date_format("%Y"), date_breaks="1 year",
               expand=c(0, .9)) + 
  scale_colour_manual("Line",
                      values = c("Excess mortality"="#CC0000", 
                                 "Reported mortality"="black")) +
  xlab("Year") + 
  ylab("% Deaths from P&I") + 
  theme_light(base_size=16) +
  theme(plot.title = element_text(size=14)) +
  labs(title="Figure 4. Periods of excess mortality over time")

Examples with flum()

Incidence-Rate Difference (Simplest model)

fludta_mod <- flum(df_cdc_2, model="ird", 
                   outc=perc_fludeaths, time=yrweek_dt,
                   viral=prop_flupos)

head(fludta_mod)

original serfling model

fludta_mod <- flum(df_cdc_2, model="fluserf", 
                   outc=perc_fludeaths, time=yrweek_dt)

fludta_mod %>% 
  select(year, week, perc_fludeaths, y0, y0_ul) %>%
  head()

Virology-based model

You can specify virology data using viral option, which accepts a string vector of length any. The function will search for named columns in the data using that string and build a model formula which includes them.

Any viral data called here should be a proportional value, i.e. float between 0-1.

fludta_mod %>%
  dplyr::filter(!is.na(prop_flupos)) %>%
flum(., model="fluglm", 
            outc=fludeaths, time=t, 
            viral = "prop_flupos") %>%
head(.)

Other options

If excluding the polynomial terms is desired:

## Without polynomial terms
flum(df_cdc_2, model="fluglm", 
     outc=fludeaths, time=week_in_order, 
     viral = "prop_flupos", poly=F)

Perform a seasonal baseline model using Sept - May as season-period.

## Epidemic period, non-specified
flum(df_cdc_2, model="fluglm", 
     outc=fludeaths, time=t, 
     season=T)

Identify your own epidemic period, then call flum() to use it. Season is default NULL, but if a named variable in fludta_mod, the model will search for and use it as the baseline.

## Epidemic period specified
fludta_mod <- ird(data=df_cdc_2, 
               outc = perc_fludeaths, viral=prop_flupos, time=t)

flum(data=fludta_mod, model="fluglm", outc=fludeaths, time=t, 
       season=high) %>%
  head(.)

You can pass arguments to the glm() call, like specifying a Poisson model.

## Poisson model with offset term
flum(df_cdc_2, 
     model="fluglm", outc = fludeaths, 
     time = t, season=T, 
     family=poisson, offset=log(alldeaths)) %>%
  head(.)

A negative binomial model can be called from the MASS::glm.nb function using glmnb=T option. The output is the same.

## Negative binomial model with offset term
flum(df_cdc_2, 
     model="fluglm", outc = fludeaths, 
     time = t, viral='prop_flupos',
     glmnb = T) %>%
head(.)

References

sessioninfo::session_info()


kmcconeghy/flumodelr documentation built on June 7, 2019, 8:47 p.m.