knitr::opts_chunk$set(message = FALSE, warning = FALSE, error = FALSE, fig.align = "center", dev.args = list(pointsize = 10))
options(digits = 4) # for more compact numerical outputs library("HIDDA.forecasting") library("ggplot2")
In this vignette, we use modelling and forecasting methods provided by:
library("surveillance")
The corresponding software reference is:
cat("<blockquote>") print(citation(package = "surveillance", auto = TRUE), style = "html") cat("</blockquote>\n")
We use age-stratified norovirus surveillance data from Berlin, Germany, together with an age-structured social contact matrix from the POLYMOD survey, as provided in the R package hhh4contacts:
library("hhh4contacts")
These data have been originally analyzed in:
Meyer S and Held L (2017): "Incorporating social contact data in spatio-temporal models for infectious disease spread". Biostatistics, 18(2), pp. 338--351. DOI: 10.1093/biostatistics/kxw051
Here we only consider models for spatially aggregated counts, i.e., without additional stratification by city district. More specifically, we will analyse age-stratified weekly counts $Y_{gt}$ in a similar way as for the reference model 6 in:
Held L, Meyer S and Bracher J (2017): "Probabilistic forecasting in infectious disease epidemiology: the 13th Armitage lecture". Statistics in Medicine, 36(22), pp. 3443--3460. DOI: 10.1002/sim.7363
BNV <- noroBE(by = "agegroup", agegroups = c(1, 2, 2, 4, 4, 2), timeRange = c("2011-w27", "2016-w26")) BNV (NGROUPS <- ncol(BNV)) (GROUPS <- colnames(BNV))
We can plot the observed age-stratified counts using the default plot method for "sts" objects:
plot(BNV)
We will use the first four seasons, from week 2011/27 to week 2015/26, as training data, and assess forecasts during the following year (2015/27 to 2016/26).
TRAIN <- 2:(4*52) TEST <- max(TRAIN) + 1:52
There is also an autoplot
variant based on ggplot2, which we use to
plot the age-specific incidence based on the population fractions
contained in the BNV
object,
(popfracsBE <- population(BNV)[1,])
and Berlin's total population,
(popBE <- sum(pop2011)) # also from the "hhh4contacts" package, see ?pop2011
as follows:
autoplot(BNV, population = 100000/popBE) + ## population: divides observed(BNV) by population(BNV)/(100000/popBE) ## Mod 1: highlight the Christmas period in each year geom_col(aes(fill = epochInYear %in% c(52, 1)), width = 7, show.legend = FALSE) + scale_fill_manual(values = c("black", "red")) + ## Mod 2: separate training and test periods by a vertical line geom_vline(aes(xintercept = as.numeric(date)[4*52] + .5), linetype = 2) + ylab("Incidence [per 100 000 inhabitants]") + scale_y_sqrt()
The neighbourhood
slot of the BNV
object contains a social
contact matrix derived from the German subset of the
POLYMOD survey,
aggregated to match the age groups of the surveillance data:
neighbourhood(BNV)
Each entry gives the average number of contact persons of a certain age group a participant (of a certain age group) reports on a randomly assigned day, see Mossong et al. (2008, PLoS Medicine, DOI: 10.1371/journal.pmed.0050074). The "agedistri" attribute gives the age distribution of the participants.
We will employ an improved estimate of this contact matrix, which ensures reciprocity on the population level, i.e., the overall number of contacts of age group i with age group j should be the same as vice versa, see Wallinga et al (2006, American Journal of Epidemiology, DOI: 10.1093/aje/kwj317):
(C_reci <- contactmatrix(which = "reciprocal", grouping = c(1, 2, 2, 4, 4, 2)))
We can check reciprocity with respect to Berlin's age distribution (also given in the "agedistri" attribute):
is_reciprocal <- function (C, population, tol = 0.001) { Cpop <- C * population all.equal(Cpop, t(Cpop), tolerance = tol, check.attributes = FALSE) } stopifnot(is_reciprocal(C_reci, popfracsBE))
The hhh4contacts package provides a simple plotting function for such contact matrices:
plotC(C_reci, scales = list(cex = 0.8, x = list(rot = 45)))
NB: A general implementation to extract social contact matrices from surveys, including from POLYMOD, is available via the dedicated R package socialmixr, and I recommend to use that package in future projects.
C <- contactmatrix(grouping = c(1, 2, 2, 4, 4, 2)) ## reproduce that matrix using socialmixr's POLYMOD data and contact_matrix() library("socialmixr") data("polymod") cite(polymod) C2 <- contact_matrix(polymod, countries = "Germany", age.limits = c(0,5,15,25,45,65), missing.participant.age = "remove", missing.contact.age = "keep") round(C2$matrix, 2) round(C, 2) stopifnot(all.equal(C2$matrix[,-ncol(C2$matrix)], C, check.attributes = FALSE)) ## reciprocity is obtained differently in socialmixr polymod2 <- polymod ## remove contacts with unknown age for same data approach polymod2$contacts <- subset(polymod2$contacts, !is.na(cnt_age_exact) | (!is.na(cnt_age_est_min) & !is.na(cnt_age_est_max))) C2_reci <- contact_matrix(polymod2, countries = "Germany", age.limits = c(0,5,15,25,45,65), missing.participant.age = "remove", #missing.contact.age = "keep", survey.pop = data.frame(lower.age.limit = c(0,5,15,25,45,65), population = popfracsBE * popBE), symmetric = TRUE) round(C2_reci$matrix, 2) stopifnot(is_reciprocal(C2_reci$matrix, popfracsBE)) round(C_reci, 2)
Given the counts from the previous week, $Y_{.,t-1}$,
we assume $Y_{gt}$ to follow a negative binomial distribution with a
group-specific overdispersion parameter and mean
$$
\mu_{gt} = \nu_{gt} + \phi_{gt} \sum_{g'} c_{g'g} Y_{g',t-1} .
$$
The endemic log-linear predictor $\nu_{gt}$ contains group-specific
intercepts, a Christmas effect (via a simple indicator for the calendar
weeks 52 and 1), and group-specific seasonal effects of $\sin(\omega t)$
and $\cos(\omega t)$ terms, $\omega=2\pi/52$.
The epidemic log-linear predictor $\phi_{gt}$ also contains group-specific
intercept, but shared seasonality and no Christmas effect.
For the contact matrix we use C_reci
from above, normalized to a
transition matrix C_reci/rowSums(C_reci)
, and compare this to models
assuming homogeneous or no mixing between age groups, and a model where we
estimate a power transformation $C^\kappa$ via profile likelihood
(hhh4contacts::fitC()
) as in Meyer and Held (2017, see
demo("hhh4contacts")
for a spatially disaggregated version of the
model).
DATAt <- list(t = epoch(BNV) - 1, christmas = as.integer(epochInYear(BNV) %in% c(52, 1))) mg_Creci <- hhh4(BNV, list( end = list(f = addSeason2formula(~0 + fe(1, unitSpecific = TRUE) + christmas, S = rep(1, NGROUPS))), ne = list(f = addSeason2formula(~0 + fe(1, unitSpecific = TRUE)), weights = matrix(1, NGROUPS, NGROUPS), scale = C_reci, normalize = TRUE), family = "NegBinM", data = DATAt, subset = TRAIN))
Alternative 1: assuming homogeneous mixing between age groups
mg_Chom <- update(mg_Creci, ne = list(scale = NULL))
Alternative 2: assuming no mixing between age groups
mg_Cdiag <- update(mg_Creci, ne = list(scale = diag(NGROUPS)))
Alternative 3: with a power transformation of the contact matrix
mg_Cpower <- fitC(mg_Creci, C_reci, normalize = TRUE, truncate = TRUE)
Simple AIC comparison of the model fits to the training period:
AIC(mg_Creci, mg_Chom, mg_Cdiag, mg_Cpower)
Parameter estimates from the model with power-adjusted contact matrix:
summary(mg_Cpower, maxEV = TRUE, reparamPsi = TRUE, amplitudeShift = TRUE, idx2Exp = TRUE)
## plot estimated endemic-epidemic decomposition ##plot(mg_Cpower, units = NULL, pch = 20, ## legend = 2, legend.args = list(legend = c("epidemic", "endemic"))) ## additional decomposition into AR effects and effects of other age groups plotHHH4_fitted_groups(mg_Cpower, groups = GROUPS, units = NULL, pch = 20, legend = 2, legend.args = list(legend = c("from other groups", "within group", "endemic")))
par(mfrow = c(2,1), mar = c(0,5,1,1), las = 1) plotHHH4_season_groups(mg_Cpower, component = "end", seasonStart = 27, col = c("#D53E4F", "#FC8D59", "#FEE08B", "#E6F598", "#99D594", "#3288BD"), conf.level = NULL, xaxt = "n", xlab = "", ylim = c(0, 5), yaxs = "i", ylab = "endemic seasonal effects") par(mar = c(3,5,1,1)) with(data.frame(time = epochInYear(BNV) + (year(BNV)-2011)*52, maxEV = getMaxEV(mg_Cpower))[1:52,], plot(maxEV ~ time, ylim = c(0, 1), type = "l", lwd = 3, yaxs = "i", xlab = "", ylab = "epidemic proportion", panel.first = quote(abline(v=52.5, lty=3))))
Parameters are updated sequentially in the one-step-ahead procedure. However, the power parameter of C is held fixed at the initial estimate to reduce the runtime.
For each model, we compute the r length(TEST)
"rolling" one-week-ahead
forecasts during the last season. This takes roughly 4 seconds per model
(we could parallelize using the cores
argument of oneStepAhead()
).
fits <- list("reciprocal" = mg_Creci, "homogeneous" = mg_Chom, "no mixing" = mg_Cdiag, "power-adjusted" = mg_Cpower)
owas <- lapply(fits, oneStepAhead, tp = range(TEST)-1, type = "rolling", which.start = "final", verbose = FALSE) save(owas, file = "BNV_owa.RData")
load("BNV_owa.RData")
par(mfrow = c(2,2), mar = c(5,5,1,1), las = 1) mapply(function(x, main) pit(x, plot = list(main = main, ylab = "Density")), x = owas, main = names(owas))
unlist(sapply(owas, calibrationTest, which = "dss")["p.value",]) unlist(sapply(owas, calibrationTest, which = "logs")["p.value",])
owas_scores <- lapply(owas, scores, which = c("dss", "logs"), individual = TRUE, reverse = FALSE) lapply(owas_scores, function(x) cbind(apply(x, 3:2, mean), overall=apply(x, 3, mean)))
set.seed(1) sapply(c("dss", "logs"), function (score) permutationTest(owas_scores[["reciprocal"]][,,score], owas_scores[["power-adjusted"]][,,score], nPermutation = 999))
par(las = 1, mar = c(0,4,1,0)+.5) owas_quantiles <- lapply(owas, quantile, probs = 1:99/100) .osaplot <- function (model) { sapply(seq_along(GROUPS), function (group) { osaplot(quantiles = owas_quantiles[[model]][,group,], probs = 1:99/100, observed = owas[[model]]$observed[,group], scores = owas_scores[[model]][,group,], start = TEST[1], xlab = "", main = GROUPS[group], fan.args = list(ln = c(0.1,0.9), rlab = NULL), key.args = if(group==length(GROUPS)) list(start=max(TEST)-2, rcex=0.6), scores.args = list(xaxt = "n", ylim = range(owas_scores[[model]]), lab = c(7,3,0), panel.first = grid(nx=NA,ny=NULL,lty=1)), legend.args = if(group==length(GROUPS)) list()) }) } .osaplot("no mixing")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.