farringtonFlexible: Surveillance for an univariate count data time series using...
In jimhester/surveillance: Temporal and Spatio-Temporal Modeling and Monitoring of Epidemic Phenomena

Description Usage Arguments Details Value Author(s) References See Also Examples

The function takes range values of the surveillance time series sts and for each time point uses a Poisson GLM with overdispersion to predict an upper bound on the number of counts according to the procedure by Farrington et al. (1996) and by Noufaily et al. (2012). This bound is then compared to the observed number of counts. If the observation is above the bound, then an alarm is raised.

farringtonFlexible(sts, control = list(range = NULL, b = 3, w = 3,
                                      reweight = TRUE, 
                                      weightsThreshold = 2.58,
                                      verbose = FALSE,glmWarnings = TRUE,
                                      alpha = 0.01, trend = TRUE,
                                      pThresholdTrend = 0.05, limit54=c(5,4), 
                                      powertrans="2/3",
                                      fitFun="algo.farrington.fitGLM.flexible",
                                      populationOffset = FALSE, 
                                      noPeriods = 1, pastWeeksNotIncluded = 26, 
                                      thresholdMethod = "delta"))

sts

object of class sts (including the observed and the state time series)

control

Control object given as a list containing the following components:

range: Specifies the index of all timepoints which should be tested. If range is NULL all possible timepoints are used.
b: How many years back in time to include when forming the base counts.
w: Window's half-size, i.e. number of weeks to include before and after the current week in each year.
reweight: Boolean specifying whether to perform reweighting step.
weightsThreshold: Defines the threshold for reweighting past outbreaks using the Anscombe residuals (1 in the original method, 2.58 advised in the improved method).
verbose: Boolean specifying whether to show extra debugging information.
glmWarnings: Boolean specifying whether to print warnings from the call to glm.
alpha: An approximate (one-sided) (1-α)\cdot 100\% prediction interval is calculated unlike the original method where it was a two-sided interval. The upper limit of this interval i.e. the (1-α)\cdot 100\% quantile serves as an upperbound.
trend: Boolean indicating whether a trend should be included and kept in case the conditions in the Farrington et. al. paper are met (see the results). If false then NO trend is fit.
pThresholdTrend: Threshold for deciding whether to keep trend in the model (0.05 in the original method, 1 advised in the improved method).
limit54: Vector containing two numbers: cases and period. To avoid alarms in cases where the time series only has about almost no cases in the specific week the algorithm uses the following heuristic criterion (see Section 3.8 of the Farrington paper) to protect against low counts: no alarm is sounded if fewer than \code{cases}=5 reports were received in the past \code{period}=4 weeks. limit54=c(cases,period) is a vector allowing the user to change these numbers. Note: As of version 0.9-7 of the package the term "last" period of weeks includes the current week - otherwise no alarm is sounded for horrible large numbers if the four weeks before that are too low.
powertrans: Power transformation to apply to the data if the threshold is to be computed with the method described in Farrington et al. (1996. Use either "2/3" for skewness correction (Default), "1/2" for variance stabilizing transformation or "none" for no transformation.
fitFun: String containing the name of the fit function to be used for fitting the GLM. The only current option is "algo.farrington.fitGLM.flexible".
populationOffset: Boolean specifying whether to include a population offset in the GLM. The slot sts@population gives the population vector.
noPeriods: Number of levels in the factor allowing to use more baseline. If equal to 1 no factor variable is created, the set of reference values is defined as in Farrington et al (1996).
pastWeeksNotIncluded: Number of past weeks to ignore in the calculation.
thresholdMethod: Method to be used to derive the upperbound. Options are "delta" for the method described in Farrington et al. (1996), "Noufaily" for the method described in Noufaily et al. (2012), and "muan" for the method extended from Noufaily et al. (2012).

The following steps are perfomed according to the Farrington et al. (1996) paper.

Fit of the initial model with intercept, time trend if trend is TRUE, seasonal factor variable if noPeriod is bigger than 1, and population offset if populationOffset is TRUE. Initial estimation of mean and overdispersion.
Calculation of the weights omega (correction for past outbreaks) if reweighting is TRUE. The threshold for reweighting is defined in control.
Refitting of the model
Revised estimation of overdispersion
Omission of the trend, if it is not significant
Repetition of the whole procedure
Calculation of the threshold value using the model to compute a quantile of the predictive distribution. The method used depends on thresholdMethod, this can either be:

"delta"
One assumes that the prediction error (or a transformation of the prediction error, depending on powertrans), is normally distributed. The threshold is deduced from a quantile of this normal distribution using the variance and estimate of the expected count given by GLM, and the delta rule. The procedure takes into account both the estimation error (variance of the estimator of the expected count in the glm) and the prediction error (variance of the prediction error). This is the suggestion in Farrington et al. (1996).

"nbPlugin"
One assumes that the new count follows a negative binomial distribution parameterized by the expected count and the overdispersion estimated in the GLM. The threshold is deduced from a quantile of this discrete distribution. This process disregards the estimation error, though. This method was used in Noufaily, et al. (2012).

"muan"
One also uses the assumption of the negative binomial sampling distribution but does not plug in the estimate of the expected count from the GLM, instead one uses a quantile from the asymptotic normal distribution of the expected count estimated in the GLM; in order to take into account both the estimation error and the prediction error.
Computation of exceedance score

Warning: monthly data containing the last day of each month as date should be analysed with epochAsDate=FALSE in the sts object. Otherwise February makes it impossible to find some reference time points.

An object of class sts with the slots upperbound and alarm filled by appropriate output of the algorithm. The slot control is the usual list but with more items all of length length(range):

trend: is a vector of Booleans indicating whether a time trend was fitted for this time point.
trendVector: is a vector giving the coefficient of the time trend in the GLM for this time point. If no trend was fitted it is equal to NA.
pvalue: is a vector giving the probability of observing a value at least equal to the observation under the null hypothesis .
expected: is a vector giving the expectation of the predictive distribution for each timepoint. It is only reported if the conditions for raising an alarm are met, e.g enoughCases.
mu0Vector: is a vector giving what is inputed in the negative binomial distribution to get the upperbound as a quantile (either a plug-in from the glm or a quantile from the asymptotic normal distribution of the estimator)
phiVector: is a vector giving the overdispersion of the glm at each timepoint.

M. Salmon, M. H?hle

A statistical algorithm for the early detection of outbreaks of infectious disease, Farrington, C.P., Andrews, N.J, Beale A.D. and Catchpole, M.A. (1996), J. R. Statist. Soc. A, 159, 547-563.

An improved algorithm for outbreak detection in multiple surveillance systems, Noufaily, A., Enki, D.G., Farrington, C.P., Garthwaite, P., Andrews, N.J., Charlett, A. (2012), Statistics in Medicine, published online.

algo.farrington.fitGLM,algo.farrington.threshold

### DATA I/O ###
#Read Salmonella Agona data
data("salmonella.agona")

# Create the corresponding sts object from the old disProg object
salm <- disProg2sts(salmonella.agona)

### RUN THE ALGORITHMS WITH TWO DIFFERENT SETS OF OPTIONS ###           
# Farrington with old options
control1 <-  list(range=(260:312),
						 noPeriods=1,populationOffset=FALSE,
						 fitFun="algo.farrington.fitGLM.flexible",
						 b=4,w=3,weightsThreshold=1,
						 pastWeeksNotIncluded=3,
						 pThresholdTrend=0.05,trend=TRUE,
						 thresholdMethod="delta",alpha=0.1)
control2<-   list(range=(260:312),
						 noPeriods=10,populationOffset=FALSE,
						 fitFun="algo.farrington.fitGLM.flexible",
						 b=4,w=3,weightsThreshold=2.58,
						 pastWeeksNotIncluded=26,
						 pThresholdTrend=1,trend=TRUE,
						 thresholdMethod="delta",alpha=0.1)
salm1 <- farringtonFlexible(salm,control=control1)
salm2 <- farringtonFlexible(salm,control=control2)

### PLOT THE RESULTS ###
y.max <- max(upperbound(salm1),observed(salm1),upperbound(salm2),na.rm=TRUE)
 
plot(salm1,ylim=c(0,y.max),
        main='S. Newport in Germany')
lines( 1:(nrow(salm1)+1)-0.5, 
       c(upperbound(salm1),upperbound(salm1)[nrow(salm1)]),
      type="s",col='tomato4',lwd=2)
lines( 1:(nrow(salm2)+1)-0.5, 
       c(upperbound(salm2),upperbound(salm2)[nrow(salm2)]),
      type="s",col="blueviolet",lwd=2)

legend(c(0,10),c('Alarm','Upperbound with old options','Upperbound with new options'),
      pch=c(24,NA,NA),lty=c(NA,1,1),
        bg="white",lwd=c(2,2,2),col=c('red','tomato4',"blueviolet"))