cusum_synd-methods: 'cusum_synd'

cusum_synd    R Documentation

cusum_synd

Description

This function applies the cusum() algorithm (available in the qcc R package). Here it is employed as part of an iterative process to allow detection of outbreak signals. The additional features compared to the regular cusum() algorithm are:

  • pre-processing: Instead of applying cusum directly to the time series, it is possible to choose one of two pre-processing methods: (1) modeling and removing temporal effects with a GLM regression model (families "poisson", "nbinom" or "gaussian"); (2) differencing to remove, for instance, day-of-week effects. The user can also set pre-processing to FALSE, in which case no temporal effects are removed from the data.

  • iterative application: the algorithm is applied to a range of time points in an iterative manner, so if syndromic data needs to be evaluated for the past 30 days, for instance, the function is called once and the internal loops evaluate one day at a time.

  • Detection of deviations one day at a time: rather than running the algorithm over multiple time units in a "batch", this implementation applies it one time unit (e.g., day) at a time, so that aberrations detected in any given time unit can be corrected before proceeding to the next. The correction of aberrations can be performed using this algorithm, or, if the time series has already been corrected using another algorithm (with results saved in the slot baseline of the syndromic object being analysed), the corrected baseline will always be considered as training data, rather than the observed data (which may contain aberrations).

  • guard-band: The user can set a guard-band between the time unit being evaluated and the start of the window used as training data, in order to avoid contamination of the baseline with undetected outbreak-signals.

  • recording of the detection limits: this is already a feature of the cusum() function, and in the syndromic application the LCL and UCL limits are stored in the appropriate slots of the syndromic object. The main innovation here is that, if pre-processing methods are being used, the LCL and UCL are recorded after transforming the values back to the scale of the original data, rather than in the scale of the pre-processing residuals, which are the actual values used by the control-chart method.

  • data correction: if an observation is found to be greater than the confidence interval of the forecast, the user can choose to update the outbreak-free baseline by substituting the observed value with the UCL value. As mentioned before, this feature should not be used if the baseline was already constructed using another algorithm.

  • multiple limits: the user can apply the algorithm with multiple detection limits - that is to say, different confidence intervals
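The iterative evaluation, guard-band and data correction described above can be sketched in base R. This is a minimal illustration, not the package's implementation: baseline_index is a hypothetical helper, the exact window indexing is an assumption, and a plain "mean plus k standard deviations" threshold stands in for the CUSUM statistic.

```r
# Hypothetical helper: indices of the training window for time point t.
# Assumes the window ends guard.band units before t and spans
# baseline.window units; the package's exact indexing may differ.
baseline_index <- function(t, baseline.window, guard.band) {
  guard.band <- max(1, as.integer(guard.band))  # zero/FALSE become 1, as documented
  end <- t - guard.band
  start <- max(1, end - baseline.window + 1)
  seq(start, end)
}

set.seed(1)
observed <- rpois(100, lambda = 10)   # simulated daily counts
observed[98] <- 40                    # injected spike
baseline <- observed                  # working copy that may be corrected

evaluate.window <- 5
limit <- 3                            # a single limit, in standard deviations
alarms <- integer(length(observed))

# Evaluate the last evaluate.window time points, one at a time
for (t in seq(length(observed) - evaluate.window + 1, length(observed))) {
  idx <- baseline_index(t, baseline.window = 60, guard.band = 7)
  ucl <- mean(baseline[idx]) + limit * sd(baseline[idx])
  if (observed[t] > ucl) {
    alarms[t] <- 1L
    baseline[t] <- ucl                # correct before moving to the next time unit
  }
}
```

Because the correction is written into baseline rather than observed, later iterations whose training window reaches the corrected point are trained on the capped value, not on the spike.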

Usage

cusum_synd(x, ...)

## S4 method for signature 'syndromicD'
cusum_synd(x, syndromes = NULL, evaluate.window = 1,
  baseline.window = 365, limit.sd = c(2.5, 3, 3.5), guard.band = 7,
  correct.baseline = FALSE, alarm.dim = 4, UCL = 1, LCL = FALSE,
  pre.process = FALSE, diff.window = 7, family = "poisson",
  formula = NULL, frequency = 365, se.shift = 1)

## S4 method for signature 'syndromicW'
cusum_synd(x, syndromes = NULL, evaluate.window = 1,
  baseline.window = 52, limit.sd = c(2.5, 3, 3.5), guard.band = 2,
  correct.baseline = FALSE, alarm.dim = 4, UCL = 1, LCL = FALSE,
  pre.process = FALSE, diff.window = 4, family = "poisson",
  formula = NULL, frequency = 52, se.shift = 1)

Arguments

x

a syndromic (syndromicD or syndromicW) object. If regression pre-processing is to be used, the slot dates must contain a data.frame with at least the columns for the regression variables chosen (year, dow, month).

...

Additional arguments to the method.

syndromes

an optional parameter; if not specified, all columns in the slot observed of the syndromic object will be used. The user can restrict the analyses to a few syndromic groups by listing their names or column positions in the observed matrix. See examples.

evaluate.window

the number of time points to be evaluated. By default only the last time point is evaluated, but the user can set any window (as long as the number of time points in the time series allows it).

baseline.window

The baseline used to train the algorithm in order to provide a forecast, which will serve to decide whether the current observed data is expected. Normally 1-2 years.

limit.sd

The limit of detection to be used, that is, the cut-off of the confidence interval that decides when an observation is abnormal, given as a number of standard deviations. This can be provided as a single value or as a vector. When a vector is provided, multiple detection results are given, and the alarm result stored is the count of how many detection limits were met.
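The alarm score for a vector of limits can be illustrated as follows. This is a sketch only: z holds hypothetical standardized deviations standing in for whatever statistic the function compares against the limits.

```r
limit.sd <- c(2.5, 3, 3.5)

# Hypothetical standardized deviations of three observations
z <- c(1.2, 3.1, 4.0)

# For each observation, count how many limits were exceeded
alarm.score <- sapply(z, function(zi) sum(zi > limit.sd))
alarm.score
# 0 2 3
```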

guard.band

The number of time units used to separate the current time unit evaluated and the baseline window. The default is 7 for daily data (one week) and 2 for weekly data. If zero or FALSE, it will be converted to ONE, which is the minimal separation between the current time point and the historical data.

correct.baseline

besides detecting abnormal observations, the algorithm can also be used to correct the data, removing these observations and substituting them with the limit of the confidence interval of the prediction provided by the cusum() algorithm. If that is to be carried out, the user needs to specify which of the provided limits is to be used for the correction. This argument should be filled with the INDEX of the limit to be used. For example, if limit.sd was provided as c(2.5,3.0,3.5), the use of correct.baseline=1 will cause the algorithm to substitute any observation more than 2.5 standard deviations above the predicted value with this exact cut-off limit. If using correct.baseline=2, only observations more than 3 standard deviations above (limit.sd[2]) will be corrected. To prevent a baseline from being generated or modified, set this argument to FALSE (the default), zero or NULL.
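The role of the index can be shown with a short base R sketch. The forecast and its standard deviation are hypothetical values, and the cap is computed as forecast plus limit times standard deviation, which is an assumption about the scale on which the package applies the limit.

```r
limit.sd <- c(2.5, 3, 3.5)
correct.baseline <- 2              # index into limit.sd, i.e. 3 standard deviations

forecast <- 10                     # hypothetical predicted value
sigma <- 2                         # hypothetical standard deviation
observed <- c(12, 15.5, 17, 30)

# Cut-off implied by the chosen limit
cap <- forecast + limit.sd[correct.baseline] * sigma

# Observations above the cut-off are replaced by the cut-off itself
corrected <- pmin(observed, cap)
corrected
# 12.0 15.5 16.0 16.0
```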

alarm.dim

The syndromic object is set to accept the result of multiple detection algorithms. These results are stored as a third dimension in the slot alarms. Here the user can choose which order in that dimension should store the results of this algorithm.

UCL

the minimum number that would have generated an alarm, for every time point, can be recorded in the slot UCL of the syndromic object. The user must provide the INDEX in the limit.sd vector for which the UCL values should be recorded (as explained for the argument correct.baseline above). Set to FALSE to prevent the recording.

LCL

default is FALSE. If set to an index of limit.sd (see UCL above) then alarms are also generated when the observed number of events is LOWER than expected, and the maximum number of observations that would have generated a low alarm is recorded in the slot LCL. In this case alarms are recorded as -1 for each detection limit met.

pre.process

whether to pre-process the time series in order to remove temporal effects before applying the control-chart. Set to FALSE to apply the control-chart to the original, observed time series, using the data in the slot baseline as training (if the slot is empty, observed data will be copied into it). Set to "diff" to apply simple differencing. Set to "glm" to apply a regression model and deliver only the residuals to the control-chart. The next arguments set details of either method. You can provide pre.process as a single value - for instance pre.process="diff" will apply differencing to ALL syndromes being evaluated. Or you can provide it as a vector - for instance pre.process=c(FALSE,"diff",FALSE,"glm","glm") will apply no pre-processing to the 1st and 3rd syndromes in the syndromic object, differencing to the 2nd, and regression to syndromes 4 and 5. PLEASE NOTE that even if you are evaluating only a few of the syndromes, you need to provide pre.process as either a single value, or as a vector WITH THE SAME LENGTH AS THE NUMBER OF SYNDROMES IN THE SYNDROMIC OBJECT, even if not all will be evaluated (you can just use FALSE for those not being evaluated, for instance).

diff.window

only relevant if "pre.process" is set to "diff". Corresponds to the number of time units of differencing. The default is 7 for daily data (weekly differencing) and 4 for weekly data. Change to 5 if weeks do not contain weekend days.
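Lagged differencing of this kind can be reproduced with the base R function diff(). The sketch below uses made-up daily counts with a fixed weekly pattern and shows that, after differencing at lag 7, only the injected bump remains; it illustrates the idea and is not the package's internal code.

```r
diff.window <- 7

# Three weeks of hypothetical daily counts with a weekday/weekend pattern
counts <- rep(c(20, 22, 21, 23, 19, 5, 4), times = 3)
counts[18] <- counts[18] + 15      # extra cases on day 18

# Each value minus the value diff.window days earlier
residuals <- diff(counts, lag = diff.window)

# The weekly pattern cancels out; only the bump is non-zero.
# Residual i corresponds to day i + diff.window.
which(residuals != 0) + diff.window
# 18
```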

family

when pre-processing with glm, the GLM distribution family used, by default "poisson". If "nbinom" is used, the function glm.nb is used instead.

formula

a formula can be provided if you want to OVERRIDE the formula saved in the syndromic object, but the recommended use of the aberration detection algorithms is to have already carried out an evaluation of your time series, and saved the appropriate pre-processing formulas as a list in the slot @formula using syndromic.object@formula <- list(formula1, formula2....). If you still wish to provide formulas as a direct argument to the function, make sure to provide them as a list. You can get more details and examples on providing regression formulas in the help for the function pre_process_glm (?pre_process_glm).

frequency

if pre-processing is applied using "glm" AND the sin/cos functions are used, the length of the cycle needs to be set. The default is one year (365 days or 52 weeks).
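The effect of frequency on seasonal terms can be sketched as follows. The construction sin(2*pi*t/frequency) is the usual harmonic form and is assumed here; how the package builds its own sin and cos variables is not shown in this page. The variable names and simulated data are illustrative.

```r
frequency <- 365
time.index <- 1:730                # two years of daily time points

# One full sine/cosine cycle every `frequency` time units
sin.term <- sin(2 * pi * time.index / frequency)
cos.term <- cos(2 * pi * time.index / frequency)

# Simulated counts with a yearly cycle, then a Poisson regression on the harmonics
set.seed(42)
y <- rpois(length(time.index), lambda = exp(2 + 0.5 * sin.term))
fit <- glm(y ~ sin.term + cos.term, family = poisson)

# Residuals of this kind are what a control chart would then monitor
res <- y - fitted(fit)
```

With business-day data (five days per week), a yearly cycle would correspond to roughly 260 time points, which matches the frequency=260 used in the daily examples.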

se.shift

a parameter to be passed to the cusum function used internally (originally from the qcc package). From the documentation of that package: "The amount of shift to detect in the process, measured in standard errors of the summary statistics". The default is set to 1.
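To show where a shift parameter enters a CUSUM, here is a minimal one-sided (upper) tabular CUSUM in base R. upper_cusum is a hypothetical helper, and using half the shift as the reference value is the textbook convention, assumed here to mirror what qcc does; consult the qcc documentation for the exact computation.

```r
# Hypothetical helper: upper CUSUM on standardized values
upper_cusum <- function(x, center, std.dev, se.shift = 1) {
  z <- (x - center) / std.dev
  k <- se.shift / 2                 # reference value: half the shift to detect
  s <- numeric(length(z))
  prev <- 0
  for (i in seq_along(z)) {
    s[i] <- max(0, prev + z[i] - k) # accumulate only deviations beyond k
    prev <- s[i]
  }
  s
}

x <- c(10, 11, 9, 14, 15, 16)
s <- upper_cusum(x, center = 10, std.dev = 2)
s
# 0.0 0.0 0.0 1.5 3.5 6.0

# Time points whose cumulative sum exceeds a limit of 3
which(s > 3)
# 5 6
```

A larger se.shift raises the reference value, so small sustained deviations accumulate more slowly and the chart becomes less sensitive to them.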

Value

An object of the class syndromic which contains all elements from the object provided in x, but in which the slot alarms has been filled in the following way: for the rows assigned in evaluate.window, the columns indicated in syndromes (or all columns from observed if syndromes is left undefined), and the third dimension specified in alarm.dim (4 by default), zeros have been assigned if no alarm was generated; otherwise a numerical value gives the number of alarms detected, that is, how many of the limits given in limit.sd detected an abnormal observation. See examples. If the user sets a correct.baseline value, the baseline will also have been modified.
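Reading results out of a three-dimensional alarms array can be sketched with a mock object. The layout (time points by syndromes by detection algorithms) is inferred from the description above, and the syndrome name "Respiratory" and the filled-in values are made up for illustration.

```r
# Mock of the alarms slot: time points x syndromes x detection algorithms
alarms <- array(0, dim = c(6, 2, 5),
                dimnames = list(NULL, c("Musculoskeletal", "Respiratory"), NULL))

alarm.dim <- 4
alarms[5, "Musculoskeletal", alarm.dim] <- 2   # two of three limits exceeded
alarms[6, "Musculoskeletal", alarm.dim] <- 3   # all three limits exceeded

# Time points with any alarm for one syndrome and this algorithm
which(alarms[, "Musculoskeletal", alarm.dim] > 0)
# 5 6
```

On a real object the same indexing would presumably be applied to the alarms slot of the returned syndromic object.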

See Also

pre_process_glm

ewma_synd

holt_winters_synd

shew_synd

Examples

## Examples DAILY data
data(lab.daily)
my.syndromicD <- raw_to_syndromicD (id=SubmissionID,
                                   syndromes.var=Syndrome,
                                   dates.var=DateofSubmission,
                                   date.format="%d/%m/%Y",
                                   remove.dow=c(6,0),
                                   add.to=c(2,1),
                                   data=lab.daily)
my.syndromicD <- cusum_synd(x=my.syndromicD,
                           syndromes="Musculoskeletal",
                           evaluate.window=30,
                           baseline.window=260,
                           limit.sd=c(2.5,3,3.5),
                           guard.band=5,
                           correct.baseline=FALSE,
                           alarm.dim=4,
                           pre.process="glm",
                           family="poisson",
                           formula=list(days~dow+sin+cos+AR1+AR2+AR3+AR4+AR5),
                           frequency=260)

my.syndromicD@formula <- list(NA,days~dow+sin+cos+AR1+AR2+AR3+AR4+AR5,
                             days~dow+sin+cos+AR1+AR2+AR3+AR4+AR5,NA,NA)

my.syndromicD <- cusum_synd(x=my.syndromicD,
                           syndromes= c(1,2,4,5),
                           evaluate.window=30,
                           baseline.window=260,
                           limit.sd=c(2.5,3,3.5),
                           guard.band=5,
                           correct.baseline=FALSE,
                           alarm.dim=4,
                           pre.process=c(FALSE,"glm","glm","diff","diff"),
                           diff.window=5,
                           frequency=260)                      
                           
## Examples WEEKLY data
data(lab.daily)
my.syndromicW <- raw_to_syndromicW (id=SubmissionID,
                                    syndromes.var=Syndrome,
                                    dates.var=DateofSubmission,
                                    date.format="%d/%m/%Y",
                                    data=lab.daily)
my.syndromicW <- cusum_synd(x=my.syndromicW,
                         syndromes="Musculoskeletal",
                         evaluate.window=10,
                         baseline.window=104,
                         limit.sd=c(2.5,3,3.5),
                         guard.band=2,
                         correct.baseline=FALSE,
                         alarm.dim=4,
                         pre.process="glm",
                         family="nbinom",
                         formula=list(week~trend+sin+cos),
                         frequency=52)
                         
my.syndromicW@formula <- list(NA,week~trend+sin+cos,
                             week~trend+sin+cos,NA,NA)

my.syndromicW <- cusum_synd(x=my.syndromicW,
                          syndromes= c(1,2,4,5),
                          evaluate.window=10,
                          baseline.window=104,
                          limit.sd=c(2.5,3,3.5),
                          guard.band=2,
                          correct.baseline=FALSE,
                          alarm.dim=4,
                          pre.process=c(FALSE,"glm","glm","diff","diff"),
                          diff.window=4)

nandadorea/vetsyn documentation built on April 30, 2022, 1:15 a.m.