clean_baseline-methods: Function to retrospectively remove possible outbreak signals...

clean_baseline    R Documentation

Function to retrospectively remove possible outbreak signals and excessive noise, producing an outbreak-free baseline that will serve to train outbreak-signal detection algorithms during prospective analysis.

Description

The cleaning is based on fitting the complete time series using regression methods (by default Poisson regression, but any other glm family is accepted, extended to negative binomial using the package fitdistrplus), and then removing any observations that fall outside a given confidence interval (set by the user). These observations are substituted by the model prediction for that time point.
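The idea described above can be sketched in base R, without the package. This is only an illustration on simulated data: the variable names are hypothetical, only values above the upper limit are flagged, and the limit is taken as a Poisson quantile of the fitted mean, which is an assumption and not necessarily how clean_baseline computes its interval.

```r
# Simulated counts for a 5-day week, with two artificial outbreak signals
set.seed(42)
n   <- 200
dow <- factor(rep(1:5, length.out = n))
y   <- rpois(n, lambda = exp(1 + 0.04 * as.integer(dow)))
y[c(50, 120)] <- y[c(50, 120)] + 30

# Fit the complete series with Poisson regression
fit <- glm(y ~ dow, family = poisson)
mu  <- fitted(fit)

# Upper limit per time point (assumption: Poisson quantile of the fitted mean)
limit <- 0.95
upper <- qpois(limit, lambda = mu)

# Replace observations above the limit with the model prediction
cleaned <- ifelse(y > upper, round(mu), y)
```

Plotting y against cleaned shows the two injected spikes pulled back to the fitted level, while observations within the limit are left unchanged.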

Usage

clean_baseline(x, ...)

## S4 method for signature 'syndromicD'
clean_baseline(x, syndromes = NULL,
  family = "poisson", limit = 0.95, formula = NULL, frequency = 365,
  plot = TRUE, print.model = TRUE)

## S4 method for signature 'syndromicW'
clean_baseline(x, syndromes = NULL,
  family = "poisson", limit = 0.95, formula = "year+sin+cos",
  plot = TRUE, print.model = TRUE, frequency = 52)

Arguments

x

a syndromic (syndromicD or syndromicW) object, which must have at least the slot of observed data and a data frame in the slot dates.

...

Additional arguments to the method.

syndromes

an optional parameter, if not specified, all columns in the slot observed of the syndromic object will be used. The user can choose to restrict the analyses to a few syndromic groups listing their name or column position in the observed matrix. See examples.

family

the GLM distribution family used, by default "poisson". If "nbinom" is used, the function glm.nb is used instead.

limit

the confidence interval to be used in identifying outliers (by default 0.95).

formula

the regression formula to be used, in the R formula format: y~x1+x2... If a formula is provided when this function is called, that formula is used; if none is provided, the function looks for formulas in the @formula slot of the syndromic object. We recommend providing a formula while testing various models and, once a model is chosen, saving it in the syndromic object using: my.syndromic@formula <- list(formula1,formula2...), with one entry for each syndrome in the syndromic object (columns in observed). NA can be provided when a syndrome is not to be associated with a particular formula.

When providing a formula, two options are possible: providing a single formula to be applied to all syndromes, or providing the same number of formulas (in a list) as the number of syndromes in the observed object, even if not all of them will be used (see examples).

Any variables (x1, x2...) must be given the same name they have in the slot @dates. The variables that are standard in that slot for DAILY data (syndromicD) are: trend (for a monotonic trend), year, month, dow (day of week), sin, cos, and AR1 (autoregressive term for 1 day) to AR7. For WEEKLY data (syndromicW): trend, sin, cos, year, and 1 to 4 autoregressive variables. These elements can be combined into any formula. Since the @dates slot can be customized by the user, any variables in the dates data.frame can be called into the formula.

frequency

the frequency of repetition in the data, by default one year (365 for DAILY data, when the object provided belongs to the class syndromicD, and 52 for WEEKLY data, when the object provided belongs to the class syndromicW).

plot

whether plots comparing observed data and the result of the cleaning process should be displayed.

print.model

whether the result of model fitting should be printed on the console. This is recommended when the user is exploring which explanatory variables to keep or drop.
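To make the formula and frequency arguments more concrete, the following base R sketch builds a stand-in for the @dates data frame of a weekly object and checks that every variable named in a list of formulas exists in it. The data frame, the way sin and cos are derived from frequency, and the helper logic are assumptions for illustration only; the package may construct these columns differently.

```r
# Stand-in for the @dates slot of a weekly object (hypothetical construction)
frequency <- 52
week  <- 1:104
dates <- data.frame(
  trend = week,
  sin   = sin(2 * pi * week / frequency),
  cos   = cos(2 * pi * week / frequency)
)

# One entry per syndrome; NA marks a syndrome with no formula
formulas <- list(NA, y ~ trend + sin + cos, y ~ sin + cos)

# For each formula, check that all right-hand-side variables are columns of dates
rhs_ok <- vapply(formulas, function(f) {
  if (!inherits(f, "formula")) return(NA)
  all(all.vars(f[[3]]) %in% names(dates))
}, logical(1))
```

Here rhs_ok is NA for the skipped syndrome and TRUE for the two formulas whose variables all appear in dates. With frequency = 52, the sin and cos columns repeat every 52 rows, which is what "frequency of repetition" refers to.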

Value

An object of the class syndromic (syndromicD or syndromicW) which contains all elements from the object provided in x, but in which the slot baseline has been filled with an outbreak-free baseline for each syndromic group. When the user chooses to restrict analyses to some syndromes, the remaining columns are kept as is (if the slot was not empty) or filled with NAs when previously empty.
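The behaviour for restricted analyses can be pictured with plain matrices standing in for the observed and baseline slots. This is a hypothetical sketch of the bookkeeping only (no cleaning is performed, and the column names are invented); it shows that selecting syndromes by name or by column position addresses the same columns, and that untouched columns of a previously empty baseline stay NA.

```r
# Stand-in for the observed slot: 5 time points, 4 syndromic groups
set.seed(1)
observed <- matrix(rpois(20, lambda = 5), ncol = 4,
                   dimnames = list(NULL, c("A", "B", "C", "D")))

# Previously empty baseline: same shape, all NA
baseline <- matrix(NA_real_, nrow = nrow(observed), ncol = ncol(observed),
                   dimnames = dimnames(observed))

# Syndromes can be listed by name or by column position
by_name     <- c("B", "D")
by_position <- c(2, 4)

# Only the selected columns are filled (here simply copied, not cleaned)
baseline[, by_name] <- observed[, by_name]
```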

References

Fernanda C. Dorea, Crawford W. Revie, Beverly J. McEwen, W. Bruce McNab, David Kelton, Javier Sanchez (2012). Retrospective time series analysis of veterinary laboratory data: Preparing a historical baseline for cluster detection in syndromic surveillance. Preventive Veterinary Medicine. DOI: 10.1016/j.prevetmed.2012.10.010.

Examples

## Examples for 'syndromicD'
data(lab.daily)
my.syndromicD <- raw_to_syndromicD (id=SubmissionID,
                                 syndromes.var=Syndrome,
                                 dates.var=DateofSubmission,
                                 date.format="%d/%m/%Y",
                                 remove.dow=c(6,0),
                                 add.to=c(2,1),
                                 data=lab.daily)
my.syndromicD <- clean_baseline(my.syndromicD,
                                formula=list(days~dow+month+year),
                                frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes="Musculoskeletal",
                              formula=list(days~dow+month+year),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=c("GIT","Musculoskeletal"),
                              formula=list(NA,y~dow+sin+cos+year+AR1+AR2+AR3+AR4+AR5+AR6+AR7,
                              days~dow+month,NA,NA),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=3,
                              formula=list(NA,y~dow+sin+cos+year+AR1+AR2+AR3+AR4+AR5+AR6+AR7,
                              days~dow+month,NA,NA),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=c(2,3),
                              formula=list(NA,y~dow+sin+cos+year+AR1+AR2+AR3+AR4+AR5+AR6+AR7,
                              days~dow+month,NA,NA),
                              frequency=260)

my.syndromicD <- clean_baseline(my.syndromicD,
                              family="nbinom",
                              formula=list(days~dow+month),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes="Musculoskeletal",
                              family="nbinom",
                              formula=list(y~dow+sin+cos+year+AR1+AR2+AR3+AR4+AR5+AR6+AR7),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=c("GIT","Musculoskeletal"),
                              family="nbinom",
                              formula=list(NA,y~dow+sin+cos+year+AR1+AR2+AR3+AR4+AR5+AR6+AR7,
                              days~dow+month,NA,NA),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=3,
                              family="nbinom",
                              formula=list(days~dow+month),
                              frequency=260)
my.syndromicD <- clean_baseline(my.syndromicD,
                              syndromes=c(2,3),
                              family="nbinom",
                              formula=list(days~dow+month),
                              frequency=260)

## Examples for 'syndromicW'
data(lab.daily)
my.syndromicW <- raw_to_syndromicW (id=SubmissionID,
                                 syndromes.var=Syndrome,
                                 dates.var=DateofSubmission,
                                 date.format="%d/%m/%Y",
                                 formula=list(NA,y~year,weeks~trend+sin+cos,NA,NA),
                                 data=lab.daily)
my.syndromicW <- clean_baseline(my.syndromicW,formula=list(NA,y~year,weeks~trend+sin+cos,NA,NA))
my.syndromicW <- clean_baseline(my.syndromicW, formula=list(week~sin+cos))
my.syndromicW <- clean_baseline(my.syndromicW,
                              syndromes="Musculoskeletal",
                              formula=list(week~sin+cos))
my.syndromicW <- clean_baseline(my.syndromicW,
                              syndromes=c("GIT","Musculoskeletal"),
                              formula=list(week~sin+cos))
my.syndromicW <- clean_baseline(my.syndromicW,
                              syndromes=3,
                              formula=list(week~sin+cos))
my.syndromicW <- clean_baseline(my.syndromicW,
                              syndromes=c(1,3),
                              formula=list(NA,y~year,weeks~trend+sin+cos,NA,NA))

my.syndromicW <- clean_baseline(my.syndromicW,
                              family="nbinom",
                              formula=list(week~sin+cos))
my.syndromicW <- clean_baseline(my.syndromicW,
                              syndromes="Musculoskeletal",family="nbinom",
                              formula=list(week~sin+cos))

nandadorea/vetsyn documentation built on April 30, 2022, 1:15 a.m.