categoricalCUSUM: CUSUM detector for time-varying categorical time series

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Function to process sts object by binomial, beta-binomial or multinomial CUSUM. Logistic, multinomial logistic, proportional odds or Bradley-Terry regression models are used to specify in-control and out-of-control parameters.

Usage

1
2
categoricalCUSUM(stsObj,control = list(range=NULL,h=5,pi0=NULL,
                 pi1=NULL, dfun=NULL, ret=c("cases","value")),...)

Arguments

stsObj

Object of class sts containing the number of counts in each of the k categories of the response variable. Time varying number of counts n_t is found in slot populationFrac.

control

Control object containing several items

  • rangeVector of length t_{max} with indices of the observed slot to monitor.

  • hThreshold to use for the monitoring. Once the CUSUM statistics is larger or equal to h we have an alarm.

  • pi0(k-1) \times t_{max} in-control probability vector for all categories except the reference category.

  • mu1(k-1) \times t_{max} out-of-control probability vector for all categories except the reference category.

  • dfunThe probability mass function or density used to compute the likelihood ratios of the CUSUM. In a negative binomial CUSUM this is dnbinom, in a binomial CUSUM dbinom and in a multinomial CUSUM dmultinom. The function must be able to handle the arguments y, size, mu and log. As a consequence, one in the case of the beta-binomial distribution has to write a small wrapper function.

  • retReturn the necessary proportion to sound an alarm in the slot upperbound or just the value of the CUSUM statistic. Thus, ret is one of tha values in c("cases","value").

...

Additional arguments to send to dfun.

Details

The function allows the monitoring of categorical time series as described by regression models for binomial, beta-binomial or multinomial data. The later includes e.g. multinomial logistic regression models, proportional odds models or Bradley-Terry models for paired comparisons. See the Höhle (2010) reference for further details about the methodology.

Once an alarm is found the CUSUM scheme is resetted (to zero) and monitoring continues from there.

Value

An sts object with observed, alarm, etc. slots trimmed to the control$range indices.

Author(s)

M. Höhle

References

Höhle, M. (2010), Changepoint detection in categorical time series, Book chapter to appear in T. Kneib and G. Tutz (Eds.), Statistical Modelling and Regression Structures, Springer.

See Also

categoricalCUSUM

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
if (require("gamlss")) {
  ###########################################################################
  #Beta-binomial CUSUM for a small example containing the time-varying
  #number of positive test out of a time-varying number of total
  #test.
  #######################################

  #Load meat inspection data
  data("abattoir")

  #Use GAMLSS to fit beta-bin regression model
  phase1 <- 1:(2*52)
  phase2  <- (max(phase1)+1) : nrow(abattoir)

  #Fit beta-binomial model using GAMLSS
  abattoir.df <- as.data.frame(abattoir)
  colnames(abattoir.df) <- c("y","t","state","alarm","n")
  m.bbin <- gamlss( cbind(y,n-y) ~ 1 + t + 
		    + sin(2*pi/52*t) + cos(2*pi/52*t) +
		    + sin(4*pi/52*t) + cos(4*pi/52*t), sigma.formula=~1,
		    family=BB(sigma.link="log"),
		    data=abattoir.df[phase1,c("n","y","t")])

  #CUSUM parameters
  R <- 2 #detect a doubling of the odds for a test being positive
  h <- 4 #threshold of the cusum

  #Compute in-control and out of control mean
  pi0 <- predict(m.bbin,newdata=abattoir.df[phase2,c("n","y","t")],type="response")
  pi1 <- plogis(qlogis(pi0)+log(R))
  #Create matrix with in control and out of control proportions.
  #Categories are D=1 and D=0, where the latter is the reference category
  pi0m <- rbind(pi0, 1-pi0)
  pi1m <- rbind(pi1, 1-pi1)


  ######################################################################
  # Use the multinomial surveillance function. To this end it is necessary
  # to create a new abattoir object containing counts and proportion for
  # each of the k=2 categories. For binomial data this appears a bit
  # redundant, but generalizes easier to k>2 categories.
  ######################################################################

  abattoir2 <- new("sts",epoch=1:nrow(abattoir), start=c(2006,1),freq=52,
    observed=cbind(abattoir@observed,abattoir@populationFrac -abattoir@observed),
    populationFrac=cbind(abattoir@populationFrac,abattoir@populationFrac),
    state=matrix(0,nrow=nrow(abattoir),ncol=2),
    multinomialTS=TRUE)

  ######################################################################
  #Function to use as dfun in the categoricalCUSUM
  #(just a wrapper to the dBB function). Note that from v 3.0-1 the
  #first argument of dBB changed its name from "y" to "x"!
  ######################################################################
  mydBB.cusum <- function(y, mu, sigma, size, log = FALSE) {
    return(dBB(y[1,], mu = mu[1,], sigma = sigma, bd = size, log = log))
  }


  #Create control object for multinom cusum and use the categoricalCUSUM
  #method
  control <- list(range=phase2,h=h,pi0=pi0m, pi1=pi1m, ret="cases",
		   dfun=mydBB.cusum)
  surv <- categoricalCUSUM(abattoir2, control=control,
			   sigma=exp(m.bbin$sigma.coef))

  #Show results
  plot(surv[,1],dx.upperbound=0)
  lines(pi0,col="green")
  lines(pi1,col="red")

  #Index of the alarm
  which.max(alarms(surv[,1]))
}

#ToDo: Compute run length using LRCUSUM.runlength

jimhester/surveillance documentation built on May 19, 2019, 10:33 a.m.