not: Narrowest-Over-Threshold Change-Point Detection

View source: R/not.R

notR Documentation

Narrowest-Over-Threshold Change-Point Detection

Description

Identifies potential locations of the change-points in the data following 'deterministic signal + noise' model (see details below) in a number of different scenarios. The object returned by this routine can be further passed to the features function, which finds the final estimate of the change-points based on a chosen stopping criterion. It can be also passed to plot, predict and residuals routines.

Usage

not(x, ...)

## Default S3 method:
not(x, M = 10000, method = c("not", "max"),
  contrast = c("pcwsConstMean", "pcwsConstMeanHT", "pcwsLinContMean",
  "pcwsLinMean", "pcwsQuadMean", "pcwsConstMeanVar"),
  rand.intervals = TRUE, parallel = FALSE, augmented = FALSE,
  intervals, ...)

Arguments

x

A numeric vector with data points.

...

Not in use.

M

A number of intervals drawn in the procedure.

method

Choice of "not" (recommended) and "max". If method="not", the Narrowest-Over-Threshold intervals are used in the algorithm. If method="max", the intervals corresponding to the largest contrast function are used. For an explanation, see the references.

contrast

A type of the contrast function used in the NOT algorithm. Choice of "pcwsConstMean", "pcwsConstMeanHT", "pcwsLinContMean", "pcwsLinMean", "pcwsQuadMean", "pcwsConstMeanVar". For the explanation, see details below.

rand.intervals

A logical variable. If rand.intervals=TRUE intervals used in the procedure are drawn uniformly using the random.intervals routine. If rand.intervals=FALSE, the intervals need to be passed using the intervals argument.

parallel

A logical variable. If TRUE some of computations are run in parallel using OpenMP framework. Currently this option is not supported on Windows.

augmented

A logical variable. if TRUE, the entire data are considered when the NOT segmentation tree is constructed (see the solution path algorithm in the references).

intervals

A 2-column matrix with the intervals considered in the algorithm, with start- and end- points of the intervals in, respectively, the first and the second column. The intervals are used only if rand.intervals=FALSE.

Details

The data points provided in x are assumed to follow

Y_t= f_t + sigma_t varepsilon_t,

for t=1,...,n, where n is the number of observations in x, the signal f_t and the standard deviation sigma_{t} are non-stochastic with structural breaks at unknown locations in time t. Currently, thefollowing scenarios for f_t and sigma_t are implemented:

  • Piecewise-constant signal with a Gaussian noise and constant standard deviation.

    Use contrast="pcwsConstMean" here.

  • Piecewise-constant mean with a heavy-tailed noise and constant standard deviation.

    Use contrast="pcwsConstMeanHT" here.

  • Piecewise-linear continuous signal with Gaussian noise and constant standard deviation.

    Use contrast="pcwsLinContMean" here.

  • Piecewise-linear signal with Gaussian noise and constant standard deviation.

    Use contrast="pcwsLinMean" here.

  • Piecewise-quadratic signal with Gaussian noise and constant standard deviation.

    Use contrast="pcwsQuadMean" here.

  • Piecewise-constant signal and piecewise-constant standard deviation of the Gaussian noise.

    Use contrast="pcwsConstMeanVar" here.

Value

An object of class "not", which contains the following fields:

x

The input vector.

n

The length of x.

contrast

A scenario for the change-points.

contrasts

A 5-column matrix with the values of the contrast function, where 's' and 'e' denote start- end points of the intervals in which change-points candidates 'arg.max' have been found; 'length' shows the length of the intervals drawn, column 'max.contrast' contains corresponding value of the contrast statistic.

solution.path

A list with the solution path of the NOT algorithm (see the references) containing three fields of the same length: cpt - a list with consecutive solutions, i.e. s the sets of change-point candidates, th - a vector of thresholds corresponding to the solutions, n.cpt - a vector with the number of change-points for each solution.

References

R. Baranowski, Y. Chen, and P. Fryzlewicz (2019). Narrowest-Over-Threshold Change-Point Detection. (http://stats.lse.ac.uk/fryzlewicz/not/not.pdf)

Examples

# **** Piecewisce-constant mean with Gaussian noise.
# *** signal
pcws.const.sig <- c(rep(0, 100), rep(1,100))
# *** data vector
x <- pcws.const.sig + rnorm(100)
# *** identify potential locations of the change-points
w <- not(x, contrast = "pcwsConstMean") 
# *** some examples of how the w object can be used
plot(w)
plot(residuals(w))
plot(predict(w))
# *** this is how to extract the change-points
fo <- features(w)
fo$cpt

# **** Piecewisce-constant mean with a heavy-tailed noise.
# *** data vector, signal the same as in the previous example, but heavy tails
x <- pcws.const.sig + rt(100, 3) 
# *** identify potential locations of the change-points, 
# using a contrast taylored to heavy-tailed data
w <- not(x, contrast = "pcwsConstMeanHT") 
plot(w)

# **** Piecewisce-constant mean and piecewise-constant variance
# *** signal's standard deviation
pcws.const.sd <- c(rep(2, 50), rep(1,150))
# *** data vector with pcws-const mean and variance
x <- pcws.const.sig + pcws.const.sd * rnorm(100)
# *** identify potential locations of the change-points in this model
w <- not(x, contrast = "pcwsConstMeanVar") 
# *** extracting locations of the change-points
fo <- features(w)
fo$cpt

# **** Piecewisce-linear coninuous mean
# *** signal with a change in slope
pcws.lin.cont.sig <- cumsum(c(rep(-1/50, 100), rep(1/50,100)))
# *** data vector 
x <- pcws.lin.cont.sig +  rnorm(100)
# *** identify potential locations of the change-points in the slope coefficient
w <- not(x, contrast = "pcwsLinContMean") 
# *** ploting the results
plot(w)
# *** location(s) of the change-points
fo <- features(w)
fo$cpt

# **** Piecewisce-linear mean with jumps
# *** signal with a change in slope and jumpe
pcws.lin.sig <- pcws.lin.cont.sig + pcws.const.sig
# *** data vector 
x <- pcws.lin.sig +  rnorm(100)
# *** identify potential locations of the change-points in the slope coefficient and the intercept
w <- not(x, contrast = "pcwsLinMean") 
# *** ploting the results
plot(w)
# *** location(s) of the change-points
fo <- features(w)
fo$cpt

# **** Piecewisce-quadratic mean with jumps
# *** Piecewise-quadratic signal
pcws.quad.sig <- 2*c((1:50)^2 /1000, rep(2, 100), 1:50 / 50 )
# *** data vector 
x <- pcws.quad.sig +  rnorm(100)
# *** identify potential locations of the change-points in the slope coefficient and the intercept
w <- not(x, contrast = "pcwsQuadMean") 
# *** ploting the results
plot(w)
# *** location(s) of the change-points
fo <- features(w)
fo$cpt

rbaranowski/not documentation built on March 22, 2022, 1:18 p.m.