An implementation of several different change point models (CPMs) for performing both parametric and nonparametric change detection on univariate data streams.
The CPM framework is an approach to sequential change detection (also known as Phase II process monitoring) which allows standard statistical hypothesis tests to be deployed sequentially. The two main general-purpose functions in the package are detectChangePoint and processStream, for detecting single and multiple change points respectively. The remainder of the functions allow for more precise control over the change detection procedure. To cite this R package in a research paper, please use citation('cpm') to obtain the reference and BibTeX entry.
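A minimal usage sketch of the two general-purpose functions is shown here. The simulated stream is purely illustrative, and the argument names (cpmType, ARL0, startup) and returned fields are written as recalled from the package help pages, so treat them as assumptions and check ?detectChangePoint and ?processStream before relying on them. The code only runs when the cpm package is installed.

```r
# Sketch only: guarded so that it is skipped when cpm is not installed.
if (requireNamespace("cpm", quietly = TRUE)) {
  set.seed(1)
  # Illustrative stream: the mean shifts from 0 to 3 after observation 100.
  x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))

  # Single change point using the Student-t CPM.
  single <- cpm::detectChangePoint(x, cpmType = "Student",
                                   ARL0 = 500, startup = 20)
  print(single$changeDetected)  # was a change flagged?
  print(single$detectionTime)   # observation at which it was flagged
  print(single$changePoint)     # estimated location of the change

  # Multiple change points: monitoring restarts after each detection.
  multi <- cpm::processStream(x, cpmType = "Student",
                              ARL0 = 500, startup = 20)
  print(multi$changePoints)
  print(multi$detectionTimes)
}
```

The estimated change point should never exceed the detection time, since the change can only be located within the observations seen so far.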
Note: this package has a manual titled "Parametric and Nonparametric Sequential Change Detection in R: The cpm Package" available from www.gordonjross.co.uk, which contains a full description of all the functions and algorithms in the package, as well as detailed instructions on how to use it.
If you would like to cite this package, the citation information is "G. J. Ross - Parametric and Nonparametric Sequential Change Detection in R: The cpm Package, Journal of Statistical Software, 2015, 66(3), 1-20"
A Brief CPM Overview
Given a sequence X_1,...,X_n of random variables, the CPM works by evaluating a two-sample test statistic at every possible split point. Let D_{k,n} be the value of the test statistic when the sequence is split into the two samples \{X_1, X_2,..., X_k\} and \{X_{k+1}, X_{k+2} ,..., X_{n}\}, and define D_n to be the maximum of these values. D_n is then compared to some threshold, with a change being detected if the threshold is exceeded.
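The batch computation described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: it uses the absolute pooled-variance Student-t statistic as the two-sample test, and the helper names t_stat and split_stats and the min_seg argument are hypothetical.

```r
# Absolute two-sample Student-t statistic with pooled variance.
t_stat <- function(a, b) {
  na <- length(a); nb <- length(b)
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
  abs(mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

# D_{k,n} for every split k that leaves at least min_seg points per side
# (two points are needed on each side to estimate a variance).
split_stats <- function(x, min_seg = 2) {
  n <- length(x)
  stopifnot(n >= 2 * min_seg)
  ks <- min_seg:(n - min_seg)
  d <- vapply(ks, function(k) t_stat(x[1:k], x[(k + 1):n]), numeric(1))
  data.frame(k = ks, D = d)
}

set.seed(42)
# Illustrative data: the mean shifts from 0 to 2 after observation 50.
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))

stats <- split_stats(x)
D_n   <- max(stats$D)                 # maximised test statistic
k_hat <- stats$k[which.max(stats$D)]  # split point attaining the maximum
print(c(D_n = D_n, k_hat = k_hat))
```

D_n would then be compared to a threshold; the split attaining the maximum serves as the estimate of the change location.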
In the sequential context, the observations are processed one-by-one, with D_t being computed based on the first t observations, D_{t+1} being computed based on the first t+1 observations, and so on. The change detection time is defined as the first value of t where the threshold is exceeded. Supposing this occurs at time t=T, then the best estimate of the location of the change point is the value of k which maximised D_{k,T}. Writing \hat{τ} for this, we have that \hat{τ} \leq T.
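The sequential procedure can be sketched in base R as below. Several simplifications are assumptions made for illustration: the test is the absolute pooled-variance Student-t statistic, the threshold is a single fixed constant rather than a sequence calibrated to a target ARL_0, D_t is recomputed from scratch at every step, and the names detect_sequential, max_split, startup and min_seg are hypothetical.

```r
# Absolute two-sample Student-t statistic with pooled variance.
t_stat <- function(a, b) {
  na <- length(a); nb <- length(b)
  sp2 <- ((na - 1) * var(a) + (nb - 1) * var(b)) / (na + nb - 2)
  abs(mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
}

# Maximum of D_{k,t} over all admissible splits, and the maximising k.
max_split <- function(x, min_seg = 2) {
  n <- length(x)
  ks <- min_seg:(n - min_seg)
  d <- vapply(ks, function(k) t_stat(x[1:k], x[(k + 1):n]), numeric(1))
  list(D = max(d), k = ks[which.max(d)])
}

# Process observations one by one; stop at the first t with D_t > threshold.
detect_sequential <- function(x, threshold, startup = 20) {
  none <- list(detected = FALSE, detection_time = NA_integer_,
               tau_hat = NA_integer_)
  if (length(x) < startup) return(none)
  for (t in startup:length(x)) {
    m <- max_split(x[1:t])
    if (isTRUE(m$D > threshold)) {
      return(list(detected = TRUE, detection_time = t, tau_hat = m$k))
    }
  }
  none
}

set.seed(123)
# Illustrative stream: the mean shifts from 0 to 3 after observation 100.
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 3))
res <- detect_sequential(x, threshold = 6)  # threshold chosen by hand
print(res)
```

Because the split is restricted to the first T observations, the estimate tau_hat cannot exceed the detection time.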
The thresholds are chosen so that there is a constant probability of a false positive occurring after each observation. This leads to control of the Average Run Length (ARL_0), defined as the expected number of observations received before a change is falsely detected, assuming that no change has occurred.
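To see why a constant false positive probability controls the ARL_0: if, given no earlier false alarm, each observation triggers one with the same probability alpha, the time to the first false alarm is geometrically distributed with mean 1/alpha. The short simulation below checks this numerically; the value alpha = 1/500 is only an illustrative choice.

```r
alpha <- 1 / 500  # illustrative per-observation false alarm probability

set.seed(7)
# rgeom() counts the failures before the first success, so adding 1 gives
# the index of the observation at which the first false alarm occurs.
run_lengths <- rgeom(1e5, prob = alpha) + 1

print(mean(run_lengths))  # empirical ARL_0
print(1 / alpha)          # theoretical ARL_0
```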
The choice of test statistic in the CPM defines the class of changes which it is optimised towards detecting. This package implements CPMs using the following statistics. More details can be found in the references section:
Student: Student-t test statistic, as in [Hawkins et al, 2003]. Use to detect mean changes in a Gaussian sequence.
Bartlett: Bartlett test statistic, as in [Hawkins and Zamba, 2005]. Use to detect variance changes in a Gaussian sequence.
GLR: Generalized Likelihood Ratio test statistic, as in [Hawkins and Zamba, 2005b]. Use to detect both mean and variance changes in a Gaussian sequence.
Exponential: Generalized Likelihood Ratio test statistic for the Exponential distribution, as in [Ross, 2013]. Used to detect changes in the parameter of an Exponentially distributed sequence.
GLRAdjusted and ExponentialAdjusted: Identical to the GLR and Exponential statistics, except with the finite-sample correction discussed in [Ross, 2013] which can lead to more powerful change detection.
FET: Fisher's Exact Test statistic, as in [Ross and Adams, 2012b]. Use to detect parameter changes in a Bernoulli sequence.
Mann-Whitney: Mann-Whitney test statistic, as in [Ross et al, 2011]. Use to detect location shifts in a stream with a (possibly unknown) non-Gaussian distribution.
Mood: Mood test statistic, as in [Ross et al, 2011]. Use to detect scale shifts in a stream with a (possibly unknown) non-Gaussian distribution.
Lepage: Lepage test statistic, as in [Ross et al, 2011]. Use to detect location and/or scale shifts in a stream with a (possibly unknown) non-Gaussian distribution.
Kolmogorov-Smirnov: Kolmogorov-Smirnov test statistic, as in [Ross et al 2012]. Use to detect arbitrary changes in a stream with a (possibly unknown) non-Gaussian distribution.
Cramer-von-Mises: Cramer-von-Mises test statistic, as in [Ross et al 2012]. Use to detect arbitrary changes in a stream with a (possibly unknown) non-Gaussian distribution.
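To illustrate how the choice of statistic slots into the framework, the base R sketch below applies a standardised Mann-Whitney statistic at every split point of a heavy-tailed stream. It is an illustration only and not the package's implementation: the helper names mw_stat and max_split are hypothetical, the normal-approximation variance ignores ties, and no threshold is applied.

```r
# Standardised Mann-Whitney statistic |U - E[U]| / sd(U), without a tie
# correction (reasonable for continuous data).
mw_stat <- function(a, b) {
  na <- length(a); nb <- length(b)
  r <- rank(c(a, b))
  u <- sum(r[seq_len(na)]) - na * (na + 1) / 2
  mu <- na * nb / 2
  sigma <- sqrt(na * nb * (na + nb + 1) / 12)
  abs(u - mu) / sigma
}

# Maximise any two-sample statistic over the admissible split points.
max_split <- function(x, stat, min_seg = 2) {
  n <- length(x)
  stopifnot(n >= 2 * min_seg)
  ks <- min_seg:(n - min_seg)
  d <- vapply(ks, function(k) stat(x[1:k], x[(k + 1):n]), numeric(1))
  list(D = max(d), k = ks[which.max(d)])
}

set.seed(99)
# Illustrative heavy-tailed stream (Student-t, 2 degrees of freedom) whose
# location shifts by 3 after observation 100.
x <- c(rt(100, df = 2), rt(100, df = 2) + 3)

res <- max_split(x, mw_stat)
print(res)
```

Because the statistic depends on the data only through ranks, extreme observations from the heavy tails have limited influence on where the maximum is attained.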
For a fuller overview of the package which includes a description of the CPM framework and examples of how to use the various functions, please consult the full package manual titled "Parametric and Nonparametric Sequential Change Detection in R: The cpm Package"
Gordon J. Ross gordon@gordonjross.co.uk
Hawkins, D., Zamba, K. (2005) – A Change-Point Model for a Shift in Variance, Journal of Quality Technology, 37, 21-31
Hawkins, D., Zamba, K. (2005b) – Statistical Process Control for Shifts in Mean or Variance Using a Changepoint Formulation, Technometrics, 47(2), 164-173
Hawkins, D., Qiu, P., Kang, C. (2003) – The Changepoint Model for Statistical Process Control, Journal of Quality Technology, 35, 355-366
Ross, G. J., Tasoulis, D. K., Adams, N. M. (2011) – A Nonparametric Change-Point Model for Streaming Data, Technometrics, 53(4)
Ross, G. J., Adams, N. M. (2012) – Two Nonparametric Control Charts for Detecting Arbitrary Distribution Changes, Journal of Quality Technology, 44, 102-116
Ross, G. J., Adams, N. M. (2013) – Sequential Monitoring of a Proportion, Computational Statistics, 28(2)
Ross, G. J. (2014) – Sequential Change Detection in the Presence of Unknown Parameters, Statistics and Computing, 24, 1017-1030
Ross, G. J. (2015) – Parametric and Nonparametric Sequential Change Detection in R: The cpm Package, Journal of Statistical Software, 66(3), 1-20