# offlineCPD: Offline Bayesian Changepoint Detection

## Description

An algorithm for detecting multiple changepoints in univariate or multivariate time series. The underlying method is on-line: the model is updated with each new observation. This function, however, takes the whole series at once and processes it point by point as if the data were arriving on-line, i.e. it runs off-line. See `onlineCPD` for a version that runs iteratively, one data point at a time. The algorithm implements the Bayesian method of Adams and MacKay (2007) and is based on the Matlab code released with that paper. The model has been extended to work on multivariate data.

## Usage

```r
offlineCPD(data, time = NULL, hazard_func = const_hazard, m = 0, k = 0.01,
  a = 0.01, b = 1e-04)
```

## Arguments

| Argument | Description |
|---|---|
| `data` | a vector (for univariate) or matrix (for multivariate) composed of time series data. For multivariate, each column is a different time series. Column names will be extracted for plotting, so name the columns accordingly. Note that you must provide the whole series; if you want to provide one data point at a time, use `onlineCPD`. |
| `time` | an optional vector of times in POSIXct format, where each time corresponds to a value in the `data` matrix or vector, used for pretty-printing in the `plot.oCPD` function. |
| `hazard_func` | hazard function used in the model. Defaults to a constant hazard, suitable for exponential family models. |
| `m` | initial value of `mu`, the mean of the data. Defaults to 0. As the mean is updated with every data point, this value does not need to be changed, but is safe to be experimented with. |
| `k` | initial value of `kappa`, basically a counter. Defaults to 0.01. May be useful to increase this to 1 if the data is large. |
| `a` | initial value of `alpha`, basically a counter. Defaults to 0.01. May be useful to increase this to 1 if the data is large. |
| `b` | initial value of `beta`, the variance of the data. Defaults to 1e-4. As the variance is updated with every data point, this value does not need to be changed, but is safe to be experimented with. |
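To illustrate why the initial values `m`, `k`, `a` and `b` rarely need tuning, the sketch below applies the textbook Normal-Gamma conjugate update one observation at a time, starting from the documented defaults. It is an assumption that the package uses exactly this update (it matches the usual formulation of the Adams and MacKay model); the helper `ng_update` is hypothetical and not part of the package.

```r
# Textbook Normal-Gamma update for a single observation x.
# Assumption: illustrative only; the package's internal update may differ.
ng_update <- function(p, x) {
  list(
    mu    = (p$kappa * p$mu + x) / (p$kappa + 1),   # running mean
    kappa = p$kappa + 1,                            # counts observations
    alpha = p$alpha + 0.5,                          # grows by 1/2 per point
    beta  = p$beta + p$kappa * (x - p$mu)^2 / (2 * (p$kappa + 1))
  )
}

set.seed(1)
x <- rnorm(200, mean = 0.7, sd = 0.1)

# Start from the documented defaults m = 0, k = 0.01, a = 0.01, b = 1e-4
prior <- list(mu = 0, kappa = 0.01, alpha = 0.01, beta = 1e-4)
post  <- Reduce(ng_update, x, prior)

post$mu                  # close to mean(x): the prior mean 0 is washed out
post$beta / post$alpha   # close to var(x)
```

Because `kappa` and `alpha` start near zero, the prior carries the weight of only a small fraction of one observation, so the data dominate after a handful of points.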

## Details

The primary result is a list of detected changepoints. This list must be interpreted: for example, when the algorithm is unsure of the exact location of a change, it reports several candidate changepoints close together. The helper function `findCP`, called from `plot.oCPD`, helps reduce such redundant candidates.
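As a rough illustration of what interpreting such a list can involve, the sketch below collapses candidates that lie within a few points of each other into a single changepoint. The helper `merge_close` and its `gap` parameter are hypothetical; this is not the rule `findCP` applies, which is not described here.

```r
# Hypothetical helper: keep the earliest candidate of every cluster in
# which neighbouring candidates are at most `gap` points apart.
merge_close <- function(cp, gap = 3) {
  cp <- sort(unique(cp))
  if (length(cp) == 0) return(cp)
  group <- cumsum(c(1, diff(cp) > gap))   # a new group starts after a large gap
  as.vector(tapply(cp, group, min))
}

candidates <- c(50, 51, 52, 90, 150, 151)
merged <- merge_close(candidates)
merged   # 50 90 150
```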

The algorithm works by estimating the posterior distribution over the run-length, or the number of data points since the last changepoint. At each time, the run-length can either increase by one or reduce to zero.
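The grow-or-reset recursion can be sketched in a few lines of R for the univariate case. This is a minimal illustration assuming a constant hazard, a Normal-Gamma prior and a Student-t predictive density, in the spirit of Adams and MacKay (2007). The function `run_length_posterior`, its `hazard` argument and its indexing convention are assumptions made for this example, not the package's source code.

```r
# Minimal sketch of the run-length recursion (univariate, constant hazard).
# Illustrative only; names and indexing are not taken from the package.
run_length_posterior <- function(x, hazard = 1 / 100,
                                 m = 0, k = 0.01, a = 0.01, b = 1e-4) {
  n <- length(x)
  # R[r + 1, t + 1] = P(run length is r after seeing t data points)
  R <- matrix(0, n + 1, n + 1)
  R[1, 1] <- 1
  mu <- m; kappa <- k; alpha <- a; beta <- b   # one entry per run length
  for (t in seq_len(n)) {
    # Student-t predictive density of x[t] under each possible run length
    s    <- sqrt(beta * (kappa + 1) / (alpha * kappa))
    pred <- dt((x[t] - mu) / s, df = 2 * alpha) / s
    prev <- R[1:t, t]
    R[2:(t + 1), t + 1] <- prev * pred * (1 - hazard)   # run grows by one
    R[1, t + 1]         <- sum(prev * pred * hazard)    # run resets to zero
    R[, t + 1] <- R[, t + 1] / sum(R[, t + 1])          # normalise the column
    # Conjugate updates; the prior is prepended for the new run length 0
    mu_new <- c(m, (kappa * mu + x[t]) / (kappa + 1))
    beta   <- c(b, beta + kappa * (x[t] - mu)^2 / (2 * (kappa + 1)))
    kappa  <- c(k, kappa + 1)
    alpha  <- c(a, alpha + 0.5)
    mu     <- mu_new
  }
  R
}

set.seed(6)
x <- c(rnorm(50, 0.3, 0.15), rnorm(40, 0.7, 0.1), rnorm(60, 0.5, 0.15))
R <- run_length_posterior(x)
range(colSums(R))   # every column is a probability distribution
```

Each column of the result is the run-length distribution after one more observation; all mass either moves one row down (the run continues) or returns to the first row (a changepoint).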

The functions `summary.oCPD`, `plot.oCPD`, `str.oCPD` and `print.oCPD` are used to obtain summaries of the results.

See `plot.oCPD` for advice on how to interpret the results after plotting.

## Value

An object of class "oCPD", which is a list containing the following:

| Component | Description |
|---|---|
| `R` | n by n matrix of run-length probabilities. The value at `R[i,j]` is the probability that at data point `j`, the current run length is `i`. |
| `data` | same as the input parameter, included for plotting. |
| `time` | same as the input parameter, included for plotting. |
| `alpha` | the vector of values of alpha after the final data point. |
| `beta` | the vector (or matrix) of values of beta (the variance) after the final data point. |
| `kappa` | the vector of values of kappa after the final data point. |
| `mu` | the vector (or matrix) of values of mu (the mean) after the final data point. |
| `max` | vector of values; `max[i]` is the run length with the highest probability at data point `i`. Used to plot the red diagonal line in `plot.oCPD`. |
| `changes` | locations of detected changepoints. When the algorithm cannot detect the exact location of a change, multiple possible values are reported. |
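To make the relationship between `R` and `max` concrete, the sketch below builds a small made-up run-length matrix and reads off the most probable row in each column. The numbers are invented for illustration, and flagging a drop in the most probable run length as a candidate changepoint is a simple heuristic assumed here, not necessarily how the package computes `changes`.

```r
# Toy 4 x 4 matrix; column j is the run-length distribution at data point j.
# Values are made up for illustration (matrix() fills column by column).
R <- matrix(c(1.0, 0.0, 0.0, 0.0,
              0.1, 0.9, 0.0, 0.0,
              0.1, 0.1, 0.8, 0.0,
              0.7, 0.1, 0.1, 0.1), nrow = 4)

# Row with the highest probability in each column
max_row <- apply(R, 2, which.max)
max_row   # 1 2 3 1

# Heuristic (an assumption): a drop in the most probable run length
# suggests that a new run has started at that data point.
drops <- which(diff(max_row) < 0) + 1
drops     # 4
```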

## Author(s)

Zachary Zanussi

## References

Adams, R. P. and MacKay, D. J. C. (2007). Bayesian Online Changepoint Detection. arXiv:0710.3742.


## Examples

```r
library(onlineCPD)

##### Univariate Data #####
set.seed(6)
# Three segments with different means and spreads
x <- c(rnorm(50, mean = 0.3, sd = 0.15),
       rnorm(40, mean = 0.7, sd = 0.1),
       rnorm(60, mean = 0.5, sd = 0.15))
plot(offlineCPD(x))

##### Real Multivariate Data #####
data(WalBelSentiment)
data(WalBelTimes)
plot(offlineCPD(WalBelSentiment[1400:1600, ], WalBelTimes[1400:1600]))
```