Offline Bayesian Changepoint Detection

Description

An algorithm for detecting multiple changepoints in univariate or multivariate time series. The underlying model is online, i.e. it is updated with each new observation. This function, however, takes the whole series at once and steps through it observation by observation, i.e. it runs offline. See onlineCPD for a version that accepts one data point at a time. The algorithm implements the Bayesian method of Adams and MacKay (2007) and is based on the Matlab code released with the paper. The model has been extended to handle multivariate data.

Usage

offlineCPD(data, time = NULL, hazard_func = const_hazard,
 m = 0, k = 0.01, a = 0.01, b = 1e-04)

Arguments

data

a vector (for univariate data) or matrix (for multivariate data) of time series values. For multivariate data, each column is a separate time series. Column names are used for plotting, so name the columns accordingly. The whole series must be provided; to supply one data point at a time, use onlineCPD.

time

an optional vector of times in POSIXct format, where each time corresponds to a value in the data matrix or vector, used for pretty-printing in the plot.oCPD function.

hazard_func

hazard function used in the model. Defaults to a constant hazard, suitable for exponential family models.

m

initial value of mu, the mean of the data. Defaults to 0. Because the mean is updated with every data point, this value rarely needs changing, but it is safe to experiment with.

k

initial value of kappa, basically a counter. Defaults to 0.01. May be useful to increase this to 1 if the data is large.

a

initial value of alpha, basically a counter. Defaults to 0.01. May be useful to increase this to 1 if the data is large.

b

initial value of beta, the variance of the data. Defaults to 1e-4. Because the variance is updated with every data point, this value rarely needs changing, but it is safe to experiment with.
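To make the roles of m, k, a and b more concrete, the following is a minimal sketch of how such parameters evolve, assuming the standard Normal-Gamma conjugate update found in the Adams and MacKay reference code. The helper ng_update is hypothetical and is not part of this package, whose internals may differ.

```r
# Hypothetical helper (not part of the package): one conjugate
# Normal-Gamma update of (mu, kappa, alpha, beta) for a single observation x.
ng_update <- function(x, mu, kappa, alpha, beta) {
  list(
    mu    = (kappa * mu + x) / (kappa + 1),                # running mean
    kappa = kappa + 1,                                     # counts observations
    alpha = alpha + 0.5,                                   # grows by 1/2 per point
    beta  = beta + kappa * (x - mu)^2 / (2 * (kappa + 1))  # accumulates spread
  )
}

# Start from the documented defaults m = 0, k = 0.01, a = 0.01, b = 1e-04.
p <- list(mu = 0, kappa = 0.01, alpha = 0.01, beta = 1e-04)
for (x in c(0.31, 0.28, 0.35)) {
  p <- ng_update(x, p$mu, p$kappa, p$alpha, p$beta)
}
str(p)
```

Under this assumption, small initial values of k and a let the first few observations dominate the estimates, which is why the initial m and b matter little.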

Details

The primary result is a list of detected changepoints. This list must be interpreted: for example, when the algorithm is unsure of the exact location of a change, it reports several candidate changepoints. The helper function findCP, called from plot.oCPD, helps reduce such groups of candidates.

The algorithm works by estimating the posterior distribution over the run length, that is, the number of data points since the last changepoint. At each time step, the run length can either increase by one or reset to zero.
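The recursion described above can be sketched for univariate Gaussian data as follows. This is an illustration of the Adams and MacKay (2007) update under a constant hazard, not the package's implementation; the function bocpd_sketch, its arguments and its default prior values are all hypothetical.

```r
# Hypothetical sketch of the run-length recursion (not the package's code).
# Rows of R index run length 0..n, columns index the number of points seen.
bocpd_sketch <- function(x, lambda = 100,
                         mu0 = 0, kappa0 = 1, alpha0 = 1, beta0 = 1) {
  n <- length(x)
  R <- matrix(0, n + 1, n + 1)
  R[1, 1] <- 1                       # before any data, run length is 0
  mu <- mu0; kappa <- kappa0; alpha <- alpha0; beta <- beta0
  H <- 1 / lambda                    # constant hazard
  for (t in seq_len(n)) {
    # Student-t predictive density of x[t] under each current run length
    s <- sqrt(beta * (kappa + 1) / (alpha * kappa))
    pred <- dt((x[t] - mu) / s, df = 2 * alpha) / s
    prev <- R[1:t, t]
    R[2:(t + 1), t + 1] <- prev * pred * (1 - H)  # run length grows by one
    R[1, t + 1] <- sum(prev * pred * H)           # run length resets to zero
    R[, t + 1] <- R[, t + 1] / sum(R[, t + 1])    # normalise the column
    # Update parameters per run length; the new run starts from the prior
    mu_new   <- c(mu0, (kappa * mu + x[t]) / (kappa + 1))
    beta_new <- c(beta0, beta + kappa * (x[t] - mu)^2 / (2 * (kappa + 1)))
    kappa <- c(kappa0, kappa + 1)
    alpha <- c(alpha0, alpha + 0.5)
    mu <- mu_new; beta <- beta_new
  }
  R
}

set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 5))
R <- bocpd_sketch(x)
map_run <- apply(R, 2, which.max) - 1  # most probable run length per column
```

Each column of R is a probability distribution over run lengths, and a sharp drop in map_run marks a likely changepoint.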

The functions summary.oCPD, plot.oCPD, str.oCPD and print.oCPD are used to obtain summaries of the results.

See plot.oCPD for advice on how to interpret the results after plotting.

Value

An object of class "oCPD", which is a list containing the following:

R

n by n matrix of run-length probabilities. The value at R[i,j] is the probability that at data point j, the current run length is i.

data

same as the input parameter, included for plotting.

time

same as the input parameter, included for plotting.

alpha

the vector of values of alpha after the final data point.

beta

the vector (or matrix) of values of beta (the variance) after the final data point.

kappa

the vector of values of kappa after the final data point.

mu

the vector (or matrix) of values of mu (the mean) after the final data point.

max

vector of values; max[i] is the run length with the highest probability at data point i. Used to plot the red diagonal line in plot.oCPD.

changes

locations of detected changepoints. When the algorithm cannot determine the exact location of a change, multiple candidate values are reported.
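As a rough illustration of how max relates to R, the sketch below builds a toy run-length matrix with made-up values, takes the most probable run length in each column, and flags columns where that run length drops. It follows the indexing documented above (R[i,j] is the probability of run length i at data point j) and is not the logic used by findCP.

```r
# Toy run-length matrix with made-up values, for illustration only.
# Rows are run lengths, columns are data points.
R <- matrix(0, nrow = 5, ncol = 5)
R[cbind(c(1, 2, 3, 1, 2), 1:5)] <- 0.9  # most probable run length per column
R[R == 0] <- 0.1 / 4                    # spread the remaining mass evenly

map_run <- apply(R, 2, which.max)       # analogous to the documented `max`
drops <- which(diff(map_run) < 0) + 1   # columns where the run length resets
print(map_run)
print(drops)
```

Here the run length grows over the first three columns and resets at the fourth, so the fourth data point is the only candidate changepoint in this toy matrix.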

Author(s)

Zachary Zanussi

References

Adams, R. P. and MacKay, D. J. C. (2007), Bayesian Online Changepoint Detection, arXiv:0710.3742.


See Also

plot.oCPD, summary.oCPD, print.oCPD for summaries of the results.

Examples

##### Univariate Data #####
set.seed(6)
x <- c(rnorm(50, mean = 0.3, sd = 0.15),
       rnorm(40, mean = 0.7, sd = 0.1),
       rnorm(60, mean = 0.5, sd = 0.15))
plot(offlineCPD(x))

##### Real Multivariate Data #####
data(WalBelSentiment)
data(WalBelTimes)
plot(offlineCPD(WalBelSentiment[1400:1600, ], WalBelTimes[1400:1600]))