onlineCPD: Bayesian Online Changepoint Detection


View source: R/onlineCPD.R

Description

Runs the main algorithm, Bayesian Online Changepoint Detection. The input is data in the form of a matrix and, optionally, an existing ocp object to build on. The output is the list of changepoints together with other values calculated while running the model.

Usage

onlineCPD(datapts, oCPD = NULL, missPts = "none",
  hazard_func = function(x, lambda) {
    const_hazard(x, lambda = 100)
  },
  probModel = list("g"),
  init_params = list(list(m = 0, k = 0.01, a = 0.01, b = 1e-04)),
  multivariate = FALSE, cpthreshold = 0.5,
  truncRlim = .Machine$double.xmin, minRlength = 1,
  maxRlength = 10^4, minsep = 1, maxsep = 10^4, timing = FALSE,
  getR = FALSE, optionalOutputs = FALSE, printupdates = FALSE)

Arguments

datapts

The input data in the form of a matrix, where each row corresponds to a data point and each column corresponds to a dimension.

oCPD

An ocp object computed in a previous run of the algorithm. It can be built upon with the input data points, as long as the settings for both runs are the same.

missPts

This setting indicates how to deal with missing points (e.g. NA). The options are "mean", "prev", "none", or a numeric value. If the data is multivariate, the numeric replacement value can be either a single value, which applies to all dimensions, or a vector of the same length as the number of dimensions of the data.
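
To make the replacement options concrete, the following base-R sketch shows what each one could mean for a univariate series. The helper fill_missing is hypothetical and not part of the ocp package. It also uses the mean of the whole series for "mean", whereas an online implementation would presumably use only the points seen so far, so treat it as an illustration of the idea rather than the package's behaviour.

```r
# Hypothetical helper illustrating the missPts options on a numeric vector.
# Not part of ocp; a sketch of the idea only.
fill_missing <- function(x, missPts = "none") {
  if (is.numeric(missPts)) {
    x[is.na(x)] <- missPts                  # fixed numeric replacement
  } else if (missPts == "mean") {
    x[is.na(x)] <- mean(x, na.rm = TRUE)    # mean of the observed points
  } else if (missPts == "prev") {
    for (i in seq_along(x)) {
      # carry the previous value forward; a leading NA is left as is
      if (is.na(x[i]) && i > 1) x[i] <- x[i - 1]
    }
  }
  x                                         # "none": returned unchanged
}

x <- c(1, NA, 3, NA, 5)
fill_missing(x, "mean")   # NAs replaced by 3
fill_missing(x, "prev")   # NAs replaced by the preceding value
fill_missing(x, 0)        # NAs replaced by 0
```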

hazard_func

This setting allows choosing a hazard function and setting the constants within that function. For example, the default hazard function is function(x, lambda) const_hazard(x, lambda = 100), where lambda can be set as appropriate.
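
The sketch below shows the shape of such a wrapper with a different constant. It assumes that a constant hazard assigns the same changepoint probability, 1 / lambda, to every run length, so that lambda acts as the expected run length. The function my_const_hazard is a hypothetical stand-in written for this illustration, and treating x as a vector of run lengths is also an assumption; the package's own const_hazard may differ.

```r
# Hypothetical stand-in for a constant hazard: every run length gets the
# same changepoint probability 1 / lambda. Not the package's own code.
my_const_hazard <- function(x, lambda) rep(1 / lambda, length(x))

# Wrapper in the same shape as the documented default, with lambda = 250.
# The `lambda` formal is kept so the signature matches, but the constant
# written inside the body is the value that takes effect.
hazard250 <- function(x, lambda) my_const_hazard(x, lambda = 250)

hazard250(1:4)   # four identical hazard values, each 1 / 250
```

With the package loaded, the equivalent call would pass hazard_func = function(x, lambda) const_hazard(x, lambda = 250) to onlineCPD.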

probModel

This parameter specifies the model used to calculate the predictive probabilities and update the parameters of the model. The default setting (list("g") in the Usage above) uses a Gaussian underlying distribution.

init_params

The parameters used to initialize the probability model. The default settings correspond to the default Gaussian model.

multivariate

This setting indicates if the incoming data is multivariate or univariate.

cpthreshold

Probability threshold for the method of extracting a list of all changepoints that have a run length probability higher than a specified value. The default is set to 0.5.
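
As a rough picture of threshold-based extraction, the sketch below keeps every time point whose changepoint probability exceeds cpthreshold. The probabilities are made-up values for illustration only, and the variable names are hypothetical; the package's extraction may involve further steps.

```r
# Made-up changepoint probabilities for six time points (illustrative only).
cp_prob <- c(0.99, 0.02, 0.10, 0.81, 0.05, 0.49)
cpthreshold <- 0.5

# Indices of the time points whose probability exceeds the threshold.
thresh_cps <- which(cp_prob > cpthreshold)
thresh_cps   # time points 1 and 4
```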

truncRlim

The probability threshold at which to begin truncating the R vector, which is the vector of run-length probabilities. To prevent truncation, set this to 0. The Usage signature lists .Machine$double.xmin as the default; the paper suggests 10^(-4).

minRlength

The minimum size the run length probabilities vector must be before beginning to check for the truncation threshold.

maxRlength

The maximum size the R vector is allowed to be, before enforcing truncation to happen.
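
The interplay of truncRlim, minRlength and maxRlength can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: it assumes that truncation drops the tail of the vector (the longest run lengths) beyond the last entry that is still at or above truncRlim, and that the remaining probabilities are renormalised. The function truncate_R is hypothetical.

```r
# Hypothetical truncation of a run-length probability vector R.
# Assumption: the tail below truncRlim is dropped and the rest renormalised.
truncate_R <- function(R, truncRlim = 1e-4, minRlength = 1, maxRlength = 10^4) {
  keep <- length(R)
  if (length(R) >= minRlength) {
    # only check the threshold once the vector is long enough
    above <- which(R >= truncRlim)
    if (length(above) > 0) keep <- max(above)  # last entry at or above the limit
  }
  keep <- min(keep, maxRlength)                # hard cap on the vector size
  R <- R[seq_len(keep)]
  R / sum(R)                                   # renormalise to sum to 1
}

R <- c(0.6, 0.3, 0.09999, 5e-6, 5e-6)
truncate_R(R)                                  # the two tiny tail entries are dropped
truncate_R(R, truncRlim = 0)                   # a limit of 0 keeps every entry
truncate_R(R, truncRlim = 0, maxRlength = 2)   # the cap forces truncation
```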

minsep

This setting constrains the possible changepoint locations considered in determining the optimal set of changepoints. It excludes candidate changepoints that are closer together than the value of minsep. The default in the Usage signature is 1.

maxsep

This setting constrains the possible changepoint locations considered in determining the optimal set of changepoints. It excludes candidate changepoints that are farther apart than the value of maxsep. The default in the Usage signature is 10^4.

timing

To print out times while the algorithm is running, in order to track its progress, set this to TRUE.

getR

To output the full R matrix, set this to TRUE. Outputting this matrix causes a major slowdown.

optionalOutputs

Output additional values calculated while running the algorithm, including a matrix containing all the input data, the predictive probability vectors at each step of the algorithm, and the vector of means at each step of the algorithm.

printupdates

This setting prints out updates on the progress of the algorithm if set to TRUE.

Value

An ocp object containing the main output: a list of changepoints from each time point, and many additional outputs: the number of time points, the initial settings of the algorithm, the current model parameters, the means from each time point, the most recently processed point, the most recently calculated vector of run length probabilities, and a vector of probabilities of changepoints at each time point.

Examples

simdatapts <- c(rnorm(n = 50), rnorm(n = 50, 100))
ocpd1 <- onlineCPD(simdatapts)
ocpd1$changepoint_lists # view the changepoint lists

Example output

$colmaxes
$colmaxes[[1]]
[1]  1 51


$threshcps
$threshcps[[1]]
[1]  1 51


$maxCPs
$maxCPs[[1]]
[1]  1 51
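
The oCPD argument described above suggests that a run can be continued with new data. The following sketch shows how that might look, assuming the ocp package is installed and that a second call with the same settings accepts the earlier result through oCPD. The variable names and the seed are chosen only for illustration, and the call is guarded so that the snippet does nothing when the package is unavailable.

```r
# Sketch: continue an earlier run with a second batch of data.
# Runs only if the ocp package is installed.
if (requireNamespace("ocp", quietly = TRUE)) {
  library(ocp)
  set.seed(1)
  first_batch  <- c(rnorm(n = 50), rnorm(n = 50, mean = 100))
  second_batch <- rnorm(n = 50, mean = 100)

  ocpd1 <- onlineCPD(first_batch)                  # initial run, default settings
  ocpd2 <- onlineCPD(second_batch, oCPD = ocpd1)   # build on the earlier object
  ocpd2$changepoint_lists                          # changepoints over both batches
}
```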

ocp documentation built on May 2, 2019, 3:46 a.m.