hdcd: Change Point Detection Algorithm


Description

Find change points in a design matrix.

Usage

hdcd(x, method = NULL, optimizer = NULL, delta = NULL,
  lambda = NULL, loss_function = NULL, gain_function = NULL,
  best_split_function = NULL, cross_validation_function = NULL,
  model_selection_function = NULL, control = hdcd_control())

Arguments

x

A design matrix

method

The method used to find splits. One of glasso, random_forest, elastic_net or custom.

optimizer

The optimizer used to find change points. Supply line_search for the most reliable results (except with the random_forest method), section_search for a faster approach (Optimistic Binary Search, OBS) or two_step_search for an extremely fast method that is best suited to the random_forest method.

delta

The minimal relative segment length.

lambda

A regularisation parameter for methods that require one.

loss_function

A function with formal arguments x and, optionally, lambda that returns a training loss for the data x.

gain_function

A function with formal arguments x, start, end and lambda that returns a closure with argument split_point. The closure returns the gain from splitting the segment (start, end] at split_point, given data x and tuning parameter lambda.

cross_validation_function

A function with formal arguments x, start, end, lambda and folds that returns a list with elements cv_loss and lambda_opt.

model_selection_function

A function with formal arguments x, start, split_point and end that returns a list with elements statistic, a value measuring the significance of the split at split_point, and is_significant, a logical indicating whether the value returned as statistic is significant.

control

An object of class hdcd_control as generated by hdcd_control.

best_split_function

A function with formal arguments x, start, end and split_candidates that returns a list with elements gain and best_split, where gain is an array of length nrow(x) that may store evaluations of a gain curve or a similar quantity, and best_split is the element of split_candidates that best splits the interval (start, end].

lambda

A tuning parameter used in the evaluation of the gain curve. If a cross_validation_function is supplied, lambda is used as an initial guess and the optimal value from cross-validation is subsequently used to evaluate the gain curve.
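For the loss_function argument, a minimal sketch of the interface might look as follows, assuming the loss only needs to be a scalar computed from the rows of x. The hypothetical squared_error_loss returns the residual sum of squares around the column means (a Gaussian loss with identity covariance, up to constants). It is not one of the package's built-in losses; lambda is accepted only to match the documented formal arguments.

```r
# Hypothetical custom loss_function: residual sum of squares of x around its
# column means. lambda is accepted to match the documented formal arguments
# but is not used by this particular loss.
squared_error_loss <- function(x, lambda = NULL) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))  # subtract each column's mean
  sum(centered^2)
}

# Columns (1, 2, 3) and (4, 5, 6) each contribute 1 + 0 + 1 = 2.
x <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
squared_error_loss(x)
#> [1] 4
```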
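The closure-returning gain_function interface can be sketched as below, assuming that (start, end] refers to rows start + 1 through end of x with 1-based indexing (so start = 0 denotes the first row). The hypothetical gain_function measures the reduction in residual sum of squares from splitting the segment; it only illustrates the interface and is not the log-likelihood gain used by the built-in methods.

```r
# Residual sum of squares of a matrix around its column means.
rss <- function(x) sum(sweep(x, 2, colMeans(x))^2)

# Hypothetical gain_function: precomputes the loss of the whole segment
# (start, end] and returns a closure of split_point giving the loss reduction
# from splitting into (start, split_point] and (split_point, end].
# split_point must satisfy start < split_point < end.
gain_function <- function(x, start, end, lambda = NULL) {
  x <- as.matrix(x)
  full_loss <- rss(x[(start + 1):end, , drop = FALSE])
  function(split_point) {
    left <- x[(start + 1):split_point, , drop = FALSE]
    right <- x[(split_point + 1):end, , drop = FALSE]
    full_loss - rss(left) - rss(right)
  }
}

# Mean shift from 0 to 1 after row 5.
x <- rbind(matrix(0, 5, 2), matrix(1, 5, 2))
gain <- gain_function(x, start = 0, end = 10)
gain(5)
#> [1] 5
```

Precomputing the loss of the full segment outside the closure means repeated calls for different split points only pay for the two sub-segment losses.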
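A hypothetical cross_validation_function might look like the sketch below. It assumes that lambda is a vector of candidate values and folds is the number of folds (at least 2), and it tunes a toy estimator (column means shrunk towards zero by a factor 1 / (1 + lambda)) by K-fold cross-validation on the segment (start, end], taken as rows start + 1 to end. Both the estimator and the interleaved fold assignment are illustrative choices, not the package's behaviour.

```r
# Hypothetical cross_validation_function: K-fold CV of a shrunken-mean
# estimator on rows (start + 1):end, one CV loss per candidate lambda.
cross_validation_function <- function(x, start, end, lambda, folds = 5) {
  x <- as.matrix(x)
  segment <- x[(start + 1):end, , drop = FALSE]
  n <- nrow(segment)
  fold_id <- rep_len(seq_len(folds), n)  # interleaved fold assignment
  cv_loss <- vapply(lambda, function(l) {
    total <- 0
    for (k in seq_len(folds)) {
      train <- segment[fold_id != k, , drop = FALSE]
      test <- segment[fold_id == k, , drop = FALSE]
      mu <- colMeans(train) / (1 + l)  # column means shrunk towards zero
      total <- total + sum(sweep(test, 2, mu)^2)
    }
    total / n  # average held-out squared error per row
  }, numeric(1))
  list(cv_loss = cv_loss, lambda_opt = lambda[which.min(cv_loss)])
}

set.seed(1)
x <- matrix(rnorm(200, mean = 3), ncol = 2)
res <- cross_validation_function(x, start = 0, end = 100, lambda = c(0, 1, 10))
res$lambda_opt
#> [1] 0
```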
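The model_selection_function interface can be illustrated with a simple threshold rule. The hypothetical function below uses the reduction in residual sum of squares as statistic and declares the split significant when it exceeds ncol(x) * log(end - start), a BIC-type penalty that assumes unit noise variance. The threshold and the row convention ((start, end] as rows start + 1 to end) are assumptions made for illustration only.

```r
# Hypothetical model_selection_function: RSS reduction of the split compared
# against an assumed BIC-type penalty ncol(x) * log(segment length).
model_selection_function <- function(x, start, split_point, end) {
  x <- as.matrix(x)
  rss <- function(rows) {
    seg <- x[rows, , drop = FALSE]
    sum(sweep(seg, 2, colMeans(seg))^2)
  }
  statistic <- rss((start + 1):end) -
    rss((start + 1):split_point) - rss((split_point + 1):end)
  penalty <- ncol(x) * log(end - start)  # assumed threshold, unit variance
  list(statistic = statistic, is_significant = statistic > penalty)
}

# A clear mean shift after row 50 versus no change at all.
x <- rbind(matrix(0, 50, 2), matrix(2, 50, 2))
model_selection_function(x, start = 0, split_point = 50, end = 100)$is_significant
#> [1] TRUE
model_selection_function(matrix(0, 100, 2), 0, 50, 100)$is_significant
#> [1] FALSE
```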
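A best_split_function that performs an exhaustive search over split_candidates could look like this sketch. It assumes that (start, end] refers to rows start + 1 to end of x (1-based, start = 0 for the first row), records the gain at each evaluated candidate in a vector of length nrow(x) and leaves the remaining entries as NA. The residual-sum-of-squares gain is a hypothetical stand-in for the package's loss.

```r
# Hypothetical best_split_function: evaluate an RSS-reduction gain at every
# candidate split and return the full-length gain vector plus the argmax.
best_split_function <- function(x, start, end, split_candidates) {
  x <- as.matrix(x)
  rss <- function(rows) {
    seg <- x[rows, , drop = FALSE]
    sum(sweep(seg, 2, colMeans(seg))^2)
  }
  full <- rss((start + 1):end)
  gain <- rep(NA_real_, nrow(x))  # NA where the gain was not evaluated
  for (s in split_candidates) {
    gain[s] <- full - rss((start + 1):s) - rss((s + 1):end)
  }
  list(gain = gain,
       best_split = split_candidates[which.max(gain[split_candidates])])
}

# One column with a mean shift after row 6.
x <- rbind(matrix(0, 6, 1), matrix(1, 4, 1))
res <- best_split_function(x, start = 0, end = 10, split_candidates = 2:8)
res$best_split
#> [1] 6
```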

Value

A tree with the splitting structure of the binary segmentation algorithm. If some form of inner cross validation or model selection was used, the estimated change points can be extracted via get_change_points_from_tree.

A function that estimates change points in x. Currently available methods are random_forest, glasso, elastic_net and custom. For all but the latter, log-likelihood-based loss functions are used to estimate the best split in a binary segmentation fashion. For the method custom, an individual loss_function, gain_function or best_split_function can be supplied to find change points.

The best split in each step of binary segmentation is found using optimizer, which can be one of line_search, section_search or two_step_search. Line Search finds the maximum of the gain function by evaluating it at every possible split. Section Search (also called Optimistic Binary Search) exploits the piecewise convex structure of the gain curve to find one of its local maxima with approximately log(n) evaluations of the gain function. Two Step Search uses the individual log-likelihoods of the predictions from a fit at a first guess to obtain a second guess, which is then refined to a final best split point.

For methods other than random_forest, we encourage the use of Line Search whenever the computational cost allows it, and Section Search otherwise. We recommend Two Step Search for the random_forest method.
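The trade-off between an exhaustive search and a logarithmic search can be illustrated on a toy gain curve. The sketch below compares a line search with an integer ternary search. The ternary search is a simplified stand-in: it assumes the gain curve is unimodal and is not the package's Optimistic Binary Search, whose rules for piecewise convex curves are more involved. The names line_search and ternary_search are hypothetical; both return the best split and the number of gain evaluations.

```r
# Exhaustive line search: evaluate the gain at every candidate split.
line_search <- function(gain, candidates) {
  values <- vapply(candidates, gain, numeric(1))
  list(best_split = candidates[which.max(values)],
       evaluations = length(candidates))
}

# Integer ternary search over [lower, upper]. Correct only for a unimodal
# gain curve; a simplified stand-in, not the package's Optimistic Binary Search.
ternary_search <- function(gain, lower, upper) {
  evaluations <- 0
  f <- function(s) {
    evaluations <<- evaluations + 1  # count every gain evaluation
    gain(s)
  }
  while (upper - lower > 2) {
    m1 <- lower + (upper - lower) %/% 3
    m2 <- upper - (upper - lower) %/% 3
    # If f(m1) < f(m2), a unimodal maximum lies right of m1; else at or left of m2.
    if (f(m1) < f(m2)) lower <- m1 + 1 else upper <- m2
  }
  candidates <- lower:upper
  values <- vapply(candidates, f, numeric(1))
  list(best_split = candidates[which.max(values)], evaluations = evaluations)
}

# Toy data: a single mean shift after observation 60, no noise.
x <- c(rep(0, 60), rep(1, 140))
n <- length(x)
rss <- function(v) sum((v - mean(v))^2)
gain <- function(s) rss(x) - rss(x[1:s]) - rss(x[(s + 1):n])

ls_result <- line_search(gain, 1:(n - 1))
ts_result <- ternary_search(gain, 1, n - 1)
ls_result$best_split
#> [1] 60
ts_result$best_split
#> [1] 60
```

Both searches locate the same split here, but the ternary search uses far fewer gain evaluations than the 199 needed by the line search. On gain curves with several local maxima a logarithmic search can stop at a local maximum, which is why the exhaustive search is the more reliable choice when its cost is affordable.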


MalteLond/rfcd documentation built on June 19, 2019, 2:52 p.m.