BinarySegmentation: BinarySegmentation

Description Usage Arguments Value References Examples

View source: R/binary_segmentation.R

Description

Applies the binary segmentation algorithm by recursively calling FindBestSplit to build a binary tree. The tree can then be pruned with PruneTreeGamma to obtain a changepoint estimate. Typically this function is not called directly; use the interface hdcd instead.

Usage

BinarySegmentation(x, delta, lambda, method = c("nodewise_regression",
  "summed_regression", "ratio_regression"), penalize_diagonal = F,
  optimizer = c("line_search", "section_search"), control = NULL,
  standardize = T, threshold = 1e-07, verbose = FALSE, FUN = NULL, ...)

Arguments

x

An n times p matrix or data frame.

delta

Numeric value between 0 and 0.5. This tuning parameter determines the minimal segment size as a proportion of the size of the dataset and hence an upper bound on the number of changepoints (roughly 1/δ).
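As a quick sanity check, the minimal segment length and the implied bound on the number of segments can be computed directly. This is illustrative arithmetic only; the variable names below are made up for the sketch and are not part of hdcd:

```r
# Illustrative: how delta translates into a minimal segment size
# and a bound on the number of segments (hence changepoints).
n <- 500        # number of observations
delta <- 0.1    # minimal relative segment size

min_segment_size <- floor(delta * n)  # no segment shorter than this
max_segments <- floor(1 / delta)      # hence at most ~1/delta segments

min_segment_size  # 50
max_segments      # 10
```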

lambda

Positive numeric value. This is the regularization parameter in the single Lasso fits. This value is ignored if FUN is not NULL.

method

Which estimator should be used? Possible choices are

  • nodewise_regression: Nodewise regression is based on a single node that needs to be specified with an additional parameter node pointing to the column index of the node of interest. Uses glmnet internally. See Kovács (2016) for details.

  • summed_regression: Summed nodewise regression sums up the residual variances of nodewise regression over all nodes. Uses glasso internally. See Kovács (2016) for details.

  • ratio_regression: Likelihood ratio based regression sums the pseudo-profile-likelihood over all nodes. Uses glasso internally. See Kovács (2016) for details.

  • glasso: The graphical Lasso uses the approach of Friedman et al. (2008). In contrast to the other approaches, the exact likelihood of the whole graphical model is computed and used as the loss.

This value is ignored if FUN is not NULL.

penalize_diagonal

Boolean, should the diagonal elements of the precision matrix be penalized by λ? This value is ignored if FUN is not NULL.

optimizer

Which search technique should be used for performing individual splits in the binary segmentation algorithm? Possible choices are

  • line_search: Exhaustive linear search. All possible split candidates are evaluated and the index with the maximal loss reduction is returned.

  • section_search: Iteratively narrows the search space by a flexible ratio, determined by the parameter stepsize in the control list, and approximately finds an index at a local maximum. See Haubner (2018) for details.
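The narrowing strategy behind section search can be sketched in a few lines of base R. This is a simplified illustration assuming a unimodal gain curve; SectionSearchSketch and its arguments are hypothetical names for this sketch, not hdcd's implementation:

```r
# Simplified sketch of section search: evaluate the gain at two interior
# points, discard the side with the smaller gain, and repeat until few
# candidates remain; then search those exhaustively.
SectionSearchSketch <- function(gain, left, right, stepsize = 0.3, min_points = 4) {
  while (right - left + 1 > min_points) {
    offset <- floor(stepsize * (right - left))
    m1 <- left + offset
    m2 <- right - offset
    if (m1 >= m2) break                # interior points collided
    if (gain(m1) < gain(m2)) {
      left <- m1 + 1                   # maximum lies to the right of m1
    } else {
      right <- m2 - 1                  # maximum lies to the left of m2
    }
  }
  candidates <- left:right
  candidates[which.max(vapply(candidates, gain, numeric(1)))]
}

# A unimodal toy gain peaking at index 30:
SectionSearchSketch(function(i) -(i - 30)^2, 1, 100)  # 30
```

Smaller stepsize values discard less of the interval per iteration (safer but slower), which is why the control list exposes it as a tuning parameter.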

control

A list of parameters that is accessed by the selected optimizer:

  • stepsize: Numeric value between 0 and 0.5. Used by section search.

  • min_points: Integer value larger than 3. Used by section search.

standardize

Boolean. If TRUE, the penalty parameter λ will be adjusted for every dimension in the single Lasso fits according to the standard deviation of the data.
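A hedged sketch of what such a per-dimension adjustment could look like. The scaling rule shown is an assumption for illustration only, not hdcd's exact code:

```r
set.seed(1)
x <- matrix(rnorm(100 * 3), ncol = 3)
x[, 2] <- 10 * x[, 2]   # second dimension on a much larger scale

lambda <- 0.1
# Assumed rule for the sketch: scale the global penalty by each column's
# standard deviation, so larger-scale dimensions get a larger absolute penalty.
lambda_per_dim <- lambda * apply(x, 2, sd)
```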

threshold

The threshold for halting the iteration in glasso or glmnet. In the former it controls the absolute change of single parameters; in the latter it controls the total objective value. This value is ignored if FUN is not NULL.

verbose

Boolean. If TRUE, additional information will be printed.

FUN

A loss function with formal arguments x, n_obs and standardize which returns a scalar representing the loss for the segment it is applied to.

...

Supply additional arguments for a specific method (e.g. node for nodewise_regression) or for your own loss function FUN.

Value

An object of class bs_tree and Node (as defined in Node).

References

Friedman, J., Hastie, T. & Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441 (2008).

Haubner, L. Optimistic binary segmentation: A scalable approach to changepoint detection in high-dimensional graphical models. (Seminar for Statistics, ETH Zurich, 2018).

Kovács, S. Changepoint detection for high-dimensional covariance matrix estimation. (Seminar for Statistics, ETH Zurich, 2016).

Examples

# Use summed regression loss function and ChainNetwork
dat <- SimulateFromModel(CreateModel(n_segments = 2,n = 50,p = 30, ChainNetwork))
res <- BinarySegmentation(dat, delta = 0.1, lambda = 0.01, method = "summed_regression")
print(res)

# Define your own loss function and pass it to the BinarySegmentation algorithm

InitNaiveSquaredLoss <- function(x) {

  # Capture the total number of observations once
  n_obs <- NROW(x)

  # Returned closure: computes the squared deviation from the mean for the
  # data it is handed, scaled by the total sample size; start and end are
  # validated as segment boundaries against the full data
  function(x, start, end) {

    stopifnot(end >= start && end <= n_obs && start >= 1)

    sum((x - mean(x))^2) / n_obs

  }
}

p <- 5
n <- 20
mean_vecs <- list(rep(0, p), c(rep(1, p-2), rep(5, 2)), rep(-0.5, p))

model <- CreateModel(3, n, p, DiagMatrix, equispaced = TRUE, mean_vecs = mean_vecs)

x <- SimulateFromModel(model)

res <- BinarySegmentation(x, delta = 0.1, lambda = 0.01, FUN = InitNaiveSquaredLoss,
                          optimizer = "section_search")
print(res)

res <- BinarySegmentation(x, delta = 0.1, lambda = 0.01, FUN = InitNaiveSquaredLoss,
                          optimizer = "line_search")
print(res)

lorenzha/hdcd documentation built on Sept. 2, 2018, 8:20 p.m.