# cpDist: Test for Change-Point Detection in Possibly Multivariate... In npcp: Some Nonparametric CUSUM Tests for Change-Point Detection in Possibly Multivariate Observations

## Description

Nonparametric test for change-point detection based on the (multivariate) empirical distribution function. The observations can be continuous univariate or multivariate, and serially independent or dependent (strongly mixing). Approximate p-values for the test statistics are obtained by means of a multiplier approach. The first reference treats the serially independent case, while details about the serially dependent case can be found in the second and third references.

## Usage

```
cpDist(x, statistic = c("cvmmax", "cvmmean", "ksmax", "ksmean"),
       method = c("nonseq", "seq"), b = NULL, gamma = 0,
       delta = 1e-4, weights = c("parzen", "bartlett"),
       m = 5, L.method = c("max", "median", "mean", "min"),
       N = 1000, init.seq = NULL, include.replicates = FALSE)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x` | a data matrix whose rows are continuous observations. |
| `statistic` | a string specifying the statistic whose value and p-value will be displayed; can be either `"cvmmax"` or `"cvmmean"` (the maximum or average of the `nrow(x)-1` intermediate Cramér-von Mises statistics), or `"ksmax"` or `"ksmean"` (the maximum or average of the `nrow(x)-1` intermediate Kolmogorov-Smirnov statistics); see Section 3 in the first reference. The four statistics and the corresponding p-values are computed at each execution. |
| `method` | a string specifying the simulation method for generating multiplier replicates of the test statistic; can be either `"nonseq"` (the 'check' approach in the first reference) or `"seq"` (the 'hat' approach in the first reference). The 'check' approach appears to lead to better behaved tests and is recommended. |
| `b` | strictly positive integer specifying the value of the bandwidth parameter determining the serial dependence when generating dependent multiplier sequences using the 'moving average approach'; see Section 5 of the second reference. The value 1 will create i.i.d. multiplier sequences suitable for serially independent observations. If set to `NULL`, `b` will be estimated from `x` using the function `bOptEmpProc()`; see the procedure described in Section 5 of the second reference. |
| `gamma` | parameter between 0 and 0.5 appearing in the definition of the weight function used in the detector function. |
| `delta` | parameter between 0 and 1 appearing in the definition of the weight function used in the detector function. |
| `weights` | a string specifying the kernel for creating the weights used in the generation of dependent multiplier sequences within the 'moving average approach'; see Section 5 of the second reference. |
| `m` | a strictly positive integer specifying the number of points of the uniform grid on (0,1)^d (where d is `ncol(x)`) involved in the estimation of the bandwidth parameter; see Section 5 of the third reference. The number of points of the grid is given by `m^ncol(x)` so that `m` needs to be decreased as d increases. |
| `L.method` | a string specifying how the parameter L involved in the estimation of the bandwidth parameter is computed; see Section 5 of the second reference. |
| `N` | number of multiplier replications. |
| `init.seq` | a sequence of independent standard normal variates of length `N * (nrow(x) + 2 * (b - 1))` used to generate dependent multiplier sequences. |
| `include.replicates` | a logical specifying whether the object of `class` `htest` returned by the function (see below) will include the multiplier replicates. |
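To make the roles of `b`, `weights` and `init.seq` more concrete, here is a minimal base R sketch of a 'moving average approach' for generating one dependent multiplier sequence. It is an illustration under stated assumptions, not the package's implementation: the helper names `parzen()`, `bartlett()` and `dep_multipliers()` are hypothetical, the kernels are the textbook Parzen and Bartlett kernels, and the weights are assumed to be kernel values at `j / b` rescaled to unit sum of squares.

```r
## Textbook Parzen and Bartlett kernels (assumed forms, supported on [-1, 1])
parzen <- function(u) {
  a <- abs(u)
  ifelse(a <= 0.5, 1 - 6 * a^2 + 6 * a^3,
         ifelse(a <= 1, 2 * (1 - a)^3, 0))
}
bartlett <- function(u) pmax(1 - abs(u), 0)

## Hypothetical helper: one dependent multiplier sequence of length n,
## built as a weighted moving average of n + 2 * (b - 1) i.i.d. N(0,1) variates
dep_multipliers <- function(n, b, kernel = parzen,
                            z = rnorm(n + 2 * (b - 1))) {
  stopifnot(b >= 1, length(z) == n + 2 * (b - 1))
  w <- kernel((-(b - 1):(b - 1)) / b)  # 2 * b - 1 kernel weights
  w <- w / sqrt(sum(w^2))              # rescale so each multiplier has variance 1
  vapply(seq_len(n),
         function(i) sum(w * z[i:(i + 2 * (b - 1))]),
         numeric(1))
}

set.seed(1)
xi <- dep_multipliers(n = 100, b = 5)  # serially dependent multipliers
```

Under these assumptions, each sequence consumes `n + 2 * (b - 1)` normal variates, which is consistent with the length `N * (nrow(x) + 2 * (b - 1))` required for `init.seq`, and `b = 1` reduces the moving average to the i.i.d. variates themselves.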

## Details

The approximate p-value is computed as

(0.5 + sum(S[i] >= S, i=1, .., N)) / (N+1),

where S and S[i] denote the test statistic and a multiplier replication, respectively. This ensures that the approximate p-value is a number strictly between 0 and 1, which is sometimes necessary for further treatments.
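The formula above can be written directly in base R. The function name `approx_p_value()` and its arguments are hypothetical and only serve the illustration; `stat` plays the role of S and `replicates` the role of the N multiplier replications S[i].

```r
## Hypothetical helper implementing (0.5 + sum(S[i] >= S)) / (N + 1)
approx_p_value <- function(stat, replicates) {
  N <- length(replicates)
  (0.5 + sum(replicates >= stat)) / (N + 1)
}

## No replicate reaches the statistic: the smallest attainable value
approx_p_value(2.5, c(0.1, 0.7, 1.2))   # 0.5 / 4 = 0.125

## Every replicate reaches the statistic: the largest attainable value
approx_p_value(0, c(0.1, 0.7, 1.2))     # 3.5 / 4 = 0.875
```

With the default `N = 1000`, the smallest attainable p-value is 0.5 / 1001, approximately 0.0004995, which is the value reported in the example output further down this page.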

## Value

An object of `class` `htest` which is a list, some of the components of which are

| Component | Description |
| --- | --- |
| `statistic` | value of the test statistic. |
| `p.value` | corresponding approximate p-value. |
| `cvm` | the values of the `nrow(x)-1` intermediate Cramér-von Mises change-point statistics. |
| `ks` | the values of the `nrow(x)-1` intermediate Kolmogorov-Smirnov change-point statistics. |
| `all.statistics` | the values of all four test statistics. |
| `all.p.values` | the corresponding p-values. |
| `b` | the value of parameter `b`. |
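As a rough picture of what the `cvm` and `ks` components contain, the sketch below computes intermediate CUSUM statistics for a univariate sample in base R. It rests on assumptions rather than on the package source: the function name `cusum_edf_stats()` is hypothetical, the weight function is taken to be constant (as one would expect for `gamma = 0`), and the process compared at each candidate break `k` is assumed to be `k * (n - k) / n^(3/2)` times the difference between the empirical distribution functions of the first `k` and the last `n - k` observations, evaluated at the sample points.

```r
## Hypothetical helper: intermediate Cramér-von Mises and Kolmogorov-Smirnov
## type statistics for a univariate sample (assumed unweighted CUSUM form)
cusum_edf_stats <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  ind <- outer(x, x, "<=")          # ind[j, i] is TRUE when x[j] <= x[i]
  cs <- apply(ind, 2, cumsum)       # cs[k, i] = number of j <= k with x[j] <= x[i]
  tot <- cs[n, ]
  k <- 1:(n - 1)
  F_before <- cs[k, , drop = FALSE] / k
  F_after <- (matrix(tot, n - 1, n, byrow = TRUE) -
                cs[k, , drop = FALSE]) / (n - k)
  D <- (k * (n - k) / n^1.5) * (F_before - F_after)
  list(cvm = rowSums(D^2),          # one value per candidate break k
       ks = apply(abs(D), 1, max))
}

res <- cusum_edf_stats(c(1, 2, 3, 10, 11, 12))
which.max(res$cvm)   # candidate break after the third observation
which.max(res$ks)
```

The four test statistics would then be the maximum and the mean of each of these two vectors, and the position of the maximum gives a natural change-point estimate, as in the examples below.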

## Note

When the observations are continuous univariate and serially independent, independent realizations of the test statistics under the null hypothesis of no change in the distribution can be obtained by simulation; see Section 4 in the first reference.

## References

M. Holmes, I. Kojadinovic and J-F. Quessy (2013), Nonparametric tests for change-point detection à la Gombay and Horváth, Journal of Multivariate Analysis 115, pages 16-32.

A. Bücher and I. Kojadinovic (2016), A dependent multiplier bootstrap for the sequential empirical copula process under strong mixing, Bernoulli 22:2, pages 927-968, http://arxiv.org/abs/1306.3930.

A. Bücher, J.-D. Fermanian and I. Kojadinovic (2019), Combining cumulative sum change-point detection tests for assessing the stationarity of univariate time series, Journal of Time Series Analysis 40, pages 124-150, http://arxiv.org/abs/1709.02673.

## See Also

`cpCopula()` for a related test based on the empirical copula, `cpRho()` for a related test based on Spearman's rho, `cpTau()` for a related test based on Kendall's tau, `bOptEmpProc()` for the function used to estimate `b` from `x` if `b = NULL`, `seqCpDist()` for the corresponding sequential test.

## Examples

```
library(npcp)

## A univariate example
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rexp(n-k)
x <- matrix(c(y,z))
cp <- cpDist(x, b = 1)
cp

## All statistics
cp$all.statistics
## Corresponding p.values
cp$all.p.values

## Estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))

## A very artificial trivariate example
## with a break in the first margin
n <- 100
k <- 50 ## the true change-point
y <- rnorm(k)
z <- rnorm(n-k, mean = 2)
x <- cbind(c(y,z),matrix(rnorm(2*n), n, 2))
cp <- cpDist(x, b = 1)
cp

## All statistics
cp$all.statistics
## Corresponding p.values
cp$all.p.values

## Estimated change-point
which(cp$cvm == max(cp$cvm))
which(cp$ks == max(cp$ks))
```

### Example output

```

	Test for change-point detection sensitive to changes in the
	distribution function with 'method'="nonseq"

data:  x
cvmmax = 61.786, p-value = 0.0004995

  cvmmax  cvmmean    ksmax   ksmean
61.78640 18.13580  1.27000  0.64777
      cvmmax      cvmmean        ksmax       ksmean
0.0004995005 0.0004995005 0.0004995005 0.0004995005
cvm48
   48
ks51
  51

	Test for change-point detection sensitive to changes in the
	distribution function with 'method'="nonseq"

data:  x
cvmmax = 18.376, p-value = 0.0004995

   cvmmax   cvmmean     ksmax    ksmean
18.376292  6.887732  1.700000  0.908000
      cvmmax      cvmmean        ksmax       ksmean
0.0004995005 0.0004995005 0.0004995005 0.0004995005
cvm46
   46
ks50
  50
```

npcp documentation built on July 16, 2020, 5:07 p.m.