cpt.mean: Identifying Changes in Mean

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

Calculates the optimal positioning and (potentially) number of changepoints for data using the user specified method.

Usage

1
2
cpt.mean(data,penalty="MBIC",pen.value=0,method="AMOC",Q=5,test.stat="Normal",class=TRUE,
param.estimates=TRUE,minseglen=1)

Arguments

data

A vector, ts object or matrix containing the data within which you wish to find a changepoint. If data is a matrix, each row is considered a separate dataset.

penalty

Choice of "None", "SIC", "BIC", "MBIC", AIC", "Hannan-Quinn", "Asymptotic", "Manual" and "CROPS" penalties. If Manual is specified, the manual penalty is contained in the pen.value parameter. If Asymptotic is specified, the theoretical type I error is contained in the pen.value parameter. If CROPS is specified, the penalty range is contained in the pen.value parameter; note this is a vector of length 2 which contains the minimum and maximum penalty value. Note CROPS can only be used if the method is "PELT". The predefined penalties listed DO count the changepoint as a parameter, postfix a 0 e.g."SIC0" to NOT count the changepoint as a parameter.

pen.value

The theoretical type I error e.g.0.05 when using the Asymptotic penalty. A vector of length 2 (min,max) if using the CROPS penalty. The value of the penalty when using the Manual penalty option - this can be a numeric value or text giving the formula to use. Available variables are, n=length of original data, null=null likelihood, alt=alternative likelihood, tau=proposed changepoint, diffparam=difference in number of alternatve and null parameters.

method

Choice of "AMOC", "PELT", "SegNeigh" or "BinSeg".

Q

The maximum number of changepoints to search for using the "BinSeg" method. The maximum number of segments (number of changepoints + 1) to search for using the "SegNeigh" method.

test.stat

The assumed test statistic / distribution of the data. Currently only "Normal" and "CUSUM" supported.

class

Logical. If TRUE then an object of class cpt is returned.

param.estimates

Logical. If TRUE and class=TRUE then parameter estimates are returned. If FALSE or class=FALSE no parameter estimates are returned.

minseglen

Positive integer giving the minimum segment length (no. of observations between changes), default is the minimum allowed by theory.

Details

This function is used to find changes in mean for data using the test statistic specfified in the test.stat parameter. The changes are found using the method supplied which can be single changepoint (AMOC) or multiple changepoints using exact (PELT or SegNeigh) or approximate (BinSeg) methods. A changepoint is denoted as the first observation of the new segment / regime.

Value

If class=TRUE then an object of S4 class "cpt" is returned. The slot cpts contains the changepoints that are returned. For class=FALSE the structure is as follows.

If data is a vector (single dataset) then a vector/list is returned depending on the value of method. If data is a matrix (multiple datasets) then a list is returned where each element in the list is either a vector or list depending on the value of method.

If method is AMOC then a vector (one dataset) or matrix (multiple datasets) is returned, the columns are:

cpt

The most probable location of a changepoint if a change was identified or NA if no changepoint.

p value

The p-value of the identified changepoint.

If method is PELT then a vector is returned containing the changepoint locations for the penalty supplied. This always ends with n. If the penalty is CROPS then a list is returned with elements:

cpt.out

A data frame containing the value of the penalty value where the number of segmentations changes, the number of segmentations and the value of the cost at that penalty value.

changepoints

The optimal changepoint for the different penalty values starting with the lowest penalty value

If method is SegNeigh then a list is returned with elements:

cps

Matrix containing the changepoint positions for 1,...,Q changepoints.

op.cpts

The optimal changepoint locations for the penalty supplied.

pen

Penalty used to find the optimal number of changepoints.

like

Value of the -2*log(likelihood ratio) + penalty for the optimal number of changepoints selected.

If method is BinSeg then a list is returned with elements:

cps

2xQ Matrix containing the changepoint positions on the first row and the test statistic on the second row.

op.cpts

The optimal changepoint locations for the penalty supplied.

pen

Penalty used to find the optimal number of changepoints.

Author(s)

Rebecca Killick

References

Change in Normal mean: Hinkley, D. V. (1970) Inference About the Change-Point in a Sequence of Random Variables, Biometrika 57, 1–17

CUSUM Test: M. Csorgo, L. Horvath (1997) Limit Theorems in Change-Point Analysis, Wiley

PELT Algorithm: Killick R, Fearnhead P, Eckley IA (2012) Optimal detection of changepoints with a linear computational cost, JASA 107(500), 1590–1598

CROPS: Haynes K, Eckley IA, Fearnhead P (2014) Efficient penalty search for multiple changepoint problems (in submission), arXiv:1412.3617

Binary Segmentation: Scott, A. J. and Knott, M. (1974) A Cluster Analysis Method for Grouping Means in the Analysis of Variance, Biometrics 30(3), 507–512

Segment Neighbourhoods: Auger, I. E. And Lawrence, C. E. (1989) Algorithms for the Optimal Identification of Segment Neighborhoods, Bulletin of Mathematical Biology 51(1), 39–54

MBIC: Zhang, N. R. and Siegmund, D. O. (2007) A Modified Bayes Information Criterion with Applications to the Analysis of Comparative Genomic Hybridization Data. Biometrics 63, 22-32.

See Also

cpt.var,cpt.meanvar,plot-methods,cpt

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
# Example of a change in mean at 100 in simulated normal data
set.seed(1)
x=c(rnorm(100,0,1),rnorm(100,10,1))
cpt.mean(x,penalty="SIC",method="AMOC",class=FALSE) # returns 100 to show that the null hypothesis
#was rejected and the change in mean is at 100
ans=cpt.mean(x,penalty="Asymptotic",pen.value=0.01,method="AMOC") 
cpts(ans)# returns 100 to show that the null hypothesis was rejected, the change in mean is at 100
#and we are 99% confident of this result
cpt.mean(x,penalty="Manual",pen.value=0.8,method="AMOC",test.stat="CUSUM") 
# returns 101 as the changepoint location 


# Example of multiple changes in mean at 50,100,150 in simulated normal data
set.seed(1)
x=c(rnorm(50,0,1),rnorm(50,5,1),rnorm(50,10,1),rnorm(50,3,1))
cpt.mean(x,penalty="Manual",pen.value="2*log(n)",method="BinSeg",Q=5,class=FALSE) 
# returns optimal number of changepoints is 3, locations are 50,100,150.

# Example of using the CROPS penalty in data set above
set.seed(1)
x=c(rnorm(50,0,1),rnorm(50,5,1),rnorm(50,10,1),rnorm(50,3,1))
out=cpt.mean(x, pen.value = c(4,1500),penalty = "CROPS",method = "PELT") 
cpts.full(out)  # returns 7 segmentations for penalty values between 4 and 1500.  
# We find segmentations with 7, 5, 4, 3, 2, 1 and 0 changepoints. 
# Note that the empty final row indicates no changepoints.
pen.value.full(out) # gives associated penalty transition points
# CROPS does not give an optimal set of changepoints thus we may wish to explore further
plot(out,diagnostic=TRUE) 
# looks like the segmentation with 3 changepoints, 50,100,150 is the most appropriate
plot(out,ncpts=3) 


# Example multiple datasets where the first row has multiple changes in mean and the second row has
#no change in mean
set.seed(1)
x=c(rnorm(50,0,1),rnorm(50,5,1),rnorm(50,10,1),rnorm(50,3,1))
y=rnorm(200,0,1)
z=rbind(x,y)
cpt.mean(z,penalty="Asymptotic",pen.value=0.01,method="SegNeigh",Q=5,class=FALSE) # returns list
#that has two elements, the first has 3 changes in mean and variance at 50,100,150 and the second
#has no changes in variance
ans=cpt.mean(z,penalty="Asymptotic",pen.value=0.01,method="PELT") 
cpts(ans[[1]]) # same results as for the SegNeigh method.
cpts(ans[[2]]) # same results as for the SegNeigh method.

Example output

Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Successfully loaded changepoint package version 2.2.2
 NOTE: Predefined penalty values changed in version 2.2.  Previous penalty values with a postfix 1 i.e. SIC1 are now without i.e. SIC and previous penalties without a postfix i.e. SIC are now with a postfix 0 i.e. SIC0. See NEWS and help files for further details.
       cpt conf.value 
       100          1 
[1] 100
Class 'cpt' : Changepoint Object
       ~~   : S4 class containing 12 slots with names
              cpttype date version data.set method test.stat pen.type pen.value minseglen cpts ncpts.max param.est 

Created on  : Thu May  4 11:08:19 2017 

summary(.)  :
----------
Created Using changepoint version 2.2.2 
Changepoint type      : Change in mean 
Method of analysis    : AMOC 
Test Statistic  : CUSUM 
Type of penalty       : Manual with value, 0.8 
Minimum Segment Length :  
Maximum no. of cpts   : 1 
Changepoint Locations : 101 
Warning message:
In cpt.mean(x, penalty = "Manual", pen.value = 0.8, method = "AMOC",  :
  Traditional penalty values are not appropriate for the CUSUM test statistic
[1]  50 100 150 200
[1] "Maximum number of runs of algorithm = 9"
[1] "Completed runs = 2"
[1] "Completed runs = 3"
[1] "Completed runs = 5"
[1] "Completed runs = 7"
[1] "Completed runs = 8"
     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   50   96  100  133  150  159  180
[2,]   50   96  100  133  150   NA   NA
[3,]   50  100  133  150   NA   NA   NA
[4,]   50  100  150   NA   NA   NA   NA
[5,]   50  150   NA   NA   NA   NA   NA
[6,]   50   NA   NA   NA   NA   NA   NA
[7,]   NA   NA   NA   NA   NA   NA   NA
[1]    4.000000    4.332496    4.385247    4.684254  559.366988  646.962719
[7] 1311.335695
[[1]]
[1]  50 100 150 200

[[2]]
[1] 200

Warning messages:
1: In cpt.mean(z, penalty = "Asymptotic", pen.value = 0.01, method = "SegNeigh",  :
  SegNeigh is computationally slow, use PELT instead
2: In penalty_decision(penalty, pen.value, n, diffparam, asymcheck = costfunc,  :
  Asymptotic penalty value is not accurate for multiple changes, it should be treated the same as a manual penalty choice.
Warning message:
In penalty_decision(penalty, pen.value, n, diffparam, asymcheck = costfunc,  :
  Asymptotic penalty value is not accurate for multiple changes, it should be treated the same as a manual penalty choice.
[1]  50 100 150
integer(0)

changepoint documentation built on May 2, 2019, 2:05 a.m.