knitr::opts_chunk$set(echo = TRUE)
This vignette serves as a quick guide on how to use the 'offlineChange' R package to detect multiple change points from offline data, based on the multi-window approach proposed by Ding et al.
Suppose there is a sequence of one-dimentional dependent data, and the true number of change points is unknown, then the following function can be used to return the ranges that are most likely to contain the true change points. A list of window sizes (window_list) is specified by the user. Each window size should be less than the number of observations. The purpose of these windows, as explained in the reference paper, are used to divide the original data into segments so that the data can be transformed. The largest number of changes (point_max) should be specified by the user as well.
library(offlineChange) # Data N <- 1000 N1 <- floor(0.1*N) N2 <- floor(0.3*N) a1 <- c(0.8, -0.3); c1 <- 0 a2 <- c(-0.5, 0.1); c2 <- 0 a3 <- c(0.5, -0.5); c3 <- 0 y <- rep(0,N) L<-2 y[1:L] <- rnorm(L) for (n in (L+1):N){ if (n <= N1) { y[n] <- y[(n-1):(n-L)] %*% a1 + c1 + rnorm(1) } else if (n <= (N1+N2)) { y[n] <- y[(n-1):(n-L)] %*% a2 + c2 + rnorm(1) } else { y[n] <- y[(n-1):(n-L)] %*% a3 + c3 + rnorm(1) } } result <- MultiWindow(y, window_list = c(100,50,20,10,5), point_max = 5)
The returned results contain the number of peak ranges and location of peak ranges:
result$n_peak_range result$peak_range
The users can also modify some other default arguments:
result <- MultiWindow(y, window_list = c(100, 50, 20, 10, 5), point_max = 5, prior_range = NULL, get_mle = GetMle, penalty = c("bic","aic"), seg_min = 1, num_init = NULL, tolerance = 1, cpp = TRUE, ret_score = FALSE )
The argument cpp indicates whether to increase computation speed using rcpp. The acceleration only works without prior range. Based on the peak ranges returned, users can use the following function to get exact change points:
result <- MultiWindow(y, window_list = c(100,50,20,10,5), point_max = 5) RangeToPoint(y, n_peak_range = result$n_peak_range, peak_range = result$peak_range)
Based on the peak ranges returned, users can also plot them on original data. Blue line and red line represent the start and end of a peak range.
result <- MultiWindow(y, window_list = c(100,50,20,10,5), point_max = 5) ChangePointsPlot(y,result,main="plot of change points")
Or, users can plot the score that represents how likely the range contains change points. The higher score a range has, the more likely the range contains change point. The argument ret_score must be TRUE inorder to get score plot.
result <- MultiWindow(y, window_list = c(100,50,20,10,5), point_max = 5, ret_score=TRUE) ScorePlot(result, main="score plot")
The default transformation method from dependent data to independent data is OLS estimation of AR coefficients. Users can define their own transformation functions.
install.packages('MHadaptive') library('MHadaptive') gen_EFdata <- function(N){ q <- 4 Ns <- c(floor(0.2*N), floor(0.3*N), N-floor(0.2*N)-floor(0.3*N)) thetas <- c(0.2,0.6,1) X <- c() for (k in 1:3){ loglik <- function( x ){ res <- - thetas[k] * abs(x)^q } mh <- Metro_Hastings(loglik, rgamma(n = 1, shape = 3, rate = 1), prop_sigma = NULL, par_names = NULL, iterations = 11000, burn_in = 1000, adapt_par = c(100, 20, 0.5, 0.75), quiet = TRUE) mh0 <- mcmc_thin(mh, thin = 20) X <- c(X, mh0$trace[1:Ns[k]]) } return(X) } GetHle=function(x,window_size) { q <- 4 N <- length(x) n_window <- ceiling(N/window_size) x_transformed <- rep(0,n_window) for (n in 1:n_window) { #get estimated coefficients xx <- x[(1+(n-1)*window_size):min(n*window_size,N)] x_transformed[n] <- q * (q-1) * sum(abs(xx)^(q-2)) / sum(q^2 * abs(xx)^(2*q-2)) } return(x_transformed) } N <- 1000 X <- gen_EFdata(N) result <- MultiWindow(X, window_list=c(100,80,50,30, 20), get_mle=GetHle, point_max=3, seg_min=1, tolerance=1)
Besides transformation function, users can also define peanlty terms they like. Be careful that, the penalty function is actually the sum of loss and penalty.
toy_penatly=function(num_each, wgss, D, K, N) { penalty <- sum(num_each * log(wgss/num_each)) + 2 * D * K * log(N) return(penalty) } result <- MultiWindow(y, window_list=c(100,80,50,30, 20), penalty="toy_penalty", point_max=3, seg_min=1, tolerance=1)
Suppose there is a sequence of one-dimentional dependent data, and the user already know prior ranges that change points should be within. Then the following function can be used to return the (narrower) ranges that are most likely to contain the true change points, with much less computational cost compared with the previous method.
result <- MultiWindow(y, window_list = c(100,50,20,10,5), prior_range = list(c(30,200),c(220,400)))
And the users can also change some default arguments. In particular, 'get_mle' is the data transformation method that can be provided by users. Typically, it is chosen to be the function that input data and output an estimate of unknown parameters assuming that the data is generated from a parametric family.
result <- MultiWindow(y, window_list = c(100, 50, 20, 10, 5), prior_range = list(c(30,200), c(220,400)), get_mle = GetMle, tolerance = 1)
Based on the peak ranges returned, users can use the following function to get exact change points:
result <- MultiWindow(y, window_list = c(100,50,20,10,5), prior_range = list(c(30,200), c(220,400))) RangeToPoint(y, n_peak_range = result$n_peak_range, peak_range = result$peak_range)
Suppose there is a sequence of independent data with dimension L+1, and the true number of change points is unknown, then the following function can be used to return the change points. Like before, the largest number of changes (point_max) should be specified by the user.
# Data a <- matrix(rnorm(40,mean=-1,sd=1), nrow=20, ncol=2) b <- matrix(rnorm(120,mean=0,sd=1), nrow=60, ncol=2) c <- matrix(rnorm(40,mean=1,sd=1), nrow=20, ncol=2) x <- rbind(a,b,c) result <- ChangePoints(x, point_max = 5)
And the users can also modify some default arguments:
result <- ChangePoints(x, point_max = 5, penalty = c("bic","aic"), seg_min = 1, num_init = NULL, cpp = TRUE)
Suppose there is a sequence of independent multivariate data, and the true number of change points is already known, then the following function can be used to return the change points. Here, K is the number of change points instead of the number of segments.
result <- OrderKmeans(x, K = 2)
And the users can also change the suggested setting of arguments:
result <- OrderKmeans(x, K = 2, num_init=NULL)
Suppose there is a sequence of independent multivariate data, and the user already know prior ranges that change points should be within, then the following function can be used to return change points, and the prior knowledge of ranges of change points can make the cumputation more efficient and the return change points more accurate.
l1 <- c(15,25) l2 <- c(75,100) prior_range_x <- list(l1, l2) result <- PriorRangeOrderKmeans(x, prior_range_x = list(l1,l2))
And the users can also modify some default arguments:
result <- PriorRangeOrderKmeans(x, prior_range_x, num_init=NULL)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.