Build Status

ecomplex

R package for efficiently computing the epsilon-complexity coefficients of a time series. The coefficients are estimates of the complexity of a time series. The complexity coefficients are computed by finding a trend in the best approximation error as the series is successively downsampled and approximated.

An example

The complexity coefficients can be used as a feature in classification and clustering tasks. Here we generate two groups of time series, each group drawn from an ARMA(2,2) model with one parameter changed. A plot of the coefficients for each series shows the two groups are fairly well separated in the coefficient space.

library(ecomplex)
set.seed(1)
reps   <- 100; n <- 500
group1 <- replicate(reps, arima.sim(n = n, 
                              list(ar = c(0.89, -0.49), 
                                   ma = c(-0.23, 0.25))))
group2 <- replicate(reps, arima.sim(n = n, 
                              list(ar = c(0.69, -0.49),
                                   ma = c(-0.23, 0.25))))
ecomp1 <- apply(group1, 2, ecomplex)
ecomp2 <- apply(group2, 2, ecomplex)
coeffs <- lapply(c(ecomp1, ecomp2), function(fit) c(fit$A, fit$B))

df     <- data.frame(do.call(rbind, coeffs))
df$id  <- factor(rep(c(1,2), each = reps))
names(df) <- c('A', 'B', 'id')
palette(c('gray20', 'chocolate3'))
with(df, plot(B, A, col = id, lwd = 2))

An example from each time series group plotted on the same axis.

plot(c(group1[1:500,1]), xlim = c(0,1000), 
                         ylab = '', 
                         col = 'gray20', 
                         lwd = 1.2, 
                         type ='l')
lines(501:1000, group2[, 1], col = 'chocolate3', lwd = 1.2)
abline(v = c(500), lwd = 3, col = 'gray20')
palette('default')

The basic algorithm

The ecomplex function successively downsamples and approximates a time series. The coefficients are the parameters of a log-log regression of the set of approximation errors on the fraction of sample points retained for each approximation. Roughly, the coefficients measure the amount of information (in terms of sample points) needed to approximate a function within some error epsilon.

For mathematical details see Darkhovsky and Piryatinska, Epsilon-complexity of continuous functions.

Installation

# install.packages('devtools')
devtools::install_github('nwaff/ecomplex')

Future work

The package is in development but the ecomplex interface should be fairly stable. The ecomplex function computes the error on univariate time series but an option might be added to compute the epsilon-complexity coefficients for a multivariate time series.

The palarm function included in the package is a change point detection algorithm. There will likely be minor changes to return type of the function which makes optional the return of the means of each segment found by the algorithm.



nateaff/ecomplex documentation built on May 23, 2019, 9:03 p.m.