Processing and analyzing time series datasets has become a central issue in many domains, requiring data management systems to support time series natively as a data type. A crucial prerequisite of these systems is time series matching, which remains a challenging problem: a time series is a high-dimensional data type, its representation is storage-consuming, and its comparison is time-consuming. Among the representation techniques that tackle these challenges, the symbolic aggregate approximation (SAX) is of particular interest. This technique reduces a time series to a low-dimensional space by segmenting it and discretizing each segment into a symbol from a small alphabet. However, SAX ignores the deterministic behavior of a time series, such as a cyclically repeating season that affects all segments and distorts the symbolic distribution. In this paper, we present a season-aware symbolic approximation. We show that it improves a representation's symbolic distribution and increases the representation accuracy without increasing the representation size. Most importantly, it enables more efficient time series matching, providing a match up to three orders of magnitude faster than SAX.
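To make the segment-and-discretize idea concrete, here is a minimal base-R sketch of plain SAX. It is an illustration only and not the idxrepr implementation: the function name `sax_sketch` and its arguments `w` (number of segments) and `a` (alphabet size) are hypothetical, and it assumes the common choice of equiprobable breakpoints under a standard normal distribution.

```r
# Minimal SAX sketch (illustrative; NOT the idxrepr implementation).
# sax_sketch, w and a are hypothetical names chosen for this example.
sax_sketch <- function(x, w, a) {
  stopifnot(length(x) %% w == 0, a >= 2)
  # z-normalize the series
  x <- (x - mean(x)) / sd(x)
  # Segment: matrix() fills column-wise, so each column is one
  # consecutive segment; colMeans() yields the mean of each segment
  paa <- colMeans(matrix(x, ncol = w))
  # Discretize: breakpoints that are equiprobable under N(0, 1) (assumption)
  breaks <- qnorm(seq_len(a - 1) / a)
  # Map each segment mean to a symbol in 1..a
  findInterval(paa, breaks) + 1L
}

set.seed(1)
x <- rep(seq(12), 11) + rnorm(132)
sax_sketch(x, w = 6, a = 4)
```

With a season of length 12 and segments of length 22, every segment averages over nearly two full cycles, which is the situation in which the season influences all segments alike.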
The idxrepr package contains the representation techniques used by the paper Season-aware Symbolic Time Series Matching. The main components are explained below.
The package contains the following representation techniques:
The file manager.R contains all methods that are needed for representing a time series:
Example:
# Two sample time series
series <- rep(seq(12), 11) + rnorm(132)
series <- (series - mean(series)) / sd(series)
series_2 <- rep(seq(-1, -12), 11) + rnorm(132)
series_2 <- (series_2 - mean(series_2)) / sd(series_2)
# Initialize seassaxres (sSAX), default configuration: T = 132, W = 6, A_res = A_seas = 3
method <- mgr_init("seassaxres")
# Set alphabets A_res = A_seas = 256 and update lookup tables
method$seassax$sax <- mgr_set_config(method$seassax$sax, list(a = 2**8))
method$sax <- mgr_set_config(method$sax, list(a = 2**8))
method <- set_config(method, list())
# Represent time series
repr <- mgr_represent(method, series)
repr
#> [1] 21 24 41 59 82 106 149 170 180 216 232 240 129 118 139 126 127 126
repr_2 <- mgr_represent(method, series_2)
# Calculate distance
mgr_distance(method, repr, repr_2)
#> [1] 21.74766
# Return symbols of season mask
mgr_det_symbols(method, repr)
#> [1] 21 24 41 59 82 106 149 170 180 216 232 240
# Return symbols of residuals
mgr_res_symbols(method, repr)
#> [1] 129 118 139 126 127 126
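The representation above consists of 12 season-mask symbols followed by 6 residual symbols. The following base-R sketch shows one way such a split could be computed. It is a conceptual illustration under stated assumptions, not the package's code: the function `season_aware_sketch` and its parameters are hypothetical, the season mask is taken as the mean at each seasonal position, and the breakpoints are Gaussian quantiles scaled by the standard deviation of each component, which may differ from what idxrepr does.

```r
# Conceptual sketch of a season-aware representation
# (illustrative; NOT the idxrepr implementation, all names are hypothetical).
season_aware_sketch <- function(x, season_len, w, a_seas, a_res) {
  stopifnot(length(x) %% season_len == 0, length(x) %% w == 0)
  # Season mask: each column of the matrix is one cycle, so rowMeans()
  # gives the mean value at every position within the season
  mask <- rowMeans(matrix(x, nrow = season_len))
  # Residuals: what remains after removing the repeated season mask
  res <- x - rep(mask, times = length(x) / season_len)
  # Segment the residuals into w pieces and take each segment's mean
  paa <- colMeans(matrix(res, ncol = w))
  # Assumption: equiprobable Gaussian breakpoints scaled by s
  discretize <- function(v, a, s) {
    findInterval(v, qnorm(seq_len(a - 1) / a, sd = s)) + 1L
  }
  list(season   = discretize(mask, a_seas, sd(mask)),
       residual = discretize(paa, a_res, sd(res)))
}

set.seed(1)
x <- rep(seq(12), 11) + rnorm(132)
x <- (x - mean(x)) / sd(x)
sketch <- season_aware_sketch(x, season_len = 12, w = 6, a_seas = 256, a_res = 256)
length(sketch$season)    # one symbol per seasonal position
length(sketch$residual)  # one symbol per residual segment
```

Separating the season from the residuals means the residual symbols no longer have to encode the repeating pattern, which is the intuition behind the improved symbolic distribution described in the abstract.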