# SAX: Symbolic Aggregate Approximation related functions (TSclust: Time Series Clustering Utilities)

## Description

`diss.MINDIST.SAX` computes, on the discretized and dimensionality-reduced series, a dissimilarity that lower bounds the Euclidean distance between the z-normalized original series. Function `PAA` performs the dimensionality reduction. Function `convert.to.SAX.symbol` performs the discretization.

## Usage

```r
diss.MINDIST.SAX(x, y, w, alpha=4, plot=FALSE)
PAA(x, w)
convert.to.SAX.symbol(x, alpha)
MINDIST.SAX(x, y, alpha, n)
SAX.plot(series, w, alpha, col.ser=rainbow(ncol(as.matrix(series))))
```

## Arguments

- `x`: Numeric vector containing the first of the two time series.
- `y`: Numeric vector containing the second of the two time series.
- `w`: The number of equal-sized frames the series will be reduced to.
- `alpha`: The size of the alphabet, i.e. the number of symbols used to represent the values of the series.
- `plot`: If `TRUE`, plot the reduced series together with their corresponding symbols.
- `n`: The original length of the series.
- `series`: A `ts` or `mts` object with the series to plot.
- `col.ser`: Colors for the series, one per series.

## Details

SAX is a symbolic representation of continuous time series.

`w` must be an integer, but it does not need to divide the length of the series. If it does, the plot produced by `diss.MINDIST.SAX` uses this to show the size of the frames.

`PAA` performs the Piecewise Aggregate Approximation of the series, reducing it to `w` elements called frames. Each frame is the average of n/w consecutive observations of the original series. When `w` does not divide `n`, the observations are weighted accordingly.
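The frame weighting can be illustrated with a short base R sketch. The helper `paa_sketch()` is hypothetical and is not TSclust's `PAA`; it assumes one common way of handling a `w` that does not divide `n`: repeat each observation `w` times and average consecutive blocks of `n` values, so an observation that straddles a frame boundary contributes fractionally to both frames.

```r
# Hypothetical helper, not TSclust's PAA: Piecewise Aggregate Approximation
# that also works when w does not divide length(x).
paa_sketch <- function(x, w) {
  x <- as.numeric(x)
  n <- length(x)
  stopifnot(w >= 1, w <= n)
  # Repeat every observation w times, giving n * w values in total.
  expanded <- rep(x, each = w)
  # Fill column by column: column i holds the i-th block of n values.
  blocks <- matrix(expanded, nrow = n)
  # Each frame is the mean of its block (w frames in total).
  colMeans(blocks)
}

paa_sketch(1:6, 3)  # w divides n: plain averages of (1,2), (3,4), (5,6)
paa_sketch(1:5, 2)  # w does not divide n: observation 3 is shared
```

Here `paa_sketch(1:5, 2)` gives `c(1.8, 4.2)`: the middle observation, 3, is split evenly between the two frames.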

`convert.to.SAX.symbol` performs the SAX discretization: it maps the series `x` to an alphabet of size `alpha`, so `x` should be z-normalized beforehand. The N(0,1) distribution is divided into `alpha` equiprobable regions, and an observation that falls into the i-th region (counting from minus infinity) is assigned the i-th symbol.
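A minimal base R sketch of this discretization uses `qnorm()` for the breakpoints and `findInterval()` to locate the region of each value. The name `sax_symbols_sketch()` is hypothetical, and the treatment of values lying exactly on a breakpoint is an assumption of this sketch that may differ from `convert.to.SAX.symbol`.

```r
# Hypothetical helper, not TSclust's convert.to.SAX.symbol.
sax_symbols_sketch <- function(x, alpha) {
  stopifnot(alpha >= 2)
  # alpha - 1 breakpoints splitting N(0,1) into alpha equiprobable regions.
  breakpoints <- qnorm(seq_len(alpha - 1) / alpha)
  # findInterval() returns 0 for values below the first breakpoint, so
  # adding 1 gives symbols 1..alpha (symbol 1 = region starting at -Inf).
  # A value exactly on a breakpoint is assigned to the upper region here.
  findInterval(x, breakpoints) + 1L
}

sax_symbols_sketch(c(-1, -0.1, 0.1, 1), alpha = 4)
```

With `alpha = 4` the breakpoints are approximately -0.674, 0 and 0.674, so the call above returns `1 2 3 4`.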

`MINDIST.SAX` calculates the MINDIST dissimilarity between symbolic representations.
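As defined by Lin et al. (2003), MINDIST compares two SAX words of length w symbol by symbol: equal or adjacent symbols r and c contribute 0, and otherwise the contribution is the gap between the breakpoints that separate their regions, `beta[max(r, c) - 1] - beta[min(r, c)]`. The total is `sqrt(n / w) * sqrt(sum(contribution^2))`. The function below is a hypothetical base R sketch of that formula, not the source of `MINDIST.SAX`.

```r
# Hypothetical helper, not TSclust's MINDIST.SAX.
# sx, sy: integer SAX symbols in 1..alpha; n: original series length.
mindist_sax_sketch <- function(sx, sy, alpha, n) {
  stopifnot(length(sx) == length(sy))
  w <- length(sx)
  beta <- qnorm(seq_len(alpha - 1) / alpha)  # N(0,1) breakpoints
  hi <- pmax(sx, sy)
  lo <- pmin(sx, sy)
  cell <- numeric(w)  # equal or adjacent symbols contribute 0
  far <- hi - lo > 1
  # Gap between the upper breakpoint of the lower region and the
  # lower breakpoint of the higher region.
  cell[far] <- beta[hi[far] - 1] - beta[lo[far]]
  sqrt(n / w) * sqrt(sum(cell^2))
}

mindist_sax_sketch(c(1, 1, 2), c(4, 2, 2), alpha = 4, n = 12)
```

In this call only the first pair (symbols 1 and 4) contributes, so the result is `sqrt(12 / 3)` times `qnorm(0.75) - qnorm(0.25)`.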

`diss.MINDIST.SAX` combines the previous procedures to compute a dissimilarity between series. First, both series are z-normalized. Then their dimensionality is reduced using `PAA` to produce series of length `w`. Next, the reduced series are discretized to an alphabet of size `alpha` using `convert.to.SAX.symbol`. Finally, the dissimilarity value is computed using `MINDIST.SAX`.
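Putting the steps together, the following self-contained base R sketch mirrors the pipeline just described: z-normalization, PAA, SAX discretization, then MINDIST. The function `mindist_sax_pipeline()` and its internal helpers are hypothetical; they follow a plain reading of the description above, and the result is not guaranteed to match `diss.MINDIST.SAX` exactly.

```r
# Hypothetical end-to-end sketch, not TSclust's diss.MINDIST.SAX.
mindist_sax_pipeline <- function(x, y, w, alpha = 4) {
  n <- length(x)
  stopifnot(length(y) == n, w <= n)
  znorm <- function(s) (s - mean(s)) / sd(s)
  # PAA with fractional weighting: repeat each value w times, average blocks of n.
  paa <- function(s) colMeans(matrix(rep(s, each = w), nrow = length(s)))
  beta <- qnorm(seq_len(alpha - 1) / alpha)  # N(0,1) breakpoints
  to_sax <- function(s) findInterval(s, beta) + 1L
  sx <- to_sax(paa(znorm(x)))
  sy <- to_sax(paa(znorm(y)))
  # MINDIST: equal or adjacent symbols contribute 0.
  hi <- pmax(sx, sy)
  lo <- pmin(sx, sy)
  cell <- numeric(w)
  far <- hi - lo > 1
  cell[far] <- beta[hi[far] - 1] - beta[lo[far]]
  sqrt(n / w) * sqrt(sum(cell^2))
}

set.seed(1)
x <- rnorm(100)          # white noise
y <- cumsum(rnorm(100))  # Wiener process
mindist_sax_pipeline(x, y, w = 20, alpha = 4)
```

Because MINDIST lower bounds the Euclidean distance (Lin et al., 2003), the value should never exceed `sqrt(sum((zx - zy)^2))`, where `zx` and `zy` are the z-normalized versions of `x` and `y`.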

`SAX.plot` produces a plot of the SAX representation of the given `series`.

## Value

The computed dissimilarity.

## Author(s)

Pablo Montero Manso, José Antonio Vilar.

## References

Lin, J., Keogh, E., Lonardi, S. & Chiu, B. (2003) A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. (2001) Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3), 263-286.

Montero, P. & Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.

## See Also

`diss`

## Examples
```r
library(TSclust)

set.seed(12349)
n <- 100
# generate sample series: white noise and a Wiener process
x <- rnorm(n)
y <- cumsum(rnorm(n))

w <- 20     # number of equal-sized frames to divide the series into, parameter for PAA
alpha <- 4  # size of the alphabet, parameter for SAX

# z-normalize
x <- (x - mean(x)) / sd(x)
y <- (y - mean(y)) / sd(y)

# generate PAA reductions
paax <- PAA(x, w)
paay <- PAA(y, w)

# plot an example of PAA reduction
plot(x, type = "l", main = "PAA reduction of series x")
p <- rep(paax, each = length(x) / length(paax))  # just for plotting the PAA
lines(p, col = "red")

# repeat the example with y
plot(y, type = "l", main = "PAA reduction of series y")
py <- rep(paay, each = length(y) / length(paay))
lines(py, col = "blue")

# convert to SAX representation
SAXx <- convert.to.SAX.symbol(paax, alpha)
SAXy <- convert.to.SAX.symbol(paay, alpha)

# compute the SAX distance
MINDIST.SAX(SAXx, SAXy, alpha, n)

# this whole process can be computed using diss.MINDIST.SAX
diss.MINDIST.SAX(x, y, w, alpha, plot = TRUE)

z <- rnorm(n)^2
diss(rbind(x, y, z), "MINDIST.SAX", w = w, alpha = alpha)
SAX.plot(as.ts(cbind(x, y, z)), w = w, alpha = alpha)
```