# diss.PRED: Dissimilarity Measure Based on Nonparametric Forecast In TSclust: Time Series Clustering Utilities

## Description

Computes the dissimilarity between two time series as the L1 distance between the kernel estimators of their forecast densities at a pre-specified horizon.

## Usage

    diss.PRED(x, y, h, B=500, logarithm.x=FALSE, logarithm.y=FALSE,
              differences.x=0, differences.y=0, plot=FALSE, models = NULL)

## Arguments

| Argument | Description |
|---|---|
| x | Numeric vector containing the first of the two time series. |
| y | Numeric vector containing the second of the two time series. |
| h | The horizon of interest, i.e. the number of steps ahead at which the prediction is evaluated. |
| B | The number of bootstrap resamples. |
| logarithm.x | Boolean. Specifies whether to transform series x by taking logarithms. When using the diss wrapper, use the logarithms argument instead. See Details. |
| logarithm.y | Boolean. Specifies whether to transform series y by taking logarithms. When using the diss wrapper, use the logarithms argument instead. See Details. |
| differences.x | Specifies the number of differences to apply to series x. When using the diss wrapper, use the differences argument instead. See Details. |
| differences.y | Specifies the number of differences to apply to series y. When using the diss wrapper, use the differences argument instead. See Details. |
| plot | If TRUE, plot the resulting forecast densities. |
| models | A list whose elements are either "ets", "arima" or a fitted model object from the forecast package. The list must have one element per series; for the x and y version, a list with two elements. If models is not NULL, the logarithm and differences parameters are ignored. |

## Details

The dissimilarity between the time series x and y is given by

$$d(x,y) = \int \left| f_{x,h}(u) - f_{y,h}(u) \right| \, du$$

where f_{x,h} and f_{y,h} are kernel density estimators of the forecast densities h steps ahead of x and y, respectively. The horizon of interest h is pre-specified by the user.

If models is specified, the given model for each series is used to obtain the forecast densities. Currently, each element of the models list can be the string "ets", which fits an ETS model using the function ets in the forecast package; the string "arima", which fits an ARIMA model using auto.arima from the forecast package; or a model already fitted to the series with a method from the forecast package that can be simulated (see simulate.ets in the forecast package).

The kernel density estimators are based on B bootstrap replicates obtained with a resampling procedure that mimics the generating processes, which are assumed to follow an arbitrary autoregressive structure (parametric or non-parametric). The procedure is described in full in Vilar et al. (2010). This function has a high computational cost due to the bootstrapping procedure.
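The distance itself can be illustrated with base R alone. The sketch below is not the package's implementation: `l1_density_distance` is a hypothetical helper, the two normal samples merely stand in for the B bootstrap forecasts of each series at horizon h, and the bandwidth rule, grid size and trapezoidal integration are assumptions made for illustration.

```r
# Hypothetical helper: L1 distance between kernel density estimates of two samples.
l1_density_distance <- function(sample_x, sample_y, n_grid = 512) {
  # Use one common evaluation grid that covers both samples.
  bw_x <- bw.nrd0(sample_x)
  bw_y <- bw.nrd0(sample_y)
  lo <- min(sample_x, sample_y) - 3 * max(bw_x, bw_y)
  hi <- max(sample_x, sample_y) + 3 * max(bw_x, bw_y)
  dens_x <- density(sample_x, bw = bw_x, from = lo, to = hi, n = n_grid)
  dens_y <- density(sample_y, bw = bw_y, from = lo, to = hi, n = n_grid)
  # Trapezoidal rule for the integral of |f_x - f_y| over the grid.
  gap <- abs(dens_x$y - dens_y$y)
  step <- diff(dens_x$x)
  sum(step * (head(gap, -1) + tail(gap, -1)) / 2)
}

set.seed(1)
fx <- rnorm(500, mean = 0)  # stand-in for bootstrap forecasts of x at horizon h
fy <- rnorm(500, mean = 2)  # stand-in for bootstrap forecasts of y at horizon h
d_far  <- l1_density_distance(fx, fy)  # clearly separated forecast densities
d_same <- l1_density_distance(fx, fx)  # identical samples give zero distance
```

Because both densities integrate to one, the distance is bounded between 0 (identical forecast densities) and 2 (densities with disjoint support).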

The procedure uses a bootstrap method that requires stationary time series. To support a wider range of time series, the method allows some transformations to be applied to the series before the bootstrap resampling. These transformations are inverted before the densities are calculated. The allowed transformations are the logarithm and differencing; they are controlled by the parameters logarithm.x, logarithm.y, differences.x and differences.y.
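As a rough illustration of what inverting such transformations involves, the base R sketch below takes logarithms and one difference of a positive series and then recovers the original values. It only mirrors the idea described above; how diss.PRED applies the inversion internally to the bootstrap forecasts is not shown here, and the simulated series and variable names are assumptions.

```r
set.seed(42)
x <- cumsum(rnorm(100))  # a non-stationary random walk
x <- x - min(x) + 1      # shift to strictly positive values so log() is defined

# Forward transform: logarithm, then one difference.
log_x <- log(x)
w <- diff(log_x, differences = 1)

# Inverse transform: undo the difference (this needs the first log value),
# then exponentiate to return to the original scale.
x_back <- exp(diffinv(w, differences = 1, xi = log_x[1]))
max_err <- max(abs(x_back - x))  # reconstruction error, floating point noise only
```

The inverse of differencing needs the initial values that differencing discards, which is why the first log value is passed through `xi`.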

When using the diss function with the "PRED" method, the argument logarithms must be used instead of logarithm.x and logarithm.y. logarithms is a boolean vector specifying, for each series, whether the logarithm transform should be taken. Likewise, the argument differences, a numeric vector specifying the number of differences to apply to each series, is used instead of differences.x and differences.y. The plot also differs: all the densities are shown in the same plot.

## Value

diss.PRED returns a list with the following components.

| Component | Description |
|---|---|
| L1dist | The computed distance. |
| dens.x | A 2-column matrix with the forecast density of series x. The first column is the base (x) and the second column is the value (y) of the density. |
| dens.y | A 2-column matrix with the forecast density of series y. The first column is the base (x) and the second column is the value (y) of the density. |

When used from the diss wrapper function, it returns a list with the following components.

| Component | Description |
|---|---|
| dist | A dist object with the pairwise L1 distances between series. |
| densities | A list of 2-column matrices containing the densities of each series, in the same format as dens.x or dens.y of diss.PRED. |
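Since the wrapper returns the distances inside a list rather than as a bare dist object, the dist component has to be extracted before clustering. The sketch below assumes TSclust is installed; the simulated AR(1) series, the row names, the linkage method and the number of clusters are illustrative choices, not recommendations from the package authors.

```r
library(TSclust)  # assumed to be installed

set.seed(7)
# Three simulated stationary series, one per row.
series <- rbind(
  a = as.numeric(arima.sim(model = list(ar = 0.8), n = 100)),
  b = as.numeric(arima.sim(model = list(ar = 0.8), n = 100)),
  c = as.numeric(arima.sim(model = list(ar = -0.5), n = 100))
)

# Pairwise forecast-density distances at horizon 3, no transformations.
res <- diss(series, METHOD = "PRED", h = 3, B = 200,
            logarithms = c(FALSE, FALSE, FALSE), differences = c(0, 0, 0))

# The dist component plugs directly into standard clustering functions.
hc <- hclust(res$dist, method = "average")
groups <- cutree(hc, k = 2)
```

A small B keeps the run time manageable here; larger values give smoother density estimates at a higher computational cost.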

## Author(s)

José Antonio Vilar, Pablo Montero Manso.

## References

Alonso, A.M., Berrendero, J.R., Hernandez, A. and Justel, A. (2006) Time series clustering based on forecast densities. Comput. Statist. Data Anal., 51, 762–776.

Vilar, J.A., Alonso, A. M. and Vilar, J.M. (2010) Non-linear time series clustering based on non-parametric forecast densities. Comput. Statist. Data Anal., 54 (11), 2850–2865.

Montero, P. and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62(1), 1-43. http://www.jstatsoft.org/v62/i01/.

## See Also

diss, and auto.arima, ets and simulate.ets in the forecast package.

## Examples

    library(TSclust)

    x <- rnorm(100)
    # Shift to produce values greater than 0, for a correct logarithm transform
    x <- x + abs(min(x)) + 1
    y <- rnorm(100)
    z <- sin(seq(0, pi, length.out=100))

    ## Compute the distance and check for coherent results
    diss.PRED(x, y, h=6, logarithm.x=FALSE, logarithm.y=FALSE,
              differences.x=1, differences.y=0)

    # Create a dist object for use with clustering functions like pam or hclust
    diss(rbind(x, y, z), METHOD="PRED", h=3, B=200,
         logarithms=c(TRUE, FALSE, FALSE), differences=c(1, 1, 2))

    # Test the forecast package predictions
    diss.PRED(x, y, h=5, models = list("ets", "arima"))