diss.PRED: Dissimilarity Measure Based on Nonparametric Forecast

In TSclust: Time Series Clustering Utilities

Description

Computes the dissimilarity between two time series as the L1 distance between the kernel estimators of their forecast densities at a pre-specified horizon.

Usage

diss.PRED(x, y, h, B=500, logarithm.x=FALSE, logarithm.y=FALSE,
          differences.x=0, differences.y=0, plot=FALSE)

Arguments

x               Numeric vector containing the first of the two time series.
y               Numeric vector containing the second of the two time series.
h               The horizon of interest, i.e., the number of steps ahead at which the prediction is evaluated.
B               The number of bootstrap resamples.
logarithm.x     Boolean. Specifies whether to transform series x by taking logarithms. When using the diss wrapper, use the logarithms argument instead. See Details.
logarithm.y     Boolean. Specifies whether to transform series y by taking logarithms. When using the diss wrapper, use the logarithms argument instead. See Details.
differences.x   Specifies the number of differences to apply to series x. When using the diss wrapper, use the differences argument instead. See Details.
differences.y   Specifies the number of differences to apply to series y. When using the diss wrapper, use the differences argument instead. See Details.
plot            If TRUE, plot the resulting forecast densities.

Details

The dissimilarity between the time series x and y is given by

d(x,y) = \int{ | f_{x,h}(u) - f_{y,h}(u) | du}

where f_{x,h} and f_{y,h} are kernel density estimators of the forecast densities h steps ahead of x and y, respectively. The horizon of interest h is pre-specified by the user. The kernel density estimators are based on B bootstrap replicates obtained by using a resampling procedure that mimics the generating processes, which are assumed to follow an arbitrary autoregressive structure (parametric or non-parametric). The procedure is described in full in Vilar et al. (2010). This function has a high computational cost due to the bootstrapping procedure.
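The final step of this definition, the L1 distance between two kernel density estimates, can be sketched in base R. This is only an illustration under stated assumptions: the two samples fx and fy are drawn with rnorm as stand-ins for the B bootstrap forecast replicates, so the resampling scheme of Vilar et al. (2010) is not reproduced, and the grid, bandwidth and integration rule are choices made for this sketch rather than those of the package. Because both functions are densities, the distance always lies between 0 (identical forecast densities) and 2 (densities with disjoint support).

```r
# Stand-in forecast replicates: NOT the bootstrap of Vilar et al. (2010),
# just two samples of size B playing the role of h-step-ahead forecasts.
set.seed(42)
B <- 500
fx <- rnorm(B, mean = 0, sd = 1)
fy <- rnorm(B, mean = 1, sd = 1)

# Evaluate both kernel density estimates on one common grid,
# padded by three bandwidths so that the tails are covered.
pad <- 3 * max(bw.nrd0(fx), bw.nrd0(fy))
lower <- min(fx, fy) - pad
upper <- max(fx, fy) + pad
dens_x <- density(fx, from = lower, to = upper, n = 512)
dens_y <- density(fy, from = lower, to = upper, n = 512)

# Trapezoidal rule for the integral of |f_x(u) - f_y(u)| du.
gap <- abs(dens_x$y - dens_y$y)
step <- diff(dens_x$x)
l1 <- sum(step * (head(gap, -1) + tail(gap, -1)) / 2)
l1
```

Using one shared grid for both estimates is what makes the pointwise difference meaningful; two default density() calls would generally return different grids.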

The procedure uses a bootstrap method that requires stationary time series. In order to support a wider range of time series, the method allows some transformations to be applied to the series before the bootstrap resampling. These transformations are inverted before calculating the densities. The transformations allowed are logarithm and differencing. The parameters logarithm.x, logarithm.y, differences.x and differences.y can be specified for this purpose.
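The idea of transforming and then inverting can be illustrated with base R alone. The sketch below is an assumption about the kind of operations involved (log, diff, diffinv, exp), not the package's internal code: it takes logarithms of a positive series, applies one difference, and then undoes both steps to recover the original values.

```r
set.seed(1)
# A positive, non-stationary series (made-up data for illustration).
x <- exp(cumsum(rnorm(50, sd = 0.1)))

# Forward transformations: logarithm, then one difference.
log_x <- log(x)
d_log_x <- diff(log_x, differences = 1)

# Inverse transformations: undo the difference (this needs the first
# value of the logged series), then undo the logarithm.
recovered <- exp(diffinv(d_log_x, differences = 1, xi = log_x[1]))

all.equal(recovered, x)
```

The logarithm is only defined for strictly positive values, which is why a series may need to be shifted before a log transform is requested.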

When using the diss function with the "PRED" method, the argument logarithms must be used instead of logarithm.x and logarithm.y. logarithms is a boolean vector specifying, for each series, whether the logarithm transform should be taken. Likewise, the argument differences, a numeric vector specifying the number of differences to apply to each series, is used instead of differences.x and differences.y. The plot is also different: it shows all the densities in the same plot.

Value

diss.PRED returns a list with the following components.

L1dist   The computed distance.
dens.x   A 2-column matrix with the density of the prediction of series x. The first column is the base (x) and the second column is the value (y) of the density.
dens.y   A 2-column matrix with the density of the prediction of series y. The first column is the base (x) and the second column is the value (y) of the density.

When used from the diss wrapper function, it returns a list with the following components.

dist        A dist object with the pairwise L1 distances between series.
densities   A list of 2-column matrices containing the densities of each series, in the same format as 'dens.x' or 'dens.y' of diss.PRED.
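As a rough sketch of how the wrapper's return value might be used, the dist component can be passed to a standard clustering function. The example assumes the TSclust package is installed and is skipped otherwise; the three series, the row names, and the choices of h, B, linkage method and number of clusters are all made up for illustration.

```r
# Assumes the TSclust package is installed; the block is skipped otherwise.
if (requireNamespace("TSclust", quietly = TRUE)) {
  library(TSclust)
  set.seed(123)

  # Three illustrative series (made-up data), one per row.
  series <- rbind(a = rnorm(60),
                  b = rnorm(60),
                  c = as.numeric(arima.sim(list(ar = 0.8), n = 60)))

  # Small B to keep the run short; larger values give smoother densities.
  res <- diss(series, METHOD = "PRED", h = 2, B = 100,
              logarithms = c(FALSE, FALSE, FALSE),
              differences = c(0, 0, 0))

  # The dist component feeds directly into hierarchical clustering.
  hc <- hclust(res$dist, method = "average")
  groups <- cutree(hc, k = 2)
  print(groups)

  # One 2-column density matrix per series.
  str(res$densities, max.level = 1)
}
```

Because the distances come from bootstrap resampling, results vary from run to run unless a seed is set.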

Author(s)

José Antonio Vilar, Pablo Montero Manso.

References

Alonso, A.M., Berrendero, J.R., Hernandez, A. and Justel, A. (2006) Time series clustering based on forecast densities. Comput. Statist. Data Anal., 51, 762–776.

Vilar, J.A., Alonso, A.M. and Vilar, J.M. (2010) Non-linear time series clustering based on non-parametric forecast densities. Comput. Statist. Data Anal., 54 (11), 2850–2865.

Montero, P. and Vilar, J.A. (2014) TSclust: An R Package for Time Series Clustering. Journal of Statistical Software, 62 (1), 1–43. http://www.jstatsoft.org/v62/i01/.

See Also

diss

Examples

library(TSclust)

x <- rnorm(100)
x <- x + abs(min(x)) + 1  # shift to produce values greater than 0, for a correct logarithm transform
y <- rnorm(100)
z <- sin(seq(0, pi, length.out = 100))

## Compute the distance and check for coherent results
diss.PRED(x, y, h = 6, logarithm.x = FALSE, logarithm.y = FALSE,
          differences.x = 1, differences.y = 0)

# Create a dist object for use with clustering functions like pam or hclust
diss(rbind(x, y, z), METHOD = "PRED", h = 3, B = 200,
     logarithms = c(TRUE, FALSE, FALSE), differences = c(1, 1, 2))