pdclust: Permutation Distribution Clustering

Package: pdc (Permutation Distribution Clustering)

Description:

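Hierarchical cluster analysis for time series. The dissimilarity between two time series is measured as a divergence between their permutation distributions, that is, the relative frequencies of ordinal patterns in their delay embeddings.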
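To make "permutation distribution" concrete, the following base-R sketch computes the relative frequency of each ordinal pattern for a single series, given an embedding dimension `m` and a time delay `t`. It assumes the standard ordinal-pattern (Bandt-Pompe) construction and is an illustration, not the code pdc uses internally. `allPermutations` and `permutationDistribution` are hypothetical helper names, and ties are broken by the stable ordering of `order()`, which may differ from pdc's handling.

```r
# Sketch, assuming the standard ordinal-pattern (Bandt-Pompe) construction.
# Both helpers are hypothetical illustrations, not functions exported by pdc.

# All permutations of 1..k, one per row (k! rows, k columns)
allPermutations <- function(k) {
  if (k == 1) return(matrix(1L, nrow = 1, ncol = 1))
  p <- allPermutations(k - 1)
  do.call(rbind, lapply(seq_len(k), function(i) {
    # put i in front, shift the smaller permutation's values >= i up by one
    cbind(i, ifelse(p >= i, p + 1L, p))
  }))
}

# Relative frequency of each ordinal pattern among the delay-embedding
# vectors (x[s], x[s + t], ..., x[s + (m - 1) * t])
permutationDistribution <- function(x, m = 3, t = 1) {
  x <- as.numeric(x)
  n <- length(x) - (m - 1) * t      # number of embedding vectors
  stopifnot(n > 0)
  idx <- outer(seq_len(n), (0:(m - 1)) * t, "+")
  emb <- matrix(x[as.vector(idx)], nrow = n)
  # a pattern is the ranking permutation of a vector; ties keep original order
  patterns <- apply(emb, 1, function(v) paste(order(v), collapse = "-"))
  # list all m! patterns so that every distribution shares the same support
  keys <- apply(allPermutations(m), 1, paste, collapse = "-")
  counts <- table(factor(patterns, levels = keys))
  setNames(as.numeric(counts) / n, keys)
}
```

For example, `permutationDistribution(1:10, m = 3, t = 2)` places all of its mass on the pattern `"1-2-3"`, because every embedding vector of an increasing series is itself increasing. Enumerating all `m!` patterns up front keeps distributions of different series directly comparable.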

Usage:

pdclust(X, m = NULL, t = NULL, divergence = symmetricAlphaDivergence,
        clustering.method = "complete")

## S3 method for class 'pdclust'
plot(x, labels = NULL, type = "rectangle", cols = "black",
     timeseries.as.labels = TRUE, p.values = FALSE, ...)

## S3 method for class 'pdclust'
str(object, ...)

## S3 method for class 'pdclust'
print(x, ...)

Arguments:

`X`	In the univariate case: a matrix of time series, with one column per time series and one row per time point. In the multivariate case: a three-dimensional array whose first dimension represents time, second dimension the individual multivariate time series (entities), and third dimension the variables.
`m`	Embedding dimension used to compute the permutation distributions. Reasonable values lie between 2 and 10. If `m` is `NULL` (the default), the MinE heuristic (see `entropyHeuristic`) determines the embedding dimension automatically.
`t`	Time delay of the embedding.
`divergence`	Divergence measure between discrete distributions. The default is the symmetric alpha divergence (`symmetricAlphaDivergence`).
`clustering.method`	Linkage method for the hierarchical clustering. One of c("complete","average","single").
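The roles of `divergence` and `clustering.method` can be sketched in base R: compute a pairwise divergence matrix between permutation distributions, then pass it to `hclust` with the chosen linkage. This is a hedged approximation of the pipeline, not pdclust's implementation. `clusterDistributions` is a hypothetical helper, and the Hellinger distance is swapped in as a stand-in divergence; it is not the default `symmetricAlphaDivergence`, whose exact formula is not reproduced here.

```r
# Sketch: hierarchical clustering from a matrix of permutation distributions.
# Assumption: `P` has one column per time series and one row per ordinal
# pattern, and each column sums to 1. Hellinger distance is a stand-in
# divergence here, NOT pdc's default symmetricAlphaDivergence.
hellinger <- function(p, q) sqrt(0.5 * sum((sqrt(p) - sqrt(q))^2))

clusterDistributions <- function(P, divergence = hellinger,
                                 method = c("complete", "average", "single")) {
  method <- match.arg(method)
  k <- ncol(P)
  stopifnot(k >= 2)                  # hclust needs at least two objects
  D <- matrix(0, k, k, dimnames = list(colnames(P), colnames(P)))
  # fill the symmetric matrix of pairwise divergences
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      D[i, j] <- D[j, i] <- divergence(P[, i], P[, j])
    }
  }
  hclust(as.dist(D), method = method)
}
```

Because `divergence` is passed as a function, any other measure between discrete distributions can be plugged in without changing the clustering step; `cutree()` on the result then yields group memberships.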

For plotting:

`x`	A `pdclust` object.
`labels`	Optional vector of labels for the time series.
`type`	One of c("triangle","rectangle"), selecting the dendrogram style.
`cols`	Line color, given either as a single string or as a vector of strings (e.g., one color per time series).
`timeseries.as.labels`	If `FALSE`, a vertical dendrogram is plotted using hclust. If `TRUE` (the default), a horizontal dendrogram is plotted with time series plots as labels.
`p.values`	If `TRUE`, annotates the cluster hierarchy with p values. Default is `FALSE`.
`...`	Further graphical arguments.

For the structure display (`str`):

`object`	A `pdclust` object.

Details:

The function pdclust is the central function for clustering time series in the package pdc. It clusters both univariate and multivariate time series. If the time series differ in length, the shorter ones can be padded with NAs so that all series form columns of equal length in a matrix or array. For multivariate time series, the data must be arranged as a three-dimensional array whose dimensions represent (1) time, (2) entities, and (3) variables/channels.
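A minimal sketch of this data layout, assuming each entity's multivariate series is stored as a matrix (rows = time points, columns = variables) in a list. The hypothetical helper `toPdcArray` stacks them into a time x entities x variables array and leaves trailing NAs wherever a series is shorter than the longest one.

```r
# Sketch: arrange unequal-length multivariate series into a
# time x entities x variables array, padding shorter series with NA.
# `series` is a hypothetical list with one matrix per entity
# (rows = time points, columns = variables, same variables for all).
toPdcArray <- function(series) {
  n_time <- max(vapply(series, nrow, integer(1)))
  n_var <- ncol(series[[1]])
  stopifnot(all(vapply(series, ncol, integer(1)) == n_var))
  A <- array(NA_real_, dim = c(n_time, length(series), n_var))
  for (e in seq_along(series)) {
    s <- series[[e]]
    # copy the observed time points; shorter series keep trailing NAs
    A[seq_len(nrow(s)), e, ] <- s
  }
  A
}
```

For univariate data, the analogous layout is a plain matrix with one NA-padded column per series.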

Value:

pdclust returns an object of class `pdclust`, for which print, str, and plot methods are available.

Author(s):

Andreas Brandmaier <brandmaier@mpib-berlin.mpg.de>

References:

Brandmaier, A. M. (2015). pdc: An R Package for Complexity-Based Clustering of Time Series. Journal of Statistical Software, 67(5), 1–23.
Brandmaier, A. M. (2012). Permutation Distribution Clustering and Structural Equation Model Trees. Doctoral dissertation. Saarland University, Saarbruecken, Germany.

See Also:

`pdcDist`, `entropyHeuristic`, `symmetricAlphaDivergence`

Examples:

library(pdc)

# generate 5 ARMA(2,2) time series for the first group
grp1 <- replicate(5, arima.sim(n = 500,
                               list(ar = c(0.8897, -0.4858), ma = c(-0.2279, 0.2488)),
                               sd = sqrt(0.1796)))

# generate 5 ARMA(2,2) time series for the second group
grp2 <- replicate(5, arima.sim(n = 500,
                               list(ar = c(-0.71, 0.18), ma = c(0.92, 0.14)),
                               sd = sqrt(0.291)))

# combine both groups into a single 500 x 10 matrix (one column per series)
X <- cbind(grp1, grp2)

# run the clustering and color the two original groups red and blue
clustering <- pdclust(X)
plot(clustering, cols = c(rep("red", 5), rep("blue", 5)))