correlationordering: function to compute empirical correlation between distance in...
In hopach: Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH)


Given a matrix of pairwise distances based on a chosen distance metric, correlationordering computes the empirical correlation (over all pairs of elements) between how far apart two elements are in the rows/columns of the matrix and their distance according to the metric. Correlation ordering is high when elements that are close to each other in the matrix have small pairwise distances. If the rows/columns of the distance matrix are ordered according to a clustering of the elements, correlation ordering should be large compared with that of a matrix whose rows/columns are randomly ordered.


correlationordering(dist)

improveordering(dist,echo=FALSE)

`dist`	matrix of all pairwise distances between a set of 'p' elements, as produced, for example, by the `distancematrix` function. The value in row 'j' and column 'i' is the distance between element 'i' and element 'j'. The matrix must be symmetric. The ordering of the rows/columns is compared to the values in the matrix.
`echo`	logical indicating whether the value of correlation ordering before and after reordering should be printed.

Correlation ordering is defined as the empirical correlation between distance in a list and distance according to some other metric. The value in row 'i' and column 'j' of dist is compared to 'j-i'. The function correlationordering computes the correlation ordering for a matrix dist, whereas the function improveordering swaps the order of elements in dist until doing so no longer improves correlation ordering. The algorithm for improveordering is not optimized, so the function can be quite slow for more than 50 elements. These functions are used by the hopach clustering function to sensibly order the clusters in the first level of the hierarchical tree, and can also be used to order elements within clusters when the number of elements is not too large.
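The definition above can be sketched in base R. This is only an illustration of the definition, not the package's source: the helper name `corr_ordering_sketch` is hypothetical, and the sketch assumes a plain symmetric numeric matrix and counts each pair once through the upper triangle.

```r
# Hypothetical helper: illustrates the definition, not hopach's own code.
corr_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == ncol(d))
  up <- upper.tri(d)                  # each pair (i, j) with i < j, counted once
  list_dist <- (col(d) - row(d))[up]  # j - i: distance apart in the ordering
  cor(list_dist, d[up])               # compared with the metric distance
}

# Five points on a line, already sorted: list distance equals metric distance,
# so the two vectors are identical.
x <- c(1, 2, 3, 4, 5)
d_sorted <- as.matrix(dist(x))
cor_sorted <- corr_ordering_sketch(d_sorted)

# The same points in a scrambled row/column order.
perm <- c(3, 1, 5, 2, 4)
cor_scrambled <- corr_ordering_sketch(d_sorted[perm, perm])
```

In this toy case the sorted matrix gives a perfect correlation, and the scrambled ordering gives a lower value, which is the behaviour the description leads one to expect.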

For correlationordering, a number between -1 and 1, as returned by the cor function, equal to the correlation ordering for the matrix dist.

For improveordering, a vector of length 'p' containing the row indices for the new ordering of the rows/columns of dist, so that, with ord <- improveordering(dist), dist[ord,ord] has correlation ordering at least as high as that of dist.

The function improveordering can be very slow for more than about 50 elements. The method employed is a greedy, step-wise algorithm, which sequentially swaps all pairs of elements and accepts any swap that improves correlation ordering.
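A greedy pairwise-swap search of the kind described can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the hopach implementation: the helper names `corr_ordering_sketch` and `improve_ordering_sketch` are hypothetical, the input is assumed to be a plain symmetric numeric matrix, and the tolerance `1e-12` is an arbitrary choice to avoid accepting swaps that differ only by rounding error.

```r
# Hypothetical helpers, for illustration only; not the hopach implementation.
corr_ordering_sketch <- function(d) {
  up <- upper.tri(d)
  cor((col(d) - row(d))[up], d[up])
}

improve_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  p <- nrow(d)
  stopifnot(p == ncol(d), p >= 3)      # fewer than 3 elements gives no usable correlation
  ord <- seq_len(p)
  best <- corr_ordering_sketch(d)
  improved <- TRUE
  while (improved) {                   # repeat passes until no swap helps
    improved <- FALSE
    for (i in seq_len(p - 1)) {
      for (j in (i + 1):p) {
        cand <- ord
        cand[c(i, j)] <- cand[c(j, i)] # try swapping positions i and j
        val <- corr_ordering_sketch(d[cand, cand])
        if (isTRUE(val > best + 1e-12)) {  # accept only a strict improvement
          ord <- cand
          best <- val
          improved <- TRUE
        }
      }
    }
  }
  ord
}

# Toy input: five points on a line, presented in a scrambled order.
x <- c(3, 1, 5, 2, 4)
d <- as.matrix(dist(x))
before <- corr_ordering_sketch(d)
ord <- improve_ordering_sketch(d)
after <- corr_ordering_sketch(d[ord, ord])
```

In this sketch, each pass tries p(p-1)/2 swaps and recomputes the correlation over all pairs for every candidate, so a single pass costs on the order of p^4 operations. That is consistent with the warning that the approach becomes slow as the number of elements grows.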

Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf

distancematrix, hopach

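library(hopach)

# Simulate 10 elements with 5 measurements each and compute Euclidean distances
mydata<-matrix(rnorm(50),nrow=10)
mydist<-distancematrix(mydata,d="euclid")
image(as.matrix(mydist))

# Correlation ordering of the original row/column order
correlationordering(mydist)

# Search for a better ordering, then recompute and replot with it applied
neword<-improveordering(mydist,echo=TRUE)
correlationordering(mydist[neword,neword])
image(as.matrix(mydist[neword,neword]))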