correlationordering: function to compute empirical correlation between distance in...

In hopach: Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH)

Description

Given a matrix of pairwise distances based on a choice of distance metric, `correlationordering` computes the empirical correlation (over all pairs of elements) between how far apart two elements are in the rows/columns of the matrix and their distance according to the metric. Correlation ordering is high when elements close to each other in the matrix have small pairwise distances. If the rows/columns of the distance matrix are ordered according to a clustering of the elements, correlation ordering should be large compared with that of a matrix whose rows/columns are randomly ordered.

Usage

```
correlationordering(dist)
improveordering(dist, echo = FALSE)
```

Arguments

`dist` matrix of all pairwise distances between a set of 'p' elements, as produced, for example, by the `distancematrix` function. The value in row 'j' and column 'i' is the distance between element 'i' and element 'j'. The matrix must be symmetric. The ordering of the rows/columns is compared to the values in the matrix.

`echo` indicator of whether the value of correlation ordering before and after rearranging the ordering should be printed.

Details

Correlation ordering is defined as the empirical correlation between distance in a list and distance according to some other metric. The value in row 'i' and column 'j' of `dist` is compared to 'j-i'. The function `correlationordering` computes the correlation ordering for a matrix `dist`, whereas the function `improveordering` swaps the ordering of elements in `dist` until doing so no longer improves correlation ordering. The algorithm for `improveordering` is not optimized, so the function can be quite slow for more than 50 elements. These functions are used by the `hopach` clustering function to sensibly order the clusters in the first level of the hierarchical tree, and can also be used to order elements within clusters when the number of elements is not too large.
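The definition above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the helper name `corr_ordering_sketch` is hypothetical, and restricting the correlation to pairs with i < j is an assumption about how "all pairs of elements" is enumerated.

```r
# Hypothetical helper, not part of hopach: correlation between positional
# separation (j - i) and the metric distance d[i, j], over all pairs i < j.
corr_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == ncol(d), nrow(d) >= 3)
  upper <- upper.tri(d)                    # TRUE where column index > row index
  positional <- (col(d) - row(d))[upper]   # j - i for each pair
  cor(positional, d[upper])
}

# Equally spaced points on a line, already in order: positional separation
# and metric distance coincide, so the correlation is (numerically) 1.
x <- c(1, 2, 3, 4, 5)
d <- as.matrix(dist(x))
corr_ordering_sketch(d)

# The same elements in a scrambled order give a lower value.
perm <- c(3, 1, 5, 2, 4)
corr_ordering_sketch(d[perm, perm])
```

Reordering is done by indexing rows and columns with the same permutation, which keeps the matrix symmetric.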

Value

For `correlationordering`, a number between -1 and 1, as returned by the `cor` function, equal to the correlation ordering for the matrix `dist`.

For `improveordering`, a vector of length 'p' containing the row indices for the new ordering of the rows/columns of `dist`, so that, with `ord <- improveordering(dist)`, the reordered matrix `dist[ord, ord]` has correlation ordering at least as high as that of `dist`.

Warning

The function `improveordering` can be very slow for more than about 50 elements. The method employed is a greedy, step-wise algorithm that sequentially swaps all pairs of elements and accepts any swap that improves correlation ordering.
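A greedy pairwise-swap search of the kind described here might look like the following base-R sketch. Both function names are hypothetical, the scoring helper repeats the assumption that only pairs with i < j are correlated, and the order in which swaps are tried is a guess; the package's own code may differ in all of these details.

```r
# Hypothetical scoring helper: correlation between j - i and d[i, j] for i < j.
corr_ordering_sketch <- function(d) {
  upper <- upper.tri(d)
  cor((col(d) - row(d))[upper], d[upper])
}

# Hypothetical greedy search: try every pair swap, keep any swap that raises
# the score, and repeat full passes until one pass yields no improvement.
improve_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  p <- nrow(d)
  stopifnot(p == ncol(d), p >= 3)
  ord <- seq_len(p)
  best <- corr_ordering_sketch(d)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_len(p - 1)) {
      for (j in (i + 1):p) {
        cand <- ord
        cand[c(i, j)] <- cand[c(j, i)]     # swap positions i and j
        score <- corr_ordering_sketch(d[cand, cand])
        # Small tolerance so floating-point noise cannot cause endless swaps
        if (isTRUE(score > best + 1e-12)) {
          ord <- cand
          best <- score
          improved <- TRUE
        }
      }
    }
  }
  ord
}

set.seed(1)
x <- sample(1:8)                      # points on a line, in shuffled order
d <- as.matrix(dist(x))
ord <- improve_ordering_sketch(d)
corr_ordering_sketch(d)               # before reordering
corr_ordering_sketch(d[ord, ord])     # after reordering: never lower
```

Each pass scores on the order of p^2 candidate swaps, and each score touches on the order of p^2 matrix entries, which is consistent with the slowdown noted above as the number of elements grows.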

Author(s)

Katherine S. Pollard and Mark J. van der Laan

References

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

See Also

`distancematrix`, `hopach`
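Examples

```
library(hopach)

# Simulate 10 elements with 5 measurements each and compute their distances
mydata <- matrix(rnorm(50), nrow = 10)
mydist <- distancematrix(mydata, d = "euclid")
image(as.matrix(mydist))
correlationordering(mydist)

# Greedily reorder the elements, then recompute on the reordered matrix
neword <- improveordering(mydist, echo = TRUE)
correlationordering(mydist[neword, neword])
image(as.matrix(mydist[neword, neword]))
```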