correlationordering: function to compute empirical correlation between distance in...

View source: R/distance.R

Description

Given a matrix of pairwise distances computed with a chosen distance metric, correlationordering computes the empirical correlation (over all pairs of elements) between how far apart two elements are in the rows/columns of the matrix and their distance according to the metric. Correlation ordering is high when elements close to each other in the matrix have small pairwise distances. If the rows/columns of the distance matrix are ordered according to a clustering of the elements, then correlation ordering should be large compared with that of a matrix whose rows/columns are randomly ordered.

Usage

correlationordering(dist)

improveordering(dist, echo=FALSE)

Arguments

dist

matrix of all pairwise distances between a set of 'p' elements, as produced, for example, by the distancematrix function. The value in row 'j' and column 'i' is the distance between element 'i' and element 'j'. The matrix must be symmetric. The ordering of the rows/columns is compared to the values in the matrix.

echo

indicator of whether the value of correlation ordering before and after rearranging the ordering should be printed.

Details

Correlation ordering is defined as the empirical correlation between distance in a list and distance according to some other metric. The value in row 'i' and column 'j' of dist is compared to 'j-i'. The function correlationordering computes the correlation ordering for a matrix dist, whereas the function improveordering swaps the ordering of elements in dist until doing so no longer improves correlation ordering. The algorithm for improveordering is not optimized, so the function can be quite slow for more than 50 elements. These functions are used by the hopach clustering function to sensibly order the clusters in the first level of the hierarchical tree, and can also be used to order elements within clusters when the number of elements is not too large.
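The definition above can be illustrated in base R. This is a minimal sketch, not the package's source code: the helper name corr_ordering_sketch is hypothetical, and it assumes each unordered pair of elements is counted once (the upper triangle of the matrix).

```r
# Hypothetical helper, not part of hopach: correlation between the
# distance apart in the list (j - i) and the metric distance,
# taken over all pairs i < j.
corr_ordering_sketch <- function(d) {
  d <- as.matrix(d)
  stopifnot(nrow(d) == ncol(d))
  upper <- upper.tri(d)                # each unordered pair once
  list_dist <- abs(col(d) - row(d))    # distance apart in the ordering
  cor(list_dist[upper], d[upper])
}

# Points on a line, already sorted: list distance equals metric distance.
x <- c(1, 2, 3, 4, 5)
d_sorted <- as.matrix(dist(x))
co_sorted <- corr_ordering_sketch(d_sorted)

# The same points with rows/columns shuffled.
perm <- c(3, 1, 5, 2, 4)
co_shuffled <- corr_ordering_sketch(d_sorted[perm, perm])
```

In this toy case the sorted matrix has the highest possible correlation ordering, and shuffling the rows/columns lowers it, which matches the behaviour described above for clustered versus randomly ordered matrices.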

Value

For correlationordering, a number between -1 and 1, as returned by the cor function, equal to the correlation ordering for the matrix dist.

For improveordering, a vector of length 'p' containing the row indices for the new ordering of the rows/columns of dist, so that, with ord<-improveordering(dist), the reordered matrix dist[ord,ord] has correlation ordering at least as high as that of dist.

Warning

The function improveordering can be very slow for more than about 50 elements. The method employed is a greedy, step-wise algorithm, which sequentially swaps all pairs of elements and accepts any swap that improves correlation ordering.
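To show why this kind of search becomes slow, here is a rough base R sketch of a greedy pairwise-swap procedure. It is written from the description above and is not the package's implementation. The names corr_ordering_sketch and greedy_swap_sketch are hypothetical, and the sketch assumes that passes over all pairs repeat until no swap helps.

```r
# Hypothetical helpers, not hopach's source code.
corr_ordering_sketch <- function(d) {
  upper <- upper.tri(d)
  cor(abs(col(d) - row(d))[upper], d[upper])
}

greedy_swap_sketch <- function(d) {
  d <- as.matrix(d)
  p <- nrow(d)
  stopifnot(p >= 3)                    # cor() needs more than one pair
  ord <- seq_len(p)
  best <- corr_ordering_sketch(d)
  improved <- TRUE
  while (improved) {                   # repeat passes until no swap helps
    improved <- FALSE
    for (i in seq_len(p - 1)) {
      for (j in (i + 1):p) {
        cand <- ord
        cand[c(i, j)] <- cand[c(j, i)] # swap positions i and j
        score <- corr_ordering_sketch(d[cand, cand])
        if (score > best + 1e-12) {    # accept only strict improvements
          ord <- cand
          best <- score
          improved <- TRUE
        }
      }
    }
  }
  ord
}

x <- c(3, 1, 5, 2, 4)
d <- as.matrix(dist(x))
ord <- greedy_swap_sketch(d)
before <- corr_ordering_sketch(d)
after <- corr_ordering_sketch(d[ord, ord])
```

In this sketch each pass tries every one of the p(p-1)/2 swaps and rescans all pairs to score each candidate, so the cost of a single pass grows roughly with the fourth power of 'p'.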

Author(s)

Katherine S. Pollard <kpollard@gladstone.ucsf.edu> and Mark J. van der Laan <laan@stat.berkeley.edu>

References

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf

See Also

distancematrix, hopach

Examples

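library(hopach)
# 10 elements with 5 measurements each
mydata<-matrix(rnorm(50),nrow=10)
mydist<-distancematrix(mydata,d="euclid")
image(as.matrix(mydist))
# correlation ordering of the original row/column order
correlationordering(mydist)
# greedily swap elements to improve the ordering
neword<-improveordering(mydist,echo=TRUE)
# correlation ordering and image after reordering rows and columns
correlationordering(mydist[neword,neword])
image(as.matrix(mydist[neword,neword]))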

hopach documentation built on Nov. 8, 2020, 4:54 p.m.