Distance Matrix Computation

Description

This function computes and returns the matrix of distances between the columns of a data matrix, using the specified distance metric.

Usage

distanceMatrix(dataset, metric, ...)

Arguments

dataset

A numeric matrix or an ExpressionSet

metric

A character string naming the distance metric. This can be pearson, sqrt pearson, spearman, absolute pearson, uncentered correlation, weird, or any of the metrics accepted by the dist function. At present, dist accepts euclidean, maximum, manhattan, canberra, binary, or minkowski. Any initial substring that uniquely identifies one of the metrics will work.

...

Additional parameters to be passed on to dist.
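
The remark that any unique initial substring identifies a metric can be illustrated with base R's pmatch. This is only a sketch of the matching rule: the helper resolveMetric and the vector metrics are hypothetical names introduced here, and the function's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: resolve an abbreviated
# metric name against the names listed in the 'metric' argument.
metrics <- c("pearson", "sqrt pearson", "spearman", "absolute pearson",
             "uncentered correlation", "weird",
             "euclidean", "maximum", "manhattan", "canberra",
             "binary", "minkowski")

resolveMetric <- function(metric, choices = metrics) {
  # pmatch returns NA when nothing matches or the prefix is ambiguous
  idx <- pmatch(metric, choices)
  if (is.na(idx)) {
    stop("metric '", metric, "' is unknown or ambiguous")
  }
  choices[idx]
}

resolveMetric("euclid")  # "euclidean"
resolveMetric("sqrt")    # "sqrt pearson"
resolveMetric("sp")      # "spearman"
```

A prefix such as "ma" is rejected because it matches both maximum and manhattan.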

Details

This function differs from dist in two ways, both of which are motivated by common practice in the analysis of microarray or proteomics data. First, it computes distances between column vectors instead of between row vectors. In a typical microarray experiment, the data is organized so the rows represent genes and the columns represent different biological samples. In many applications, relations between the biological samples are more interesting than relationships between genes. Second, distanceMatrix adds additional distance metrics based on correlation.

  • pearson: The most common metric used in the microarray literature is the pearson distance, which can be computed in terms of the Pearson correlation coefficient as (1-cor(dataset))/2.

  • uncentered correlation: This metric was introduced in the Cluster and TreeView software from the Eisen lab at Stanford. It is computed using the formulas for Pearson correlation, but assuming that both vectors have mean zero.

  • spearman: The spearman metric uses the same formula as the pearson metric, but substitutes the Spearman rank correlation for the Pearson correlation.

  • absolute pearson: The absolute pearson metric uses the absolute correlation coefficient; i.e., (1-abs(cor(dataset))).

  • sqrt pearson: The sqrt pearson metric uses the square root of the pearson distance metric; i.e., sqrt(1-cor(dataset)).

  • weird: The weird metric uses the Euclidean distance between the vectors of correlation coefficients; i.e., dist(cor(dataset)).
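
The formulas above can be reproduced with base R alone, which may help when checking results by hand. The following is a minimal sketch under stated assumptions, not the package's own code: the variable names are illustrative, and the scaling used for the uncentered correlation distance (1 minus the uncentered correlation) is an assumption, since the text gives no explicit formula for it.

```r
set.seed(1)
# 100 genes (rows) by 5 samples (columns)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)

# dist() compares rows, so transpose to get distances between columns
dEuclid <- dist(t(x))

# cor() already works column-wise: a 5 x 5 matrix of Pearson correlations
r <- cor(x)

dPearson  <- as.dist((1 - r) / 2)
dSpearman <- as.dist((1 - cor(x, method = "spearman")) / 2)
dAbs      <- as.dist(1 - abs(r))
# pmax guards against tiny negative values from rounding on the diagonal
dSqrt     <- as.dist(sqrt(pmax(1 - r, 0)))
dWeird    <- dist(r)

# Uncentered correlation: the Pearson formula with both means taken to be
# zero. Converting it to a distance as 1 - rUnc is an assumption.
norms <- sqrt(colSums(x^2))
rUnc  <- crossprod(x) / outer(norms, norms)
dUncentered <- as.dist(1 - rUnc)
```

Each result is an object of class dist covering the 5 columns, so it holds 5 * 4 / 2 = 10 pairwise distances.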

Value

A distance matrix in the form of an object of class dist, of the sort returned by the dist function or the as.dist function.
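
Because the result is an ordinary dist object, it works with the usual base R tools. The sketch below builds such an object directly with as.dist, using the pearson formula from the Details section so that it runs without the package. The sample names S1 to S5 are made up for illustration.

```r
set.seed(2)
x <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5,
            dimnames = list(NULL, paste0("S", 1:5)))

# Stand-in for a value returned by distanceMatrix(x, "pearson")
d <- as.dist((1 - cor(x)) / 2)

class(d)           # "dist"
attr(d, "Size")    # number of objects (columns of x)

# Expand to a full symmetric matrix with a zero diagonal
m <- as.matrix(d)

# Feed the distances to hierarchical clustering of the samples
hc <- hclust(d, method = "average")
```
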

BUGS

It would be good to have a better name for the weird metric.

Author(s)

Kevin R. Coombes krc@silicovore.com

See Also

dist, as.dist

Examples

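# simulate 100 genes (rows) by 5 samples (columns); each gene has its own mean
dd <- matrix(rnorm(100*5, rnorm(100)), nrow=100, ncol=5)
distanceMatrix(dd, 'pearson') # correlation-based distance
distanceMatrix(dd, 'euclid')  # abbreviation of 'euclidean', handled by dist
distanceMatrix(dd, 'sqrt')    # abbreviation of 'sqrt pearson'
distanceMatrix(dd, 'weird')   # Euclidean distance between correlation vectors
rm(dd) # cleanup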