# disscosangle: Functions to compute pair-wise distances

In hopach: Hierarchical Ordered Partitioning and Collapsing Hybrid (HOPACH)

## Description

Given a matrix `X`, these functions compute the pair-wise distances between all variables (rows) in `X`, across all observations (columns) of `X`. Each function uses a different distance metric, i.e. a different definition of what it means for two variables to be similar. In hopach versions >=2.0.0, these functions return an object of class `hdist` rather than a matrix.

## Usage

```r
disscosangle(X, na.rm = TRUE)
disseuclid(X, na.rm = TRUE)
disscor(X, na.rm = TRUE)
dissabscosangle(X, na.rm = TRUE)
dissabscor(X, na.rm = TRUE)

vdisscosangle(X, y, na.rm = TRUE)
vdisseuclid(X, y, na.rm = TRUE)
vdisscor(X, y, na.rm = TRUE)
vdissabscosangle(X, y, na.rm = TRUE)
vdissabseuclid(X, y, na.rm = TRUE)
vdissabscor(X, y, na.rm = TRUE)
```

## Arguments

| Argument | Description |
| --- | --- |
| `X` | A numeric data matrix. Each column corresponds to an observation, and each row corresponds to a variable. In the gene expression context, observations are arrays and variables are genes. All values must be numeric. Missing values are ignored. |
| `na.rm` | Indicator of whether to remove missing values (i.e. only compute distance over non-missing observations). |
| `y` | A numeric data vector of length `ncol(X)`. |
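To make "only compute distance over non-missing observations" concrete, here is a minimal base R sketch. It assumes that, for a given pair of variables, only the observations where both values are present are used. That pairwise reading is an assumption for illustration, and the helper `cosangle_dist` is hypothetical, not part of hopach.

```r
# Hypothetical helper (not hopach code): cosine-angle distance between two
# variables, optionally restricted to observations present in both.
cosangle_dist <- function(a, b, na.rm = TRUE) {
  if (na.rm) {
    ok <- !is.na(a) & !is.na(b)   # keep observations where both are non-missing
    a <- a[ok]
    b <- b[ok]
  }
  1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

x <- c(1, 2, NA, 4, 5)
y <- c(2, 4, 6, NA, 10)

d_rm    <- cosangle_dist(x, y, na.rm = TRUE)   # uses observations 1, 2 and 5 only
d_naive <- cosangle_dist(x, y, na.rm = FALSE)  # NA propagates through sum()
```

On the three shared observations `y` is exactly twice `x`, so `d_rm` is zero up to rounding, while `d_naive` is `NA`.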

## Details

Different choices of distance metric are discussed in the references. Briefly, Euclidean distance (`disseuclid`) defines two variables to be close if they are similar in magnitude across observations. Correlation distance (`disscor`), in contrast, defines similarity to mean having the same pattern, but not necessarily the same magnitude. Cosine-angle (`disscosangle`) distance is a correlation distance that also accounts for magnitude. Cosine-angle distance is also known as uncentered correlation distance. The distance metrics with 'abs' in their names are absolute versions of each metric; the absolute value is applied to the data before computing the distance.
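The contrast between the three basic metrics can be seen on a toy example. The sketch below uses textbook base R definitions (Euclidean norm of the difference, one minus the Pearson correlation, one minus the uncentered cosine); the helper names `euclid`, `cordist` and `cosangle` are made up for illustration, and hopach's own scaling and missing-value handling may differ.

```r
# Base R sketch (not hopach code): textbook versions of the three metrics
# for variables measured over the same four observations.
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)   # same pattern as x, ten times the magnitude
z <- c(2, 3, 4, 5)       # x shifted up by one

euclid   <- function(a, b) sqrt(sum((a - b)^2))
cordist  <- function(a, b) 1 - cor(a, b)
cosangle <- function(a, b) 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))

euclid(x, y)     # large: the magnitudes differ
cordist(x, y)    # essentially 0: identical pattern
cosangle(x, y)   # essentially 0: y is a positive multiple of x
cordist(x, z)    # essentially 0: centering removes the shift
cosangle(x, z)   # small but positive: uncentered, so the shift matters
```

The last two lines show why cosine-angle distance is described as an uncentered correlation distance: a constant shift leaves the correlation distance at zero but changes the cosine-angle distance.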

In hopach versions <2.0.0, these functions returned the square root of the usual distance for `d="cosangle"`, `d="abscosangle"`, `d="cor"`, and `d="abscor"`. Typically, this transformation makes the dissimilarity correspond more closely with the norm. In order to agree with the `dist` function, the square root is no longer used in versions >=2.0.0.
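One way to read the remark about the norm: for vectors scaled to unit length, the squared Euclidean distance equals `2 * (1 - cosine)`, so the square-root form is proportional to an ordinary Euclidean norm while the plain `1 - cosine` form is not. The base R sketch below checks this identity numerically; it illustrates the general relationship and is not hopach code.

```r
# Base R check of the identity ||u - v||^2 = 2 * (1 - cos(u, v)) for unit vectors.
set.seed(1)
a <- rnorm(10)
b <- rnorm(10)
u <- a / sqrt(sum(a^2))        # scale to unit length
v <- b / sqrt(sum(b^2))

cos_sim <- sum(u * v)
d_new   <- 1 - cos_sim         # no square root (convention described for >= 2.0.0)
d_old   <- sqrt(1 - cos_sim)   # square-root version (described for < 2.0.0)
norm_uv <- sqrt(sum((u - v)^2))

all.equal(norm_uv, sqrt(2) * d_old)   # the two quantities agree up to rounding
```

Under this reading, squaring a distance from the older convention gives the corresponding value in the newer one.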

## Value

For versions >= 2.0.0, as for `distancematrix`, an `hdist` object of all pair-wise distances between the rows of the data matrix `X`, i.e. the value of `hdist[i,j]` is the distance between rows `i` and `j` of `X`, as defined by the distance metric (the `d` argument of `distancematrix`). An `hdist` object is an S4 class containing four slots:

| Slot | Description |
| --- | --- |
| `Data` | representing the lower triangle of the symmetric distance matrix. |
| `Size` | the number of objects (i.e. rows of the data matrix). |
| `Labels` | labels for the objects, usually the numbers 1 to Size. |
| `Call` | the distance used in the call to `distancematrix`. |

An `hdist` object can be converted to a matrix using `as.matrix(hdist)`. (See `hdist` for more details.)

For the vector versions (e.g. `vdisscosangle`), a numeric vector of `nrow(X)` pair-wise distances between each variable (row) in `X` and the vector `y`.
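The shape of the vector-version result can be sketched in base R. The helper `cosangle_to_y` below is hypothetical and uses the textbook `1 - cosine` definition with no missing-value handling; it only illustrates that one distance is returned per row of `X`.

```r
# Hypothetical base R helper (not hopach code): cosine-angle distance between
# each row of X and a single vector y of length ncol(X).
cosangle_to_y <- function(X, y) {
  stopifnot(length(y) == ncol(X))
  1 - as.vector(X %*% y) / (sqrt(rowSums(X^2)) * sqrt(sum(y^2)))
}

X <- matrix(c(1, 2, 3,
              2, 4, 6,
              3, 2, 1), nrow = 3, byrow = TRUE)
y <- c(1, 2, 3)

d <- cosangle_to_y(X, y)
length(d) == nrow(X)   # one distance per variable (row)
```

Rows 1 and 2 are positive multiples of `y`, so their distances are zero up to rounding; row 3 reverses the pattern and gets a strictly positive distance.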

## Author(s)

Katherine S. Pollard and Mark J. van der Laan, with Greg Wall

## References

van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.

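## See Also

`distancematrix`

## Examples

```r
library(hopach)
# 5 variables (rows) measured over 10 observations (columns)
data <- matrix(rnorm(50), nrow = 5)
disscosangle(data)
```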