# distSparse: Sparse distance matrix calculations In qlcMatrix: Utility Sparse Matrix Functions for Quantitative Language Comparison

## Description

Sparse alternative to the base `dist` function. WARNING: the result is not a distance metric, see Details! Also note that distances are calculated between columns, not between rows as in the base `dist` function.
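To illustrate the orientation: the base `dist` call that compares the same objects as `distSparse` is `dist(t(M))`. The sketch below assumes only the `Matrix` package and a small hypothetical matrix `M`, and computes these column-wise reference distances densely:

```r
library(Matrix)

# Small hypothetical sparse matrix: 4 rows x 3 columns (the objects compared)
M <- sparseMatrix(i = c(1, 2, 2, 3, 4), j = c(1, 1, 2, 2, 3),
                  x = c(1, 2, 1, 3, 5), dims = c(4, 3))

# Base dist() compares rows, so transpose to compare the columns of M
D_cols <- as.matrix(dist(t(as.matrix(M))))
D_cols  # 3 x 3: one row and one column per column of M
```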

## Usage

```r
distSparse(M, method = "euclidean", diag = FALSE)
```

## Arguments

• `M`: a sparse matrix in a format of the `Matrix` package, typically `dMatrix`. Any other matrix will be converted to such a sparse matrix. The distances will be calculated between the columns of this matrix (different from the base `dist` function!).

• `method`: the method used to calculate distances. Currently only `"euclidean"` is supported.

• `diag`: should the diagonal be included in the results?
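If your data is a base R matrix, one way to convert it to a `dMatrix` yourself (rather than relying on the internal conversion, whose exact form is not documented here) is `Matrix(..., sparse = TRUE)` from the `Matrix` package. A minimal sketch with hypothetical data:

```r
library(Matrix)

# Hypothetical dense data, filled column by column (4 rows x 3 columns)
m <- matrix(c(1, 2, 0, 0,
              0, 1, 3, 0,
              0, 0, 0, 5), nrow = 4)

# Convert to a compressed sparse column matrix of doubles
M <- Matrix(m, sparse = TRUE)
is(M, "dgCMatrix")  # TRUE
is(M, "dMatrix")    # TRUE: dgCMatrix is a double-precision sparse Matrix
```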

## Details

A sparse distance matrix is a slightly awkward concept, because distances of zero are rare in most data. However, it is mostly the small distances that are of interest, not the large distances (which are mostly also less trustworthy). Note that for random data this assumption does not necessarily hold.

To obtain sparse results, the current implementation takes a special approach. First, distances are calculated only for pairs of columns that share some non-zero data, i.e. that are both non-zero in at least one row. All other distances are assumed to be uninteresting (and relatively large anyway) and are not calculated.
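The selection of column pairs described above can be sketched with the `Matrix` package. This is an illustration, not the package's own code, and it assumes that "sharing non-zero data" means being non-zero in a common row. `abs()` is used so that positive and negative products cannot cancel each other out to zero:

```r
library(Matrix)

# Small hypothetical sparse matrix: 4 rows x 3 columns
M <- sparseMatrix(i = c(1, 2, 2, 3, 4), j = c(1, 1, 2, 2, 3),
                  x = c(1, 2, 1, 3, 5), dims = c(4, 3))

# Cell (i, j) of crossprod(abs(M)) is > 0 exactly when columns i and j
# are both non-zero in at least one row
overlap <- crossprod(abs(M))
computed <- as.matrix(overlap) > 0
computed
```

Here columns 1 and 2 overlap (in row 2), while column 3 overlaps with neither, so the distances for the pairs (1, 3) and (2, 3) would be left uncalculated.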

Second, to differentiate the non-calculated distances from real zero distances, the distances are converted into similarities by subtracting them from the maximum distance. In this way, all non-calculated values are zero, and real zero distances have the value `max(dist)`.
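A minimal sketch of this conversion, using made-up distances for three column pairs of a hypothetical 4-column matrix:

```r
library(Matrix)

# Hypothetical calculated distances for the column pairs (1,2), (1,3), (2,4);
# the pair (1,2) has a genuine distance of zero
i <- c(1, 1, 2)
j <- c(2, 3, 4)
d <- c(0, 3, 1)

# Similarities: subtract each distance from the maximum distance
s <- max(d) - d   # 3, 0, 2

# Symmetric sparse result; pairs that were never calculated stay zero
S <- sparseMatrix(i = i, j = j, x = s, dims = c(4, 4), symmetric = TRUE)
as.matrix(S)
```

The real zero distance between columns 1 and 2 now has the value `max(d)`, i.e. 3, while the non-calculated pair (3, 4) is zero. Note that the largest calculated distance, for the pair (1, 3), also maps to zero and so becomes indistinguishable from a non-calculated value.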

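Euclidean distances between columns $i$ and $j$ of `M` are calculated using the following trick, which needs only the column sums of squares `colSums(M^2)` and the cross-product `crossprod(M)`, i.e. $M^\top M$:

$$d_{ij} = \sqrt{\sum_k M_{ki}^2 + \sum_k M_{kj}^2 - 2\,(M^\top M)_{ij}}$$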
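The formula can be checked numerically. The sketch below assumes the `Matrix` package and a small hypothetical matrix. For clarity it evaluates the formula densely for all column pairs and compares the result with base `dist` on the transposed data; a sparse implementation would evaluate it only for the column pairs that share non-zero data:

```r
library(Matrix)

# Small hypothetical sparse matrix: 4 rows x 3 columns
M <- sparseMatrix(i = c(1, 2, 2, 3, 4), j = c(1, 1, 2, 2, 3),
                  x = c(1, 2, 1, 3, 5), dims = c(4, 3))

# Squared column norms: sum_k M[k, i]^2 for every column i
n2 <- colSums(M^2)

# Cross-products between all column pairs, (M'M)[i, j]
XtX <- crossprod(M)

# d_ij^2 = n2[i] + n2[j] - 2 (M'M)_ij, evaluated densely here
D2 <- outer(n2, n2, "+") - 2 * as.matrix(XtX)
D <- sqrt(pmax(D2, 0))   # clamp tiny negative rounding errors before sqrt

# Agrees with base dist() applied column-wise
all.equal(D, as.matrix(dist(t(as.matrix(M)))), check.attributes = FALSE)
```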

## Value

A symmetric matrix of type `dsCMatrix`, consisting of similarity(!) values instead of distances (viz. `max(dist)-dist`).
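Putting the steps from Details together, a hypothetical helper `dist_sparse_sketch` (not part of qlcMatrix, and not its actual implementation) could return such a symmetric sparse matrix of similarities. For readability it uses dense intermediates, which a real implementation for large data would avoid:

```r
library(Matrix)

# Hypothetical sketch of the approach in Details, not the package's code:
# distances only for column pairs sharing non-zero data, returned as
# similarities max(d) - d in a symmetric sparse matrix.
dist_sparse_sketch <- function(M, diag = FALSE) {
  M <- as(M, "CsparseMatrix")
  n2 <- colSums(M^2)                       # squared column norms
  XtX <- as.matrix(crossprod(M))           # column cross-products (dense here)
  overlap <- as.matrix(crossprod(abs(M)))  # > 0 where columns share non-zeros
  # Upper-triangle cells to calculate, optionally including the diagonal
  keep <- which(overlap > 0 & upper.tri(overlap, diag = diag), arr.ind = TRUE)
  i <- keep[, 1]
  j <- keep[, 2]
  d <- sqrt(pmax(n2[i] + n2[j] - 2 * XtX[keep], 0))
  sparseMatrix(i = i, j = j, x = max(d) - d,
               dims = rep(ncol(M), 2), symmetric = TRUE)
}

# Usage with a small hypothetical matrix
M <- sparseMatrix(i = c(1, 2, 2, 3, 4), j = c(1, 1, 2, 2, 3),
                  x = c(1, 2, 1, 3, 5), dims = c(4, 3))
S <- dist_sparse_sketch(M, diag = TRUE)
as.matrix(S)
```

With `diag = TRUE` the diagonal holds real zero distances and therefore gets the value `max(dist)`. In this small example the only off-diagonal pair that is calculated, (1, 2), is also the largest distance and so becomes 0.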

## Note

• The values in the result are not distances, but similarities computed as `max(dist)-dist`.

• Non-calculated values are zero.

## Author(s)

Michael Cysouw <cysouw@mac.com>

## See Also

`dist`

## Examples

```r
# to be done
```

### Example output

```
Loading required package: Matrix
```