distSparse: Sparse distance matrix calculations


View source: R/dist.R


Description

Sparse alternative to the base dist function. WARNING: the result is not a distance metric; see Details! Also, distances are calculated between columns (not between rows, as in the base dist function).
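Because base R's dist compares rows, the between-columns orientation of distSparse corresponds to calling dist on the transposed matrix. A minimal base-R illustration with a small made-up matrix m:

```r
# A 2 x 3 example matrix (made-up values)
m <- matrix(c(1, 0, 0,
              2, 0, 3), nrow = 2, byrow = TRUE)

# dist() compares rows: one distance between the two rows
d_rows <- dist(m)

# Transposing first gives distances between the three columns,
# which is the orientation distSparse uses
d_cols <- as.matrix(dist(t(m)))
print(d_cols)
```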


Usage

distSparse(M, method = "euclidean", diag = FALSE)



Arguments

M: a sparse matrix in a format of the Matrix package, typically a dMatrix. Any other matrix will be converted to such a sparse Matrix. The distances are calculated between the columns of this matrix (different from the base dist function!).

method: method to calculate distances. Currently only "euclidean" is supported.

diag: should the diagonal be included in the results?
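The exact coercion distSparse applies to non-sparse input is not shown on this page. As a sketch, one standard way to turn a base R matrix into a sparse dMatrix with the Matrix package is as(m, "CsparseMatrix"); the matrix m below is made up:

```r
library(Matrix)

# A small 2 x 3 base R matrix with only two non-zero entries (made-up values)
m <- matrix(c(0, 1, 0, 2, 0, 0), nrow = 2)

# Coerce to a compressed sparse column matrix; for a numeric base
# matrix this gives a dgCMatrix, which inherits from dMatrix
M <- as(m, "CsparseMatrix")
print(class(M))
```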


Details

A sparse distance matrix is a slightly awkward concept, because distances of zero are rare in most data. Further, it is mostly the small distances that are of interest, not the large distances (which are also mostly less trustworthy). Note that for random data this assumption does not necessarily hold.

To obtain sparse results, the current implementation takes a special approach. First, distances are calculated only for pairs of columns that have at least one row in which both columns are non-zero. Distances between columns without such overlap are not calculated, on the assumption that they are uninteresting (and relatively large anyway).
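Which column pairs qualify under this rule can be read off the cross-product of the non-zero pattern of M. A minimal sketch with the Matrix package and a made-up matrix (this is an illustration of the rule, not necessarily how distSparse finds the pairs internally):

```r
library(Matrix)

# Made-up 6 x 4 sparse matrix; the objects being compared are its columns
M <- sparseMatrix(i = c(1, 2, 2, 4, 5, 2, 3),
                  j = c(1, 1, 2, 3, 3, 4, 4),
                  x = c(1, 2, 3, 4, 5, 2, 6),
                  dims = c(6, 4))

# Binary pattern: 1 wherever M has a stored non-zero value
P <- M
P@x[] <- 1

# crossprod(P) = t(P) %*% P counts, for each pair of columns, the rows
# in which both are non-zero; a zero means the pair would not be calculated
shared <- as.matrix(crossprod(P))
print(shared)
```

Here column 3 shares no rows with any other column, so all its off-diagonal entries in shared are zero.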

Second, to distinguish the non-calculated distances from real zero distances, the distances are converted into similarities by subtracting them from the maximum distance. In this way, all non-calculated entries are zero, and real zero distances get the value max(dist).
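To illustrate the conversion, here is a minimal sketch with made-up distances for three calculated column pairs among four columns, stored with the Matrix package's sparseMatrix():

```r
library(Matrix)

# Made-up distances for the calculated column pairs (upper triangle, i < j);
# pair (1, 2) is a genuine zero distance, column 4 has no calculated pairs
i <- c(1, 1, 2)
j <- c(2, 3, 3)
d <- c(0, 2.5, 4)

# Convert distances to similarities: max(dist) - dist
s <- max(d) - d

# Store only the calculated pairs; all other entries are structural zeros
S <- sparseMatrix(i = i, j = j, x = s, dims = c(4, 4), symmetric = TRUE)
print(S)
```

The genuine zero distance (1, 2) becomes the largest similarity, 4, while the non-calculated pairs involving column 4 stay 0. A side effect of max(dist) - dist is that the pair with the largest calculated distance, here (2, 3), also ends up with similarity 0.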

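Euclidean distances are calculated using the following trick, which expands the squared distance between columns i and j of M into the two squared column norms minus twice their inner product (the distance itself is the square root):

d(i,j)^2 = colSums(M^2)[i] + colSums(M^2)[j] - 2 * (M'M)[i,j]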
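The identity can be checked numerically against base dist on the transposed matrix. The sketch below uses a random test matrix from the Matrix package's rsparsematrix(); note that outer() builds a dense ncol x ncol matrix, so this only verifies the formula and does not reproduce the sparsity-preserving restriction described above:

```r
library(Matrix)

set.seed(42)
M <- rsparsematrix(8, 5, density = 0.4)   # random 8 x 5 sparse test matrix

sq <- colSums(M^2)                 # squared Euclidean norm of each column
G  <- as.matrix(crossprod(M))      # M'M: inner products between columns

# d(i,j)^2 = sq[i] + sq[j] - 2 * (M'M)[i,j]
D2 <- outer(sq, sq, "+") - 2 * G
D2[D2 < 0] <- 0                    # clamp small negative round-off
diag(D2) <- 0
D <- sqrt(D2)

# Reference: base dist() between the columns of the dense matrix
ref <- as.matrix(dist(t(as.matrix(M))))
print(all.equal(D, ref, check.attributes = FALSE))
```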


Value

A symmetric matrix of type dsCMatrix, consisting of similarity(!) values instead of distances (viz. max(dist) - dist).
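Putting the Details together, the sketch below calculates Euclidean distances only for overlapping column pairs, converts them to max(dist) - dist, and returns a symmetric sparse matrix. It is a hypothetical reimplementation (the name dist_sparse_sketch is made up), assuming a numeric dgCMatrix input; it is not the qlcMatrix source and may differ from distSparse in details such as the handling of the diagonal.

```r
library(Matrix)

# Hypothetical sketch of the approach in Details, NOT the qlcMatrix code.
# Assumes M is a numeric dgCMatrix; distances are between its columns.
dist_sparse_sketch <- function(M, diag = FALSE) {
  stopifnot(is(M, "dgCMatrix"))
  M <- drop0(M)                       # remove explicitly stored zeros
  n <- ncol(M)

  # 1. Column pairs with at least one row where both are non-zero
  P <- M
  P@x[] <- 1
  ov <- as(crossprod(P), "TsparseMatrix")   # triplets, 0-based @i / @j
  ri <- ov@i + 1L
  ci <- ov@j + 1L
  pairs <- unique(cbind(pmin(ri, ci), pmax(ri, ci)))
  keep <- if (diag) pairs[, 1] <= pairs[, 2] else pairs[, 1] < pairs[, 2]
  lo <- pairs[keep, 1]
  hi <- pairs[keep, 2]

  # 2. Euclidean distances via sq[i] + sq[j] - 2 * <M[, i], M[, j]>
  sq <- colSums(M^2)
  ip <- colSums(M[, lo, drop = FALSE] * M[, hi, drop = FALSE])
  d <- sqrt(pmax(sq[lo] + sq[hi] - 2 * ip, 0))

  # 3. Similarities max(dist) - dist, stored in the upper triangle
  dmax <- if (length(d) > 0) max(d) else 0
  sparseMatrix(i = lo, j = hi, x = dmax - d,
               dims = c(n, n), symmetric = TRUE)
}

# Usage with a made-up 6 x 5 matrix; column 4 equals column 1,
# column 3 shares no rows with any other column
M <- sparseMatrix(i = c(1, 2, 2, 4, 5, 1, 2, 2, 3),
                  j = c(1, 1, 2, 3, 3, 4, 4, 5, 5),
                  x = c(1, 2, 2, 4, 5, 1, 2, 2, 6),
                  dims = c(6, 5))
S  <- dist_sparse_sketch(M)
S2 <- dist_sparse_sketch(M, diag = TRUE)
print(S)
```

In this example the identical columns 1 and 4 receive the largest similarity, sqrt(37), while every pair involving the isolated column 3 stays at 0.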


Please note:


Author(s)

Michael Cysouw <cysouw@mac.com>

See Also

See also dist.


Examples

# to be done


qlcMatrix documentation built on May 2, 2019, 9:14 a.m.