README.md


High-performance distance and similarity measures for various dense and sparse representations, with a primary focus on applications in NLP and recommender systems.

Supported and Planned Object Types

Distances for 2D Representations

| | matrix | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| cosine | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| euclidean | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| mahalanobis | | | | | | | |
| jaccard | | | | | | | |
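For reference, the two supported 2D measures can be sketched in a few lines. This is a minimal, hypothetical Python sketch, not simdist's API: rows are sparse vectors stored as `{column_index: value}` dicts (loosely mirroring one row of a row-compressed matrix such as `dgRMatrix`), and the helper names `cosine_sim`, `euclidean_dist` and `pairwise` are illustrative only.

```python
import math

# Hypothetical helpers (not simdist's API). Each row is a sparse vector
# stored as a {column_index: value} dict, similar to one row of a CSR matrix.

def dot(x, y):
    # Iterate over the shorter row so the cost scales with its non-zeros.
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y.get(k, 0.0) for k, v in x.items())

def cosine_sim(x, y):
    nx = math.sqrt(dot(x, x))
    ny = math.sqrt(dot(y, y))
    if nx == 0.0 or ny == 0.0:
        return 0.0  # convention assumed here for all-zero rows
    return dot(x, y) / (nx * ny)

def euclidean_dist(x, y):
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, clamped at 0 against rounding error.
    sq = dot(x, x) + dot(y, y) - 2.0 * dot(x, y)
    return math.sqrt(max(sq, 0.0))

def pairwise(rows_a, rows_b, fn):
    # Full len(rows_a) x len(rows_b) matrix of row-to-row values.
    return [[fn(a, b) for b in rows_b] for a in rows_a]

A = [{0: 1.0, 2: 2.0}, {1: 3.0}]
B = [{0: 2.0, 2: 4.0}, {0: 0.0, 1: 1.0}]
print(pairwise(A, B, cosine_sim))
print(pairwise(A, B, euclidean_dist))
```

Cosine distance, if needed, is `1 - cosine_sim(x, y)`. Computing the Euclidean distance through dot products keeps the cost proportional to the non-zeros, at the price of some rounding error for nearly identical rows.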

Aggregation Distances for 3D Representations

| | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: |
| centroid | ✔ | ✔ | ✔ | | ✔ | |
| semantic_min_max [1] | ✔ | ✔ | ✔ | | ✔ | |
| semantic_min_sum [2] | ✔ | ✔ | ✔ | | ✔ | |

[1] More commonly known as the "Relaxed Word Mover's Distance" (RWMD), proposed in Kusner et al., 'From Word Embeddings To Document Distances' (2015).

[2] Similar to the RWMD measure; proposed in Mihalcea et al., 'Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity' (2006).
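The aggregation distances compare documents through the embeddings of their words. The sketch below is a hypothetical Python illustration under stated assumptions, not simdist's implementation: a document is a bag-of-words weight dict `{word_id: weight}`, `emb` maps word ids to embedding vectors, and the word-to-word cost is Euclidean. `semantic_min_max` follows the RWMD definition of Kusner et al. (the larger of the two one-sided relaxed costs); combining the two directions by averaging in `semantic_min_sum` is an assumption modelled on Mihalcea et al.

```python
import math

# Hypothetical sketch (not simdist's implementation).
# Documents: {word_id: weight} dicts with positive weights, assumed non-empty.
# emb: {word_id: embedding vector}. Word-to-word cost: Euclidean (assumed).

def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def normalize(doc):
    total = sum(doc.values())
    return {w: x / total for w, x in doc.items()}

def centroid(doc, emb):
    # Weighted mean of the word embeddings of one document.
    doc = normalize(doc)
    dim = len(next(iter(emb.values())))
    c = [0.0] * dim
    for w, x in doc.items():
        for k in range(dim):
            c[k] += x * emb[w][k]
    return c

def centroid_dist(d1, d2, emb):
    return euclid(centroid(d1, emb), centroid(d2, emb))

def directed_relaxed_cost(d1, d2, emb):
    # Each word of d1 sends all of its weight to its nearest word in d2.
    d1 = normalize(d1)
    return sum(x * min(euclid(emb[w], emb[v]) for v in d2) for w, x in d1.items())

def semantic_min_max(d1, d2, emb):
    # RWMD: the larger (tighter) of the two one-sided lower bounds.
    return max(directed_relaxed_cost(d1, d2, emb), directed_relaxed_cost(d2, d1, emb))

def semantic_min_sum(d1, d2, emb):
    # Assumption: average of both directions, as in Mihalcea et al. (2006).
    return 0.5 * (directed_relaxed_cost(d1, d2, emb) + directed_relaxed_cost(d2, d1, emb))

EMB = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 2.0]}
d1 = {0: 1.0, 1: 1.0}
d2 = {1: 2.0, 2: 2.0}
print(centroid_dist(d1, d2, EMB), semantic_min_max(d1, d2, EMB), semantic_min_sum(d1, d2, EMB))
```

Neither measure solves a transport problem: the centroid distance needs one embedding average per document, and the relaxed costs only a nearest-word search. With a Euclidean word cost, both the centroid distance and RWMD are lower bounds on the full Word Mover's Distance (Kusner et al., 2015).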

Transformations

norm_l1, norm_l2 (L1 and L2 normalization).
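As an illustration of what the two transformations do, here is a hypothetical Python sketch that scales each sparse row (a `{column_index: value}` dict) to unit L1 or L2 norm. Row-wise application and leaving all-zero rows unchanged are assumptions, not documented simdist behavior.

```python
import math

# Hypothetical sketch (not simdist's API): scale each sparse row,
# stored as a {column_index: value} dict, to unit L1 or L2 norm.
# All-zero rows are copied unchanged (assumed convention).

def norm_l1(rows):
    out = []
    for row in rows:
        s = sum(abs(v) for v in row.values())
        out.append({k: v / s for k, v in row.items()} if s > 0 else dict(row))
    return out

def norm_l2(rows):
    out = []
    for row in rows:
        s = math.sqrt(sum(v * v for v in row.values()))
        out.append({k: v / s for k, v in row.items()} if s > 0 else dict(row))
    return out

rows = [{0: 3.0, 2: 4.0}, {1: 2.0}, {}]
print(norm_l1(rows))
print(norm_l2(rows))
```

After L2 normalization, the cosine similarity of two rows reduces to their plain dot product.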



vspinu/simdist documentation built on May 3, 2019, 7:09 p.m.