High-performance distances and similarities for various dense and sparse representations, with a primary focus on applications in NLP and recommender systems.

Supported input types:

- matrix from base R
- dgCMatrix, dgRMatrix and dgTMatrix from the Matrix package
- simple_triplet_matrix from the slam package
- data.frames in primary-secondary-value (psv) format
- list of named numeric or character vectors

| | matrix | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| cosine | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| euclidean | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| mahalanobis | | | | | | | |
| jaccard | | | | | | | |
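
To make the cosine row of the table concrete, here is a minimal sketch of pairwise cosine similarity between the rows of a dgCMatrix, written only with the Matrix package. The helper `cosine_sim_rows` is a hypothetical name used for illustration and is not this package's API; the sketch also assumes the matrix has no all-zero rows.

```r
library(Matrix)

# Toy 3 x 4 document-term matrix; sparseMatrix() returns a dgCMatrix by default.
x <- sparseMatrix(
  i = c(1, 1, 2, 2, 3),
  j = c(1, 2, 1, 2, 4),
  x = c(1, 2, 2, 4, 5),
  dims = c(3, 4)
)

# Hypothetical helper (illustration only): pairwise cosine similarity
# between the rows of a sparse matrix. Assumes no all-zero rows.
cosine_sim_rows <- function(m) {
  row_norms <- sqrt(rowSums(m^2))               # L2 norm of each row
  m_unit <- Diagonal(x = 1 / row_norms) %*% m   # scale each row to unit length
  as.matrix(tcrossprod(m_unit))                 # dot products of unit-length rows
}

sim <- cosine_sim_rows(x)
cos_dist <- 1 - sim   # cosine distance derived from the similarity
```

Rows 1 and 2 of the toy matrix point in the same direction, so their similarity is 1, while row 3 shares no non-zero columns with them and gets a similarity of 0. Scaling the rows first keeps the work sparse until the final product.
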

| | dgCMatrix | dgRMatrix | dgTMatrix | slam | psv | list |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: |
| centroid | ✔ | ✔ | ✔ | | ✔ | |
| semantic_min_max [1] | ✔ | ✔ | ✔ | | ✔ | |
| semantic_min_sum [2] | ✔ | ✔ | ✔ | | ✔ | |

[1] More commonly known as "Relaxed Word Mover Distance" (RWMD), proposed in Kusner et al., 'From Word Embeddings To Document Distances' (2015).

[2] Similar to the RWMD measure, proposed in Mihalcea et al., 'Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity' (2006).
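
The relaxed distance referenced in footnote [1] can be sketched in a few lines of base R. This follows the description in Kusner et al. (each word's weight is moved to its nearest word in the other document, and the larger of the two directional costs is kept) and is not taken from this package's source. The function name `rwmd`, its arguments, and the choice of Euclidean distance between word vectors are assumptions made for illustration.

```r
# Hypothetical helper (illustration only): Relaxed Word Mover's Distance.
# emb_a, emb_b: numeric matrices with one word embedding per row.
# w_a, w_b: non-negative word weights for each document, each summing to 1.
rwmd <- function(emb_a, w_a, emb_b, w_b) {
  # Pairwise Euclidean distances: words of A in rows, words of B in columns.
  sq <- outer(rowSums(emb_a^2), rowSums(emb_b^2), "+") -
    2 * tcrossprod(emb_a, emb_b)
  cost <- sqrt(pmax(sq, 0))  # clamp tiny negatives caused by rounding

  # Each relaxation sends every word's weight to its nearest counterpart.
  a_to_b <- sum(w_a * apply(cost, 1, min))
  b_to_a <- sum(w_b * apply(cost, 2, min))

  # The tighter of the two lower bounds is the larger one.
  max(a_to_b, b_to_a)
}

emb_a <- rbind(c(0, 0), c(1, 0))
emb_b <- rbind(c(0, 1), c(1, 1), c(3, 0))
d <- rwmd(emb_a, c(0.5, 0.5), emb_b, c(0.25, 0.25, 0.5))
d  # 1.5
```

In this toy case the A-to-B direction costs 1 and the B-to-A direction costs 1.5, so the result is 1.5. A document compared with itself gives 0.
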
Norm functions: norm_l1, norm_l2.