View source: R/utils-textnets.R
doc_similarity | R Documentation |
Given a document-term matrix (DTM), this function returns the similarities between documents using a specified method (see Details). The result is a square document-by-document similarity matrix (DSM), equivalent to a weighted adjacency matrix in network analysis.
doc_similarity(x, y = NULL, method, wv = NULL)
x |
Document-term matrix with terms as columns. |
y |
Optional second matrix (default = NULL). |
method |
Character vector indicating similarity method, including projection, cosine, jaccard, wmd, and centroid (see Details). |
wv |
Matrix of word embedding vectors (a.k.a. embedding model) with rows as words. Required for "wmd" and "centroid" similarities. |
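The input shapes described above can be sketched with toy data. This is a minimal illustration in base R, not taken from the package: the objects `dtm`, `wv`, `shared`, `dtm_sub`, and `wv_sub` are hypothetical names, the embedding values are made up, and whether doc_similarity() performs this kind of vocabulary alignment internally is not stated here.

```r
# Toy document-term matrix: documents as rows, terms as columns.
dtm <- matrix(
  c(2, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("cat", "sat", "mat", "dog"))
)

# Toy embedding matrix: words as rows, embedding dimensions as columns.
# The values are invented for illustration only.
wv <- matrix(
  c(1, 0,
    0, 1,
    1, 1,
    1, 0,
    0, 2),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("cat", "sat", "mat", "dog", "bird"), NULL)
)

# Keep only the terms that appear in both the DTM and the embeddings,
# in the same order, so columns of dtm_sub match rows of wv_sub.
shared  <- intersect(colnames(dtm), rownames(wv))
dtm_sub <- dtm[, shared, drop = FALSE]
wv_sub  <- wv[shared, , drop = FALSE]
```

With inputs shaped like this, a call following the usage line would pass the DTM as x and, for the embedding-based methods, the embedding matrix as wv.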
Document similarity methods include:
projection: finds the one-mode projection matrix from the two-mode DTM using tcrossprod(), which measures the shared vocabulary overlap
cosine: compares row vectors using cosine similarity
jaccard: compares proportion of common words to unique words in both documents
wmd: uses word mover's distance to compare documents (requires word embedding vectors)
centroid: represents each document as the centroid of its vocabulary, then uses cosine similarity to compare centroid vectors (requires word embedding vectors)
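The count-based methods and the centroid method listed above can be sketched in base R on a toy DTM. This is an illustration under stated assumptions, not the package's implementation: all object names are hypothetical, the embedding values are made up, Jaccard is computed on term presence/absence, and the centroid is taken as a count-weighted mean of word vectors. Word mover's distance is omitted because it requires solving an optimal transport problem.

```r
# Toy DTM: documents as rows, terms as columns.
dtm <- matrix(
  c(2, 1, 0, 0,
    0, 1, 1, 0,
    0, 0, 1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("cat", "sat", "mat", "dog"))
)

# projection: one-mode (document-by-document) projection of the DTM.
proj <- tcrossprod(dtm)

# cosine: dot products divided by the product of the row norms.
norms  <- sqrt(rowSums(dtm^2))
cosine <- tcrossprod(dtm) / outer(norms, norms)

# jaccard (assumed presence/absence version):
# shared terms divided by the size of the union of terms.
bin     <- (dtm > 0) * 1
common  <- tcrossprod(bin)
n_terms <- rowSums(bin)
jaccard <- common / (outer(n_terms, n_terms, "+") - common)

# centroid (assumed count-weighted mean of word vectors), then cosine.
# Made-up embeddings with one row per DTM term, in the same order.
wv <- matrix(
  c(1, 0,
    0, 1,
    1, 1,
    1, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(colnames(dtm), NULL)
)
centroids    <- (dtm %*% wv) / rowSums(dtm)
cnorms       <- sqrt(rowSums(centroids^2))
centroid_sim <- tcrossprod(centroids) / outer(cnorms, cnorms)
```

Each result is a square document-by-document matrix. In this toy data doc1 and doc3 share no terms, so their projection, cosine, and Jaccard entries are zero, while their centroid similarity is not, because the embeddings place their words near each other.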
Dustin Stoltz