RelaxedWordMoversDistance: R documentation
The RWMD model can be used to query the "relaxed word mover's distance" from a document to a collection of documents. RWMD measures the distance between a query document and each document in a collection by calculating how hard it is to transform the words of the query document into the words of that document. For more details, see the following article: http://mkusner.github.io/publications/WMD.pdf. However, in contrast to the article above, we calculate the "easiness" of converting one word into another using cosine similarity rather than Euclidean distance. Also, here in text2vec we've implemented an efficient RWMD using the tricks from the article "Linear-Complexity Relaxed Word Mover's Distance with GPU Acceleration": https://arxiv.org/abs/1711.07227
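To make the "relaxed" idea concrete, here is a naive base-R sketch of the relaxed word mover's similarity between two documents, using cosine similarity between word vectors as described above. It is a quadratic, illustrative version only, not the linear-complexity implementation used by text2vec; the helpers `cosine_sim_matrix` and `rwmd_sim_naive` are hypothetical names, and the sketch computes a single direction (words of `from` moved into `to`), whereas the original article combines both directions.

```r
# Hypothetical helper (not part of text2vec): pairwise cosine similarities.
cosine_sim_matrix = function(a, b) {
  # L2-normalize rows so that dot products equal cosine similarities
  a = a / sqrt(rowSums(a ^ 2))
  b = b / sqrt(rowSums(b ^ 2))
  tcrossprod(a, b)  # rows: words of a, columns: words of b
}

# Hypothetical helper (not part of text2vec): naive one-directional RWMD similarity.
rwmd_sim_naive = function(from, to, embeddings) {
  # from, to: named term-count vectors; names must be rownames of embeddings
  from = from[from > 0]
  to = to[to > 0]
  w = from / sum(from)  # normalized bag-of-words weights of `from`
  s = cosine_sim_matrix(embeddings[names(from), , drop = FALSE],
                        embeddings[names(to), , drop = FALSE])
  # Relaxation: every word of `from` moves entirely to its most similar
  # word in `to`, instead of solving the full transport problem
  sum(w * apply(s, 1, max))
}

# Toy 2-dimensional embeddings (illustrative values only)
emb = rbind(cat = c(1, 0), dog = c(0.9, 0.1), car = c(0, 1))
rwmd_sim_naive(c(cat = 2), c(dog = 1, car = 1), emb)
```

The result is close to 1 because "cat" and "dog" point in nearly the same direction in this toy space; a document compared with itself scores exactly 1.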
RelaxedWordMoversDistance
RWMD
R6Class object.
For usage details see Methods, Arguments and Examples sections.
rwmd = RelaxedWordMoversDistance$new(x, embeddings)
rwmd$sim2(x)
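The constructor's two inputs have to agree on the vocabulary: x has one row per document and one column per term, while embeddings has one row per term (word vector). The base-R sketch below builds toy inputs of those shapes. Reordering the embedding rows by `colnames(x)` reflects our assumption that terms are matched by name and position; it is not documented behavior, so check it against your own data.

```r
# Toy document-term matrix (illustrative data only):
# one row per document, one column per term.
dtm = matrix(c(2, 0, 1,
               0, 1, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("doc1", "doc2"), c("cat", "dog", "car")))

# Toy embeddings: one row per word vector; rows deliberately out of order.
embeddings = rbind(car = c(0, 1),
                   cat = c(1, 0),
                   dog = c(0.9, 0.1))

# Assumption: embedding rows should line up with the columns of the
# document-term matrix, so reorder them by term name.
embeddings = embeddings[colnames(dtm), , drop = FALSE]
stopifnot(identical(rownames(embeddings), colnames(dtm)))

dim(dtm)         # 2 documents x 3 terms
dim(embeddings)  # 3 terms x 2 dimensions
```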
$new(x, embeddings)
Constructor for the RWMD model.
x - document-term matrix which represents the collection of documents against which you want to perform queries.
embeddings - matrix of word embeddings which will be used to calculate similarities between words (each row represents a word vector).
$sim2(x)
calculates similarities from the collection of documents (passed to the constructor) to the collection of query documents x.
x - document-term matrix which represents the set of query documents.
$dist2(x)
calculates distances from the collection of documents (passed to the constructor) to the collection of query documents x.
x - document-term matrix which represents the set of query documents.
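The description mentions the linear-complexity trick from arXiv 1711.07227. The following base-R sketch shows the core idea under our own assumptions: cosine similarity, L1-normalized term weights, no empty documents, and columns of both matrices aligned with the rows of embeddings. `lc_rwmd_sim` is a hypothetical name and this is not text2vec's implementation. For each query document it computes once, for every vocabulary word v, the best similarity z[v] to any word in the query; a single matrix-vector product then scores every document in the collection, so the cost grows linearly with the collection size.

```r
# Hypothetical helper (not text2vec's code): LC-RWMD-style similarities
# from each collection document to each query document.
lc_rwmd_sim = function(collection, query, embeddings) {
  # L2-normalize word vectors so that dot products equal cosine similarities
  e = embeddings / sqrt(rowSums(embeddings ^ 2))
  # L1-normalize document rows so term weights sum to 1 (assumes no empty rows)
  collection = collection / rowSums(collection)
  res = matrix(0, nrow(query), nrow(collection),
               dimnames = list(rownames(query), rownames(collection)))
  for (i in seq_len(nrow(query))) {
    q_terms = colnames(query)[query[i, ] > 0]
    # z[v]: best cosine similarity between vocabulary word v and any query word,
    # computed once per query document
    z = apply(tcrossprod(e, e[q_terms, , drop = FALSE]), 1, max)
    # one matrix-vector product scores every collection document at once
    res[i, ] = collection %*% z
  }
  res
}

# Toy data (illustrative only): 2 documents x 3 terms, 2-d word vectors
dtm = matrix(c(2, 0, 1,
               0, 1, 1), nrow = 2, byrow = TRUE,
             dimnames = list(c("doc1", "doc2"), c("cat", "dog", "car")))
emb = rbind(cat = c(1, 0), dog = c(0.9, 0.1), car = c(0, 1))
sims = lc_rwmd_sim(dtm, dtm[1, , drop = FALSE], emb)
sims  # 1 query x 2 collection documents; doc1 scores 1 against itself
```

Computing z over the whole vocabulary is what removes the per-pair cost of the naive approach: the expensive word-to-word step depends only on the query, not on the number of collection documents.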
## Not run:
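library(text2vec)
library(rsparse)  # provides the GloVe model used below
data("movie_review")
# tokenize lower-cased reviews and build a pruned vocabulary
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
# document-term matrix (the collection) and term co-occurrence matrix
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
# fit 50-dimensional GloVe word vectors on the co-occurrence matrix
glove_model = GloVe$new(rank = 50, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 5)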
# sum main and context vectors, as proposed in the GloVe paper
wv = wv + t(glove_model$components)
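# build the model on the whole collection, then query it with the first 10 documents
rwmd_model = RelaxedWordMoversDistance$new(dtm, wv)
rwms = rwmd_model$sim2(dtm[1:10, ])
# highest similarity scores in the first row of the result
head(sort(rwms[1, ], decreasing = TRUE))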
## End(Not run)