Relaxed Word Mover's Distance (RWMD) measures the distance between documents by calculating how hard it is to transform the words of the first document into the words of the second document, and vice versa. For more details see the original article: http://mkusner.github.io/publications/WMD.pdf.
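As a rough illustration of the idea, here is a minimal base-R sketch of RWMD for two bag-of-words documents. It follows the relaxation described by Kusner et al.: each word of one document sends all of its normalized weight to its nearest word in the other document, and the larger of the two one-directional costs is taken. It uses Euclidean word distances as in the paper. The helper names (`pairwise_euclidean`, `relaxed_cost`, `rwmd`) and the toy embeddings are hypothetical, and this is not claimed to match text2vec's internal implementation.

```r
# Sketch of Relaxed Word Mover's Distance (Kusner et al., 2015) in base R.
# Hypothetical helpers; not the text2vec implementation.

# Euclidean distance between every row of `a` and every row of `b`
pairwise_euclidean <- function(a, b) {
  sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
  sqrt(pmax(sq, 0))  # clamp tiny negatives caused by floating-point rounding
}

# One-directional relaxed cost: every word of document 1 moves all of its
# normalized weight to its nearest word in document 2.
relaxed_cost <- function(w1, w2, emb) {
  d1 <- w1[w1 > 0] / sum(w1)          # normalized bag-of-words weights
  words2 <- names(w2)[w2 > 0]
  cost <- pairwise_euclidean(emb[names(d1), , drop = FALSE],
                             emb[words2, , drop = FALSE])
  sum(d1 * apply(cost, 1, min))       # weight times distance to nearest word
}

# RWMD: the larger (tighter) of the two lower bounds on WMD
rwmd <- function(w1, w2, emb) {
  max(relaxed_cost(w1, w2, emb), relaxed_cost(w2, w1, emb))
}

# Toy 2-d embeddings: rows are words, row names are the words
emb <- rbind(king = c(1.0, 0.0), queen = c(0.9, 0.1), apple = c(0.0, 1.0))
doc_a <- c(king = 2, apple = 1)   # word counts for document A
doc_b <- c(queen = 1)             # word counts for document B
rwmd(doc_a, doc_b, emb)
```

Taking the maximum of the two directions makes the result symmetric, so `rwmd(doc_a, doc_b, emb)` and `rwmd(doc_b, doc_a, emb)` agree.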
progressbar: logical, default TRUE. Whether to display a progress bar.
For usage details see the Methods, Arguments and Examples sections.
$new(wv, method = c("cosine", "euclidean"))
Constructor for the RWMD model. For a description of the arguments see the Arguments section.
Computes the distance between each row of sparse matrix x and each row of sparse matrix y.
Computes the "parallel" distance between each row of sparse matrix x and the corresponding row of sparse matrix y.
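The two methods above differ only in which pairs of rows they compare, so the shape of the result differs. The sketch below illustrates this with hypothetical helpers `dist2_sketch` and `pdist2_sketch` on small dense matrices, using an L1 distance as a stand-in for RWMD; it also mirrors the default described in the Arguments section, where y = NULL means y = x. It is not the text2vec code and does not use sparse matrices.

```r
# Hypothetical helpers that only illustrate output shapes. `doc_dist` can be
# any function returning the distance between two document rows.
dist2_sketch <- function(x, y = NULL, doc_dist) {
  if (is.null(y)) y <- x                 # y = NULL means y = x
  out <- matrix(0, nrow(x), nrow(y))     # one cell per (row of x, row of y) pair
  for (i in seq_len(nrow(x)))
    for (j in seq_len(nrow(y)))
      out[i, j] <- doc_dist(x[i, ], y[j, ])
  out
}

pdist2_sketch <- function(x, y, doc_dist) {
  stopifnot(nrow(x) == nrow(y))          # rows are paired one-to-one
  vapply(seq_len(nrow(x)), function(i) doc_dist(x[i, ], y[i, ]), numeric(1))
}

# Toy dense "document-term" matrices and an L1 stand-in for the document distance
l1 <- function(a, b) sum(abs(a - b))
x <- rbind(c(1, 0, 2), c(0, 1, 1))
y <- rbind(c(1, 1, 1), c(0, 0, 2))
dist2_sketch(x, y, doc_dist = l1)    # 2 x 2 matrix
pdist2_sketch(x, y, doc_dist = l1)   # numeric vector of length 2
```

The "parallel" result is the diagonal of the all-pairs matrix, which is why it requires x and y to have the same number of rows.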
x: sparse document-term matrix.
y = NULL: sparse document-term matrix. If y = NULL (the default), y = x is assumed.
wv: word vectors. A numeric matrix of word embeddings: rows are words, columns are the corresponding vector components. Rows should be named with the words.
method: name of the distance used to measure similarity between two word vectors. The authors of the original paper use "euclidean"; we use "cosine" by default because it has worked better in our experience. With "cosine", distance = 1 - cos(angle between the word vectors).
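To make the two choices of method concrete, the following base-R sketch computes pairwise distances between word vectors stored in the layout described for wv (named rows, one column per component). The function name `word_distance` and the toy words are hypothetical; the sketch only shows the two formulas and does not claim to reproduce how text2vec computes them internally.

```r
# Pairwise distances between the word vectors in the rows of `a` and `b`.
word_distance <- function(a, b, method = c("cosine", "euclidean")) {
  method <- match.arg(method)
  if (method == "cosine") {
    a_n <- a / sqrt(rowSums(a^2))        # scale each row to unit length
    b_n <- b / sqrt(rowSums(b^2))
    1 - a_n %*% t(b_n)                   # 1 - cos(angle between vectors)
  } else {
    sq <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)
    sqrt(pmax(sq, 0))                    # Euclidean distance
  }
}

# Toy `wv`: rows are words (named), columns are vector components
wv <- rbind(good = c(1, 0), bad = c(0, 1), great = c(2, 0))
word_distance(wv, wv, "cosine")     # "good" vs "great": 0 (same direction)
word_distance(wv, wv, "euclidean")  # "good" vs "great": 1 (lengths differ)
```

The cosine distance ignores vector length and depends only on direction, while the Euclidean distance also reflects differences in magnitude.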
```r
## Not run:
library(text2vec)
data("movie_review")
tokens = word_tokenizer(tolower(movie_review$review))
v = create_vocabulary(itoken(tokens))
v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
it = itoken(tokens)
vectorizer = vocab_vectorizer(v)
dtm = create_dtm(it, vectorizer)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
glove_model = GloVe$new(word_vectors_size = 50, vocabulary = v, x_max = 10)
wv = glove_model$fit_transform(tcm, n_iter = 10)
# sum main and context vectors, as proposed in the GloVe paper
wv = wv + t(glove_model$components)
rwmd_model = RWMD$new(wv)
rwmd_dist = dist2(dtm[1:100, ], dtm[1:10, ], method = rwmd_model, norm = 'none')
head(rwmd_dist)
## End(Not run)
```