most_similar | R Documentation |
Find the Top-N most similar words, which replicates the results produced by the Python gensim module most_similar() function. (Exact replication of gensim requires the same word vectors data, not the demodata used here in examples.)
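To make the idea of "Top-N most similar words" concrete, the following base R sketch ranks words by cosine similarity on a tiny made-up embedding matrix. The matrix values and the helper name top_n_similar are hypothetical and exist only for illustration; this is not the package's implementation, only an assumption about the general technique.

```r
# Toy embedding matrix: one row per word (values are made up for illustration).
emb <- rbind(
  king  = c(0.9, 0.8, 0.1),
  queen = c(0.8, 0.9, 0.2),
  man   = c(0.9, 0.1, 0.1),
  woman = c(0.8, 0.2, 0.2),
  apple = c(0.1, 0.1, 0.9)
)

# Hypothetical helper (not part of any package): rank words by cosine
# similarity to the target word, excluding the target itself.
top_n_similar <- function(emb, word, topn = 2) {
  unit <- emb / sqrt(rowSums(emb^2))   # L2-normalize each row
  sims <- drop(unit %*% unit[word, ])  # cosine similarity to the target
  sims <- sims[names(sims) != word]    # drop the target word
  head(sort(sims, decreasing = TRUE), topn)
}

res <- top_n_similar(emb, "king", topn = 2)
print(res)
```

Once the rows are normalized to unit length, the cosine similarity reduces to a plain dot product, which is why normalizing up front keeps the ranking step simple.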
most_similar(
  data,
  x = NULL,
  topn = 10,
  above = NULL,
  keep = FALSE,
  row.id = TRUE,
  verbose = TRUE
)
data |
A |
x |
Can be:
|
topn |
Top-N most similar words. Defaults to 10. |
above |
Defaults to NULL. As in the examples, it can be a number (e.g., 0.7) or a word (e.g., "Shanghai").
If both |
keep |
Keep words specified in x in the results? Defaults to FALSE. |
row.id |
Return the row number of each word? Defaults to TRUE. |
verbose |
Print information to the console? Defaults to TRUE. |
A data.table with the most similar words and their cosine similarities.
Download pre-trained word vectors data (.RData):
https://psychbruce.github.io/WordVector_RData.pdf
sum_wordvec
dict_expand
dict_reliability
cosine_similarity
pair_similarity
plot_similarity
tab_similarity
d = as_embed(demodata, normalize=TRUE)
most_similar(d)
most_similar(d, "China")
most_similar(d, c("king", "queen"))
most_similar(d, cc(" king , queen ; man | woman "))
# the same as above:
most_similar(d, ~ China)
most_similar(d, ~ king + queen)
most_similar(d, ~ king + queen + man + woman)
most_similar(d, ~ boy - he + she)
most_similar(d, ~ Jack - he + she)
most_similar(d, ~ Rose - she + he)
most_similar(d, ~ king - man + woman)
most_similar(d, ~ Tokyo - Japan + China)
most_similar(d, ~ Beijing - China + Japan)
most_similar(d, "China", above=0.7)
most_similar(d, "China", above="Shanghai")
# automatically normalized for more accurate results
ms = most_similar(demodata, ~ king - man + woman)
ms
str(ms)
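The formula examples above (such as ~ king - man + woman) and the above argument can be pictured with a small base R sketch. It assumes that a formula adds and subtracts unit-length word vectors and that the input words are dropped from the results (as keep = FALSE suggests); both are assumptions, not confirmed details of the function. The toy vectors and the helper name analogy are hypothetical.

```r
# Toy vectors (made-up values); columns loosely mean royal, male, female, fruit.
emb <- rbind(
  king  = c(1, 1, 0, 0),
  queen = c(1, 0, 1, 0),
  man   = c(0, 1, 0, 0),
  woman = c(0, 0, 1, 0),
  apple = c(0, 0, 0, 1)
)
unit <- emb / sqrt(rowSums(emb^2))  # L2-normalize each row

# Hypothetical helper: add/subtract word vectors, then rank the remaining
# words by cosine similarity to the combined vector.
analogy <- function(unit, plus, minus = character(0), topn = 1) {
  target <- colSums(unit[plus, , drop = FALSE]) -
    colSums(unit[minus, , drop = FALSE])
  target <- target / sqrt(sum(target^2))          # renormalize the result
  sims <- drop(unit %*% target)                   # cosine similarities
  sims <- sims[!names(sims) %in% c(plus, minus)]  # drop the input words
  head(sort(sims, decreasing = TRUE), topn)
}

# king - man + woman
best <- analogy(unit, plus = c("king", "woman"), minus = "man")
print(best)

# Threshold filtering, similar in spirit to above = 0.7:
# keep every other word whose similarity to "king" exceeds 0.7.
sims_king <- drop(unit %*% unit["king", ])
above_07 <- sims_king[names(sims_king) != "king" & sims_king > 0.7]
print(above_07)
```

In this toy space the combined vector lands closest to "queen", and only "man" clears the 0.7 threshold for "king"; real embeddings will give different neighbours.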