predict.word2vec | R Documentation |
Get either the embedding of words, or the nearest words which are similar to either a word or a word vector.
## S3 method for class 'word2vec'
predict(
object,
newdata,
type = c("nearest", "embedding"),
top_n = 10L,
encoding = "UTF-8",
...
)
object |
a word2vec model as returned by word2vec or read.word2vec |
newdata |
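for type 'embedding', newdata should be a character vector of words; for type 'nearest', newdata should be a character vector of words or word vectors (a single vector or a matrix with one word vector per row) |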
type |
either 'embedding' or 'nearest'. Defaults to 'nearest'. |
top_n |
show only the top n nearest neighbours. Defaults to 10. |
encoding |
set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'. |
... |
not used |
Depending on the type, you get a different result back:
for type nearest: a list of data.frames with columns term, similarity and rank, indicating which words are closest to the words or word vectors provided in newdata. If newdata is just one vector instead of a matrix, a single data.frame is returned instead of a list
for type embedding: a matrix of word vectors of the words provided in newdata
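To make the two return shapes of type nearest concrete, here is a minimal base R sketch that mimics them on a made-up embedding matrix. It does not call the package: the helpers nearest_terms and nearest are hypothetical, the numbers are invented, and cosine similarity is an assumption, since this page does not state which similarity measure predict uses.

```r
# Toy embedding matrix, one row per term (values are made up for illustration).
emb <- rbind(
  bus    = c(1.0, 0.1, 0.0),
  tram   = c(0.9, 0.2, 0.1),
  toilet = c(0.0, 1.0, 0.2),
  shower = c(0.1, 0.9, 0.3)
)

# Hypothetical helper: rank all terms by cosine similarity to one query vector.
# Cosine similarity is an assumption, not necessarily what the package computes.
nearest_terms <- function(emb, query, top_n = 10L) {
  norms <- sqrt(rowSums(emb^2))
  sims <- unname(drop(emb %*% query) / (norms * sqrt(sum(query^2))))
  ord <- order(sims, decreasing = TRUE)[seq_len(min(top_n, nrow(emb)))]
  data.frame(
    term = rownames(emb)[ord],
    similarity = sims[ord],
    rank = seq_along(ord),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper mirroring the documented shapes:
# a matrix gives a list of data.frames, a single vector gives one data.frame.
nearest <- function(emb, newdata, top_n = 10L) {
  if (is.matrix(newdata)) {
    out <- lapply(seq_len(nrow(newdata)), function(i) {
      nearest_terms(emb, newdata[i, ], top_n)
    })
    names(out) <- rownames(newdata)
    return(out)
  }
  nearest_terms(emb, newdata, top_n)
}

single  <- nearest(emb, emb["bus", ], top_n = 2)               # one data.frame
several <- nearest(emb, emb[c("bus", "toilet"), ], top_n = 2)  # list of data.frames
```

Naming the list elements after the row names of the input matrix is a choice made for this sketch; check the names of the actual predict output before relying on them.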
word2vec, read.word2vec
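library(word2vec)
# Load the example model shipped with the package
path <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
# Get the embeddings of a set of words
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
# Get the 5 nearest terms for each word
nn <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn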
# Do some calculations with the vectors and find similar terms to these
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)
vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)
vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
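The calls above return a single data.frame when newdata is one vector and a list of data.frames when it is a matrix. Downstream code sometimes wants one table regardless, so the following sketch shows one possible way to normalise both shapes. The helper as_neighbour_table and its query column are hypothetical, and the mock inputs use invented values with the column names documented on this page.

```r
# Hypothetical helper: accept either documented return shape of type "nearest"
# and give back a single data.frame with an extra 'query' column.
as_neighbour_table <- function(x) {
  if (is.data.frame(x)) {
    x <- list(x)
  }
  # Label each block by its list name if present, otherwise by its position.
  ids <- names(x)
  if (is.null(ids)) {
    ids <- as.character(seq_along(x))
  }
  blocks <- Map(function(df, id) {
    data.frame(query = id, df, stringsAsFactors = FALSE)
  }, x, ids)
  out <- do.call(rbind, unname(blocks))
  rownames(out) <- NULL
  out
}

# Mock results with made-up values, shaped like the documented output.
one <- data.frame(
  term = c("tram", "metro"),
  similarity = c(0.9, 0.8),
  rank = 1:2,
  stringsAsFactors = FALSE
)
many <- list(a = one, b = one)

as_neighbour_table(one)   # 2 rows, query is "1"
as_neighbour_table(many)  # 4 rows, query is "a" or "b"
```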