predict.word2vec: Predict functionalities for a word2vec model
In word2vec: Distributed Representations of Words

Source: R/word2vec.R

Description

Get either:

- the embedding (word vector) of each word, or
- the nearest words, i.e. the words most similar to a given word or word vector.
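As a rough sketch of the first operation, the embedding of a word can be pictured as a row lookup in a words-by-dimensions matrix. The toy matrix `toy_emb` and the helper `lookup_embedding()` below are hypothetical and use base R only. Returning a row of `NA` for out-of-vocabulary words is an assumption made for illustration, not a documented guarantee of `predict.word2vec`.

```r
# Hypothetical toy embedding matrix: one row per word, one column per
# dimension. The words and values are invented for illustration only.
toy_emb <- rbind(
  bus    = c(0.9, 0.1, 0.0),
  tram   = c(0.8, 0.2, 0.1),
  toilet = c(0.0, 0.1, 0.9)
)

# Look up word vectors by row name. Words missing from the vocabulary
# get a row of NA (an assumption, not the package's documented behaviour).
lookup_embedding <- function(emb, words) {
  out <- matrix(NA_real_, nrow = length(words), ncol = ncol(emb),
                dimnames = list(words, colnames(emb)))
  known <- words %in% rownames(emb)
  out[known, ] <- emb[words[known], , drop = FALSE]
  out
}

lookup_embedding(toy_emb, c("bus", "unknownword"))
```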

Usage

## S3 method for class 'word2vec'
predict(
  object,
  newdata,
  type = c("nearest", "embedding"),
  top_n = 10L,
  encoding = "UTF-8",
  ...
)

Arguments

`object`	a word2vec model as returned by `word2vec` or `read.word2vec`
`newdata`	for type 'embedding', `newdata` should be a character vector of words; for type 'nearest', `newdata` should be a character vector of words or a matrix in the embedding space
`type`	either 'embedding' or 'nearest'. Defaults to 'nearest'.
`top_n`	show only the top n nearest neighbours. Defaults to 10.
`encoding`	set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.
`...`	not used
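The `type = c("nearest", "embedding")` default in the usage follows the common R idiom in which `match.arg()` resolves a vector of choices to its first element, which is why the default is 'nearest'. We assume, without having checked `R/word2vec.R`, that `predict.word2vec` uses this idiom; `resolve_type()` is a hypothetical stand-in that only shows how `match.arg()` behaves.

```r
# Hypothetical stand-in with the same `type` default as predict.word2vec.
resolve_type <- function(type = c("nearest", "embedding")) {
  # match.arg() returns the first choice when `type` is left at its
  # default, partially matches abbreviations, and errors on other values.
  match.arg(type)
}

resolve_type()             # default: first choice
resolve_type("embedding")  # exact match
resolve_type("emb")        # partial match of "embedding"
```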

Value

Depending on the type, a different result is returned:

- for type 'nearest': a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is a single vector instead of a matrix, a single data.frame is returned.
- for type 'embedding': a matrix of word vectors of the words provided in newdata.
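The shape of the 'nearest' result for numeric input can be sketched in base R: one data.frame for a single vector, and a list of data.frames (one per row) for a matrix. This sketch assumes cosine similarity as the score, which may differ from what the package computes internally. The names `toy_emb`, `nearest_to_vector()` and `nearest_sketch()` are hypothetical, and character input is not covered.

```r
# Hypothetical toy vocabulary with invented 3-dimensional vectors.
toy_emb <- rbind(
  bus    = c(0.9, 0.1, 0.0),
  tram   = c(0.8, 0.2, 0.1),
  train  = c(0.7, 0.3, 0.0),
  toilet = c(0.0, 0.1, 0.9)
)

# Rank all vocabulary words by cosine similarity (an assumption) to one
# vector and return a data.frame with columns term, similarity and rank.
nearest_to_vector <- function(emb, vec, top_n = 10L) {
  norms <- sqrt(rowSums(emb^2))
  sims  <- unname(drop(emb %*% vec) / (norms * sqrt(sum(vec^2))))
  ord   <- order(sims, decreasing = TRUE)[seq_len(min(top_n, nrow(emb)))]
  data.frame(term = rownames(emb)[ord], similarity = sims[ord],
             rank = seq_along(ord), stringsAsFactors = FALSE)
}

# A single vector gives one data.frame; a matrix gives a list of
# data.frames, one per row, named after the row names.
nearest_sketch <- function(emb, newdata, top_n = 10L) {
  if (is.matrix(newdata)) {
    lapply(setNames(seq_len(nrow(newdata)), rownames(newdata)),
           function(i) nearest_to_vector(emb, newdata[i, ], top_n))
  } else {
    nearest_to_vector(emb, newdata, top_n)
  }
}

nearest_sketch(toy_emb, c(1, 0, 0), top_n = 2)
nearest_sketch(toy_emb, toy_emb[c("bus", "toilet"), ], top_n = 2)
```

When a word's own vector is used as the query, as in the second call, that word ranks first with similarity 1.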

Examples

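library(word2vec)

# Load the example model that ships with the package
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
# Word vectors for a few words, including one that is not in the vocabulary
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
# The 5 nearest neighbours of each word
nn  <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn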

# Combine word vectors arithmetically and find the terms nearest to the result
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
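The vector arithmetic in these examples operates on plain numeric vectors taken from the rows of `as.matrix(model)`. The sketch below repeats that pattern on a hypothetical toy matrix with invented values and ranks every word by cosine similarity to the result. Cosine similarity is an assumption here, and `cosine_to_rows()` is a hypothetical helper, not part of the word2vec API.

```r
# Hypothetical toy matrix standing in for as.matrix(model);
# the words and values are invented.
emb <- rbind(
  king  = c(0.9, 0.8, 0.1),
  man   = c(0.5, 0.1, 0.1),
  woman = c(0.5, 0.1, 0.9),
  queen = c(0.9, 0.8, 0.9)
)

# Cosine similarity of every row of `m` with the vector `v`,
# returned as a vector named after the rows.
cosine_to_rows <- function(m, v) {
  drop(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Analogy-style arithmetic: the result is a plain numeric vector.
query <- emb["king", ] - emb["man", ] + emb["woman", ]
sort(cosine_to_rows(emb, query), decreasing = TRUE)

# Stacking vectors plus their average gives a matrix, one query per row.
vectors <- emb[c("king", "queen"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
rownames(vectors)
```

With these toy values, king - man + woman equals the queen row up to rounding, so queen ranks first; real embeddings only approximate such analogies.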

word2vec documentation built on Oct. 8, 2023, 1:07 a.m.