read.wordvectors: Read word vectors from a word2vec model from disk


View source: R/word2vec.R

Description

Read the word vectors of a word2vec model stored on disk into a dense matrix.

Usage

read.wordvectors(
  file,
  type = c("bin", "txt"),
  n = .Machine$integer.max,
  normalize = FALSE,
  encoding = "UTF-8"
)

Arguments

file

the path to the model file

type

either 'bin' or 'txt', indicating whether the file is a binary file or a text file

n

integer giving the maximum number of words to read in. Defaults to reading all words.

normalize

logical indicating whether to normalize the embeddings by dividing each embedding x by the factor sqrt(sum(x . x) / length(x)), where x . x is the element-wise product of x with itself. Defaults to FALSE.

encoding

encoding to be assumed for the words. Defaults to 'UTF-8'.
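
The normalization factor described for the normalize argument can be illustrated in base R. This is only a sketch, assuming x stands for a single word's embedding (one row of the matrix); the toy values and the helper name rms_normalize are hypothetical and not part of the word2vec package, and the package may compute this differently internally.

```r
# Toy embedding matrix: 2 words, 4 dimensions (values are made up for illustration).
embed <- matrix(c(1, 2, 2, 4,
                  3, 0, 4, 0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("word_a", "word_b"), NULL))

# Hypothetical helper: divide each row x by sqrt(sum(x * x) / length(x)).
rms_normalize <- function(m) {
  scale_factor <- sqrt(rowSums(m * m) / ncol(m))
  # A vector of length nrow(m) is recycled down the columns,
  # so every element of row i is divided by scale_factor[i].
  m / scale_factor
}

normalized <- rms_normalize(embed)

# After this scaling, the squared entries of each row sum to the number of dimensions.
row_sq_sums <- rowSums(normalized^2)
```

Under this reading, every normalized row has the same length (the square root of the number of dimensions), which makes rows directly comparable regardless of their original magnitude.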

Value

A matrix with the embeddings of the words, one row per word. The row names of the matrix are the words, which are set to UTF-8 encoding by default.
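
Because the words are the row names of the returned matrix, a word's vector can be looked up by name. The sketch below uses a small hand-made matrix in place of real model output and a hypothetical helper named cosine to compare two words; neither the values nor the helper come from the package.

```r
# Stand-in for a matrix returned by read.wordvectors(): words as row names.
# The words and values are made up for illustration.
embed <- matrix(c(1, 0,
                  0, 1,
                  1, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("king", "queen", "royal"), NULL))

# Hypothetical helper: cosine similarity between the rows named a and b.
cosine <- function(m, a, b) {
  x <- m[a, ]
  y <- m[b, ]
  sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
}

sim_kq <- cosine(embed, "king", "queen")  # orthogonal toy vectors
sim_kr <- cosine(embed, "king", "royal")  # vectors 45 degrees apart
```

Indexing by row name only works for words that are present in the matrix, so when reading with a limited n it is worth checking rownames(embed) first.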

Examples

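# Binary example model shipped with the word2vec package
path  <- system.file(package = "word2vec", "models", "example.bin")
# Read at most 10 words
embed <- read.wordvectors(path, type = "bin", n = 10)
# Read at most 10 words and normalize the embeddings
embed <- read.wordvectors(path, type = "bin", n = 10, normalize = TRUE)
# Read all words
embed <- read.wordvectors(path, type = "bin")

# Text example model shipped with the word2vec package
path  <- system.file(package = "word2vec", "models", "example.txt")
# Same three calls, reading the text format instead
embed <- read.wordvectors(path, type = "txt", n = 10)
embed <- read.wordvectors(path, type = "txt", n = 10, normalize = TRUE)
embed <- read.wordvectors(path, type = "txt")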


word2vec documentation built on July 2, 2021, 5:07 p.m.