as_word2vec: Convert a matrix of word vectors to word2vec format

View source: R/udpipe_train.R

as_word2vec    R Documentation


Description

The word2vec text format stores the dimensions of the word vectors on its first line (the number of words and the size of each vector). Each following line covers one word or token: the word itself, followed by the elements of its word vector, separated by spaces.
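To make this layout concrete, here is a minimal sketch in base R that builds the text format by hand. It assumes the standard word2vec text layout (a header line with the number of words and the vector size, then one space-separated line per word). `word2vec_text` is a hypothetical helper for illustration only, not how udpipe implements `as_word2vec`.

```r
# Hypothetical helper (not part of udpipe): builds word2vec text format
# from a numeric matrix whose rownames hold the words/tokens.
word2vec_text <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x), !is.null(rownames(x)))
  # Header line (assumed layout): number of words and vector dimension
  header <- paste(nrow(x), ncol(x))
  # One line per word: the token followed by its vector elements
  body <- paste(rownames(x), apply(x, 1, paste, collapse = " "))
  paste(c(header, body), collapse = "\n")
}

m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2, byrow = TRUE,
            dimnames = list(c("hello", "world"), NULL))
cat(word2vec_text(m), "\n")
# 2 3
# hello 0.1 0.2 0.3
# world 0.4 0.5 0.6
```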

This utility function lets you write word vectors created with other R packages in the well-known word2vec format, which udpipe_train uses to train the dependency parser.

Usage

as_word2vec(x)

Arguments

x

a matrix of word vectors whose rownames hold the words or tokens and whose number of columns gives the size (dimension) of the word vectors
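Before converting, it can help to check the matrix. The sketch below uses a hypothetical helper, `check_wordvectors`, which is not part of udpipe. Besides the documented requirements (a numeric matrix with rownames), it assumes two extra rules: rownames should be unique, and they should not contain whitespace, since a word2vec text line is split on spaces. It also shows converting embeddings that arrive as a data.frame with a token column.

```r
# Hypothetical helper (not part of udpipe): checks that a matrix meets the
# requirements of the `x` argument before it is converted.
check_wordvectors <- function(x) {
  if (!is.matrix(x) || !is.numeric(x)) stop("x must be a numeric matrix")
  tokens <- rownames(x)
  if (is.null(tokens)) stop("x needs rownames holding the words/tokens")
  if (anyNA(tokens) || any(tokens == "")) stop("rownames must not be empty")
  # Assumption: each word should appear only once
  if (anyDuplicated(tokens) > 0) stop("rownames must be unique")
  # Assumption: word2vec text lines are space-separated, so a token with
  # whitespace would be split into several fields when read back
  if (any(grepl("[[:space:]]", tokens))) stop("rownames must not contain whitespace")
  invisible(TRUE)
}

# Embeddings stored as a data.frame: move the token column into the rownames
emb <- data.frame(token = c("cat", "dog"), d1 = c(0.1, 0.2), d2 = c(0.3, 0.4))
x <- as.matrix(emb[, c("d1", "d2")])
rownames(x) <- emb$token
check_wordvectors(x)
```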

Value

a character vector of length 1 containing the word vectors in word2vec format, which can be written to a file on disk
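Because the result is plain text, one way to sanity-check it is to parse it back into a matrix. `parse_word2vec_text` below is a hypothetical helper, not part of udpipe; it assumes a header of `<number of words> <vector size>` followed by lines of single-space-separated fields.

```r
# Hypothetical helper (not part of udpipe): parses a word2vec text string
# (header "<n_words> <dim>", then "<token> <v1> ... <vdim>") into a matrix.
parse_word2vec_text <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]            # drop empty lines, if any
  header <- as.integer(strsplit(lines[1], " ", fixed = TRUE)[[1]])
  fields <- strsplit(lines[-1], " ", fixed = TRUE)
  # Every word line must hold the token plus `dim` numbers
  stopifnot(length(fields) == header[1], all(lengths(fields) == header[2] + 1))
  tokens <- vapply(fields, `[`, character(1), 1)
  values <- as.numeric(unlist(lapply(fields, `[`, -1)))
  matrix(values, ncol = header[2], byrow = TRUE,
         dimnames = list(tokens, NULL))
}

txt <- "2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6"
m <- parse_word2vec_text(txt)
m["world", ]
```

Under those assumptions, parsing the output of `as_word2vec(wordvectors)` should give back a matrix with the same dimensions and rownames as `wordvectors`, with values equal up to the precision used when the numbers were written as text.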

Examples

library(udpipe)
wordvectors <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(wordvectors) <- sprintf("word%s", seq_len(nrow(wordvectors)))
wv <- as_word2vec(wordvectors)
cat(wv)

## Write the word vectors to a UTF-8 encoded text file, keeping its path
path <- tempfile(fileext = ".txt")
f <- file(path, encoding = "UTF-8")
cat(wv, file = f)
close(f)

udpipe documentation built on Jan. 6, 2023, 5:06 p.m.