as.matrix.paragraph2vec: Get the document or word vectors of a paragraph2vec model


View source: R/paragraph2vec.R

Description

Get the document or word vectors of a paragraph2vec model as a dense matrix.

Usage

## S3 method for class 'paragraph2vec'
as.matrix(
  x,
  which = c("docs", "words"),
  normalize = TRUE,
  encoding = "UTF-8",
  ...
)

Arguments

x

a paragraph2vec model as returned by paragraph2vec or read.paragraph2vec

which

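either 'docs' or 'words', selecting whether document vectors or word vectors are returned. Defaults to 'docs', the first of the listed choices.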

normalize

logical indicating whether to normalize the embeddings. Defaults to TRUE.

encoding

the encoding to set on the row names of the returned matrix. Defaults to 'UTF-8'.

...

not used
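
As an illustration of the kind of rescaling the normalize argument refers to, the sketch below scales the rows of a small made-up matrix to unit Euclidean length. This is only an assumption about the convention involved; the exact normalization applied by the package is not stated on this page, and the matrix raw and the helper unit_rows are hypothetical names introduced here.

```r
# Hypothetical stand-in for unnormalized embeddings (2 items, 2 dimensions).
raw <- matrix(c(3, 4,
                6, 8),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc_a", "doc_b"), NULL))

# Assumed convention: divide each row by its Euclidean (L2) norm.
# The vector of norms has one entry per row, so it recycles down the columns.
unit_rows <- function(m) m / sqrt(rowSums(m^2))

normalized <- unit_rows(raw)

# After rescaling, every row has length 1 and only its direction remains.
row_norms <- sqrt(rowSums(normalized^2))
print(row_norms)
```

Under this assumed convention, two vectors that point in the same direction (as doc_a and doc_b do here) become identical after normalization, which is why normalize = FALSE is useful when the magnitudes matter.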

Value

a matrix with the document or word vectors, one per row, where the row names are the documents or words on which the model was trained
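
Because the result is an ordinary numeric matrix with row names, individual vectors can be looked up by name and compared with base R. The sketch below uses a small hypothetical matrix emb in place of real model output; its row names and values are invented for illustration, and cosine is a helper defined here, not a function of the package.

```r
# Hypothetical stand-in for the matrix returned by as.matrix():
# three documents embedded in three dimensions.
emb <- matrix(c(1, 0, 0,
                1, 1, 0,
                0, 0, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("doc_1", "doc_2", "doc_3"), NULL))

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Look up rows by name and compare them.
sim_12 <- cosine(emb["doc_1", ], emb["doc_2", ])
sim_13 <- cosine(emb["doc_1", ], emb["doc_3", ])
print(c(sim_12 = sim_12, sim_13 = sim_13))
```

Cosine similarity divides by the vector lengths itself, so this comparison gives the same answer whether or not the rows were normalized beforehand.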

See Also

paragraph2vec, read.paragraph2vec

Examples

library(doc2vec)
library(tokenizers.bpe)
data(belgium_parliament, package = "tokenizers.bpe")
## Keep the French texts that are non-empty and shorter than 1000 words
x <- subset(belgium_parliament, language %in% "french")
x <- subset(x, nchar(text) > 0 & txt_count_words(text) < 1000)

## Train a small PV-DM model, then replace it with a larger PV-DBOW model
model <- paragraph2vec(x = x, type = "PV-DM",   dim = 15,  iter = 5)
model <- paragraph2vec(x = x, type = "PV-DBOW", dim = 100, iter = 20)

## Extract document and word vectors, normalized (the default) and unnormalized
embedding <- as.matrix(model, which = "docs")
embedding <- as.matrix(model, which = "words")
embedding <- as.matrix(model, which = "docs", normalize = FALSE)
embedding <- as.matrix(model, which = "words", normalize = FALSE)

doc2vec documentation built on March 28, 2021, 1:09 a.m.