embed_wordspace: Build a Starspace model which calculates word embeddings
In ruimtehol: Learn Text 'Embeddings' with 'Starspace'

embed_wordspace

R Documentation

Description

Build a Starspace model which calculates word embeddings

Usage

embed_wordspace(
  x,
  model = "wordspace.bin",
  early_stopping = 0.75,
  useBytes = FALSE,
  ...
)

Arguments

`x`	a character vector of text where tokens are separated by spaces
`model`	name of the model which will be saved, passed on to `starspace`
`early_stopping`	the fraction of the data that will be used as training data. If set to a value smaller than 1, the remaining 1-`early_stopping` fraction of the data is used as the validation set and early stopping is executed. Defaults to 0.75.
`useBytes`	set to TRUE to avoid re-encoding when writing out train and/or test files. See `writeLines` for details
`...`	further arguments passed on to `starspace` except file, trainMode and fileFormat
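
The split implied by `early_stopping` can be illustrated with a small base R sketch. This is not the package's internal code: the random sampling, the rounding, and the variable names `train` and `validation` are assumptions made for illustration only.

```r
# Illustrative sketch only: how a fraction such as early_stopping = 0.75
# could partition the input into training and validation parts.
# The sampling strategy shown here is an assumption, not taken from ruimtehol.
early_stopping <- 0.75
x <- c("a b c", "d e f", "g h i", "j k l")   # toy documents, tokens separated by spaces

set.seed(1)
n_train <- round(early_stopping * length(x)) # number of documents kept for training
idx <- sample.int(length(x), size = n_train)

train      <- x[idx]    # fraction early_stopping of the data
validation <- x[-idx]   # remaining 1 - early_stopping of the data
```

With four documents and `early_stopping = 0.75`, three documents end up in `train` and one in `validation`.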

Value

an object of class `textspace` as returned by `starspace`.

Examples


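library(ruimtehol)
library(udpipe)
## Keep only the Dutch-language reviews
data(brussels_reviews, package = "udpipe")
x <- subset(brussels_reviews, language == "nl")
## Split on non-word characters, drop empty tokens and
## paste the tokens back together separated by single spaces
x <- strsplit(x$feedback, "\\W")
x <- lapply(x, FUN = function(x) x[x != ""])
x <- sapply(x, FUN = function(x) paste(x, collapse = " "))
x <- tolower(x)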

set.seed(123456789)
model <- embed_wordspace(x, early_stopping = 0.9,
                         dim = 15, ws = 7, epoch = 10, minCount = 5, ngrams = 1,
                         maxTrainTime = 2) ## maxTrainTime only set for CRAN
plot(model)
wordvectors <- as.matrix(model)

mostsimilar <- embedding_similarity(wordvectors, wordvectors["weekend", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
mostsimilar <- embedding_similarity(wordvectors, wordvectors["vriendelijk", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
mostsimilar <- embedding_similarity(wordvectors, wordvectors["grote", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
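
The similarity lookups above compare one word vector against all rows of the embedding matrix. As a rough illustration of what such a comparison computes, here is a minimal base R sketch of cosine similarity. It assumes that cosine similarity is the measure being used; `cosine_similarity` is a hypothetical helper, not a ruimtehol function, and the toy matrix `emb` is made-up data.

```r
# Hypothetical helper (not part of ruimtehol): cosine similarity between
# every row of a matrix m and a single vector v.
cosine_similarity <- function(m, v) {
  dots  <- as.vector(m %*% v)                   # dot product of each row with v
  norms <- sqrt(rowSums(m^2)) * sqrt(sum(v^2))  # product of the vector lengths
  setNames(dots / norms, rownames(m))
}

# Toy 2-dimensional embeddings, made up for illustration
emb <- rbind(w1 = c(1, 0),
             w2 = c(2, 0),
             w3 = c(0, 3))

sims <- cosine_similarity(emb, emb["w1", ])
sort(sims, decreasing = TRUE)
```

Rows pointing in the same direction as the query vector (`w1` and `w2`) score 1, while the orthogonal row (`w3`) scores 0, regardless of vector length.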

ruimtehol documentation built on May 29, 2024, 5:26 a.m.