View source: R/r-all-the-things.R
embed_wordspace | R Documentation |
Build a Starspace model which calculates word embeddings
embed_wordspace( x, model = "wordspace.bin", early_stopping = 0.75, useBytes = FALSE, ... )
x |
a character vector of text where tokens are separated by spaces |
model |
name of the model which will be saved, passed on to starspace |
early_stopping |
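the fraction of the data that will be used as training data. If set to a value smaller than 1, the remaining 1 - early_stopping fraction of the data is used as the validation set and early stopping is executed. |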
useBytes |
set to TRUE to avoid re-encoding when writing out train and/or test files. See writeLines for details. |
... |
further arguments passed on to starspace |
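The effect of early_stopping on the input can be pictured as a train/held-out partition of x. The following is a minimal base R sketch, assuming the rows are chosen at random; how the package actually selects them is not stated on this page. The names docs, train and held_out and the toy sentences are illustrative only.

```r
# Illustrative only: partition a character vector by an early_stopping-style fraction.
# Assumption: a random split; the package's own selection logic may differ.
set.seed(42)

docs <- c("de kamer was groot", "zeer vriendelijk onthaal", "leuk weekend gehad",
          "goede ligging", "propere badkamer", "rustige buurt",
          "lekker ontbijt", "aanrader voor iedereen")

early_stopping <- 0.75

# Number of documents kept for training (here 75% of 8 documents)
n_train <- round(early_stopping * length(docs))

# Draw the training rows; everything else is held out
idx      <- sample.int(length(docs), size = n_train)
train    <- docs[idx]
held_out <- docs[-idx]

length(train)
length(held_out)
```

With early_stopping = 1 the held-out part of such a split would be empty, which is consistent with the argument description only mentioning values smaller than 1.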
an object of class textspace as returned by starspace.
library(udpipe)
data(brussels_reviews, package = "udpipe")
x <- subset(brussels_reviews, language == "nl")
x <- strsplit(x$feedback, "\\W")
x <- lapply(x, FUN = function(x) x[x != ""])
x <- sapply(x, FUN = function(x) paste(x, collapse = " "))
x <- tolower(x)

set.seed(123456789)
model <- embed_wordspace(x, early_stopping = 0.9,
                         dim = 15, ws = 7, epoch = 10, minCount = 5, ngrams = 1,
                         maxTrainTime = 2) ## maxTrainTime only set for CRAN
plot(model)
wordvectors <- as.matrix(model)

mostsimilar <- embedding_similarity(wordvectors, wordvectors["weekend", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
mostsimilar <- embedding_similarity(wordvectors, wordvectors["vriendelijk", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
mostsimilar <- embedding_similarity(wordvectors, wordvectors["grote", ])
head(sort(mostsimilar[, 1], decreasing = TRUE), 10)
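The example above ranks every word by its similarity to a query word's vector. As a rough illustration of what such a ranking involves, here is cosine similarity written in base R on a toy matrix. This is a sketch under assumptions: it is not taken from the package, cosine is assumed rather than confirmed as the measure used by embedding_similarity, and the helper cosine_to and the numeric values are made up for illustration.

```r
# Toy embedding matrix: one row per word (values are made up for illustration).
emb <- rbind(
  weekend  = c(1, 0,   1),
  verblijf = c(1, 0.1, 0.9),
  grote    = c(0, 1,   0)
)

# Hypothetical helper: cosine similarity between each row of m and a vector v.
cosine_to <- function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

sims <- cosine_to(emb, emb["weekend", ])
names(sims) <- rownames(emb)

# Most similar words first; the query word itself ranks at the top
sort(sims, decreasing = TRUE)
```

Because the query vector is one of the rows of the matrix, the query word always appears first in the ranking, so in practice the first entry of such a list is usually skipped when reading off neighbours.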