View source: R/r-all-the-things.R
embed_articlespace | R Documentation |
Build a Starspace model for learning the mapping between sentences and articles (articlespace)
embed_articlespace(x, model = "articlespace.bin", early_stopping = 0.75, useBytes = FALSE, ...)
x |
a data.frame with sentences containing the columns doc_id, sentence_id and token. The doc_id is just an article or document identifier; the sentence_id column is a character field which contains space-separated words and should not contain any tab characters |
model |
name of the model which will be saved, passed on to |
early_stopping |
the percentage of the data that will be used as training data. If set to a value smaller than 1, 1- |
useBytes |
set to TRUE to avoid re-encoding when writing out train and/or test files. See |
... |
further arguments passed on to |
an object of class textspace as returned by starspace.
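The input format described for x can be sketched with a small hand-made data.frame. This is only an illustration in base R: the toy values are hypothetical, and collapsing tokens into one space-separated string per sentence is an assumption about how the data can be viewed, not a statement about what embed_articlespace does internally.

```r
# Hypothetical toy input: one row per token, with the three required columns.
x <- data.frame(
  doc_id      = c("a1", "a1", "a1", "a1", "a2", "a2"),
  sentence_id = c(1L, 1L, 2L, 2L, 1L, 1L),
  token       = c("good", "kitchen", "clean", "room", "nice", "host"),
  stringsAsFactors = FALSE
)

# Sanity checks on the structure described in the argument list:
# the three columns must exist and the text must not contain tab characters.
stopifnot(all(c("doc_id", "sentence_id", "token") %in% names(x)))
stopifnot(!any(grepl("\t", x$token, fixed = TRUE)))

# Illustration only (assumption): view each sentence as space-separated words.
sentences <- aggregate(token ~ doc_id + sentence_id, data = x,
                       FUN = paste, collapse = " ")
sentences <- sentences[order(sentences$doc_id, sentences$sentence_id), ]
print(sentences)
```

Running the checks before training is a cheap way to catch a missing column or a stray tab early.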
library(ruimtehol)
library(udpipe)
data(brussels_reviews_anno, package = "udpipe")
x <- subset(brussels_reviews_anno, language == "nl")
x$token <- x$lemma
x <- x[, c("doc_id", "sentence_id", "token")]
set.seed(123456789)
model <- embed_articlespace(x, early_stopping = 1,
                            dim = 25, epoch = 25, minCount = 2,
                            negSearchLimit = 1, maxNegSamples = 2)
plot(model)
sentences <- c("ook de keuken zijn zeer goed uitgerust .",
               "het appartement zijn met veel smaak inrichten en zeer proper .")
predict(model, sentences, type = "embedding")
starspace_embedding(model, sentences)

## Not run:
library(udpipe)
data(dekamer, package = "ruimtehol")
dekamer <- subset(dekamer, question_theme_main == "DEFENSIEBELEID")
x <- udpipe(dekamer$question, "dutch", tagger = "none", parser = "none", trace = 100)
x <- x[, c("doc_id", "sentence_id", "sentence", "token")]
set.seed(123456789)
model <- embed_articlespace(x, early_stopping = 0.8, dim = 15, epoch = 5, minCount = 5)
plot(model)

## embed all sentences, then look up the ones most similar to a new sentence
embeddings <- starspace_embedding(model, unique(x$sentence), type = "document")
dim(embeddings)
sentence <- "Wat zijn de cijfers qua doorstroming van 2016?"
embedding_sentence <- starspace_embedding(model, sentence, type = "document")
mostsimilar <- embedding_similarity(embeddings, embedding_sentence)
head(sort(mostsimilar[, 1], decreasing = TRUE), 3)

## clean up for cran
file.remove(list.files(pattern = ".udpipe$"))
## End(Not run)
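The last step of the example ranks sentences by their similarity to a query embedding. As a rough sketch of what such a ranking involves, the base R code below computes cosine similarity on hypothetical 2-dimensional vectors. The helper cosine_similarity and the toy matrices are illustrative assumptions, and cosine is assumed as the measure; this is not the implementation of embedding_similarity.

```r
# Cosine similarity between the rows of two numeric matrices (base R only).
# Hypothetical helper for illustration, not part of any package.
cosine_similarity <- function(a, b) {
  a_norm <- a / sqrt(rowSums(a^2))  # scale each row of a to unit length
  b_norm <- b / sqrt(rowSums(b^2))  # scale each row of b to unit length
  a_norm %*% t(b_norm)              # rows of a versus rows of b
}

# Hypothetical embeddings for three sentences and one query sentence.
toy_embeddings <- rbind(s1 = c(1, 0), s2 = c(0, 1), s3 = c(1, 1))
toy_query <- rbind(q = c(2, 0))

scores <- cosine_similarity(toy_embeddings, toy_query)

# Same ranking idiom as in the example: sort the first column, highest first.
ranked <- sort(scores[, 1], decreasing = TRUE)
print(ranked)
```

Because each row is scaled to unit length, the query's magnitude does not affect the ranking; only its direction does.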