embed_docspace: Build a Starspace model for content-based recommendation
In ruimtehol: Learn Text 'Embeddings' with 'Starspace'

embed_docspace

R Documentation

Description

Build a Starspace model for content-based recommendation (docspace). For example, a user clicks on a webpage and this webpage contains a bunch of words.

Usage

embed_docspace(
  x,
  model = "docspace.bin",
  early_stopping = 0.75,
  useBytes = FALSE,
  ...
)

Arguments

`x`	a data.frame with user interest containing the columns user_id, doc_id and text. The user_id is an identifier of a user. The doc_id is an article or document identifier. The text column is a character field which contains the words which are part of the doc_id; words should be separated by a space and should not contain any tab characters
`model`	name of the model which will be saved, passed on to `starspace`
`early_stopping`	the fraction of the data that will be used as training data. If set to a value smaller than 1, the remaining 1-`early_stopping` fraction of the data will be used as the validation set and early stopping will be executed. Defaults to 0.75.
`useBytes`	set to TRUE to avoid re-encoding when writing out train and/or test files. See `writeLines` for details
`...`	further arguments passed on to `starspace` except file, trainMode and fileFormat
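
The input format described for `x` can be illustrated with a minimal base R sketch. The click log, the document texts and the helper `clean_text` below are hypothetical and only serve as an illustration; they are not part of ruimtehol. The sketch joins user/document pairs with document texts and removes tab characters, so that the result has the columns user_id, doc_id and text.

```r
## Hypothetical click log: which user viewed which document (illustrative data only)
clicks <- data.frame(
  user_id = c("u1", "u1", "u2", "u3"),
  doc_id  = c("d1", "d2", "d2", "d3"),
  stringsAsFactors = FALSE
)
## Hypothetical document texts; note the tab and the double space which need cleaning
docs <- data.frame(
  doc_id = c("d1", "d2", "d3"),
  text   = c("budget\ttax reform", "climate  energy policy", "railway transport"),
  stringsAsFactors = FALSE
)
## Hypothetical helper: replace tabs and runs of whitespace by one space, trim the ends
clean_text <- function(text) {
  text <- gsub("[[:space:]]+", " ", text)
  trimws(text)
}
docs$text <- clean_text(docs$text)
## One row per user/document pair, restricted to the columns described for x
x <- merge(clicks, docs, by = "doc_id")
x <- x[, c("user_id", "doc_id", "text")]
## The text column should not contain tab characters
stopifnot(!any(grepl("\t", x$text, fixed = TRUE)))
str(x)
```

A data.frame prepared along these lines could then be passed as the `x` argument, for example `embed_docspace(x, early_stopping = 1)`.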

Value

an object of class textspace as returned by starspace.

Examples

library(ruimtehol)
library(udpipe)
data(dekamer, package = "ruimtehol")
data(dekamer_theme_terminology, package = "ruimtehol")
## Which person is interested in which theme (aka document)
x <- table(dekamer$aut_person, dekamer$question_theme_main)
x <- as.data.frame(x)
colnames(x) <- c("user_id", "doc_id", "freq")
## Characterise the themes (aka document)
docs <- split(dekamer_theme_terminology, dekamer_theme_terminology$theme)
docs <- lapply(docs, FUN=function(x){
  data.frame(theme = x$theme[1], text = paste(x$term, collapse = " "),
             stringsAsFactors=FALSE)
})
docs <- do.call(rbind, docs)

## Build a model
train <- merge(x, docs, by.x = "doc_id", by.y = "theme")
## Set the seed before sampling so that the selection of users is reproducible
set.seed(123456789)
train <- subset(train, user_id %in% sample(levels(train$user_id), 4))
model <- embed_docspace(train, dim = 10, early_stopping = 1)
plot(model)

ruimtehol documentation built on May 29, 2024, 5:26 a.m.