starspace_save_model: Save a starspace model as a binary or tab-delimited TSV file

View source: R/embed-all-the-things.R

starspace_save_model {ruimtehol}    R Documentation


Description

Save a starspace model as an R object (serialised with saveRDS), as a binary file, or as a tab-delimited (TSV) file.

Usage

starspace_save_model(
  object,
  file = "textspace.ruimtehol",
  method = c("ruimtehol", "tsv-data.table", "binary", "tsv-starspace"),
  labels = data.frame(code = character(), label = character(),
                      stringsAsFactors = FALSE)
)

Arguments

object

an object of class textspace as returned by starspace or starspace_load_model

file

character string with the path of the file to which the model is saved

method

character indicating the method of saving. Possible values are 'ruimtehol', 'tsv-data.table', 'binary' and 'tsv-starspace'. Defaults to 'ruimtehol'.

  • 'ruimtehol': saves the R object, the embeddings and, optionally, the label definitions with saveRDS. The resulting file can be loaded back with starspace_load_model.

  • 'tsv-data.table': saves the model embeddings as a tab-delimited flat file using the fast fwrite function from data.table.

  • 'binary': saves the model as a binary file using the original methods of the Starspace authors.

  • 'tsv-starspace': saves the model as a tab-delimited flat file using the original methods of the Starspace authors.
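The 'ruimtehol' method relies on base R serialisation with saveRDS. The sketch below illustrates only that general saveRDS/readRDS round trip. The list called fake_model is a hypothetical, simplified stand-in and does not reflect the actual internal structure of a textspace object.

```r
# Hypothetical, simplified stand-in for a textspace object
# (NOT the real internals of ruimtehol): an embedding matrix plus labels.
fake_model <- list(
  embeddings = matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 3,
                      dimnames = list(c("budget", "health", "__label__1"), NULL)),
  labels = data.frame(code = 1L, label = "Economy", stringsAsFactors = FALSE)
)

# Serialise the whole R object to a temporary file with saveRDS ...
path <- tempfile(fileext = ".ruimtehol")
saveRDS(fake_model, file = path)

# ... and read it back with readRDS: the round trip preserves the object
restored <- readRDS(path)
identical(restored, fake_model)

# Clean up the temporary file
unlink(path)
```

Because the whole R object is serialised, nothing needs to be reassembled on loading, which is why this method pairs naturally with starspace_load_model.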

labels

a data.frame with at least the columns code and label, which is saved alongside the model when method is set to 'ruimtehol'. This lets you store the mapping between the Starspace labels and your own codes together with the model, where code is your internal code and label is its human-readable label.
A column called label_starspace is added to this data.frame. It combines the Starspace label prefix with the code column of your data.frame, because that combination is the label Starspace uses internally.
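A minimal sketch of how the label_starspace column is formed, assuming the default Starspace label prefix "__label__" (the actual prefix is whatever label prefix the model was trained with). The codes data.frame mirrors the one built in the Examples section, with hypothetical label names.

```r
# Your own mapping between internal codes and human-readable labels
# (label names are hypothetical)
codes <- data.frame(code = 1:3,
                    label = c("Economy", "Health", "Justice"),
                    stringsAsFactors = FALSE)

# Assumed Starspace label prefix; "__label__" is the Starspace default
prefix <- "__label__"

# label_starspace = prefix followed by your code, e.g. "__label__1"
codes$label_starspace <- paste0(prefix, codes$code)
codes
```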

Value

invisibly, a character string with the path of the saved file

Note

Method 'ruimtehol' is the advised method, as it works together with the starspace_load_model function. Use one of the other methods only if you need to provide the model to non-R users, or if you prefer the methods provided by the Starspace authors over the faster and more portable 'ruimtehol' method.
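A TSV export is useful for non-R users because any tool that reads tab-separated text can load the embeddings. The sketch below uses base R write.table and read.delim rather than data.table::fwrite or the Starspace routines, and it assumes a layout of one row per term, with the term in the first column followed by the embedding values and no header. Check the files actually produced by the 'tsv-data.table' and 'tsv-starspace' methods before relying on this layout.

```r
# Toy embedding matrix: one row per term, one column per dimension
emb <- matrix(c(0.1, -0.2, 0.3, 0.4, 0.5, -0.6), nrow = 3, byrow = TRUE,
              dimnames = list(c("budget", "health", "__label__1"), NULL))

# Write: term in the first column, then the dimensions; no header, no quotes
path <- tempfile(fileext = ".tsv")
write.table(data.frame(term = rownames(emb), emb), file = path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Any tool that reads tab-separated text can load this; here R reads it back
back <- read.delim(path, header = FALSE, stringsAsFactors = FALSE)
restored <- as.matrix(back[, -1])
dimnames(restored) <- list(back[[1]], NULL)

# Clean up the temporary file
unlink(path)
```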

See Also

starspace_load_model

Examples

data(dekamer, package = "ruimtehol")
dekamer$text <- strsplit(dekamer$question, "\\W")
dekamer$text <- lapply(dekamer$text, FUN = function(x) x[x != ""])
dekamer$text <- sapply(dekamer$text, 
                       FUN = function(x) paste(x, collapse = " "))

dekamer$target <- as.factor(dekamer$question_theme_main)
codes <- data.frame(code = seq_along(levels(dekamer$target)), 
                    label = levels(dekamer$target), stringsAsFactors = FALSE)
dekamer$target <- as.integer(dekamer$target)
set.seed(123456789)
model <- embed_tagspace(x = dekamer$text, 
                        y = dekamer$target, 
                        early_stopping = 0.8,
                        dim = 10, minCount = 5)
starspace_save_model(model, file = "textspace.ruimtehol", method = "ruimtehol",
                     labels = codes)
model <- starspace_load_model("textspace.ruimtehol", method = "ruimtehol")
starspace_save_model(model, file = "embeddings.tsv", method = "tsv-data.table")

## clean up for CRAN
file.remove("textspace.ruimtehol")
file.remove("embeddings.tsv")

ruimtehol documentation built on Jan. 7, 2023, 1:25 a.m.