bpe_load_model: Load a Byte Pair Encoding model
In tokenizers.bpe: Byte Pair Encoding Text Tokenization


bpe_load_model

R Documentation

Load a Byte Pair Encoding model

Description

Load a Byte Pair Encoding model that was trained with bpe and saved to a file.

Usage

bpe_load_model(file, threads = -1L)

Arguments

`file`	path to the model
`threads`	integer giving the number of CPU threads to use for model processing. If set to -1, the minimum of the number of available threads and 8 is used
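
The default rule for `threads` can be illustrated with a small sketch. The helper `resolve_threads` below is hypothetical and not part of tokenizers.bpe, and using `parallel::detectCores()` to count available threads is an assumption made for illustration only; the package may determine the thread count differently.

```r
# Hypothetical helper (not part of tokenizers.bpe): mimics the documented
# rule that threads = -1 means min(available threads, 8).
resolve_threads <- function(threads = -1L, cap = 8L) {
  if (threads == -1L) {
    # Assumption: detectCores() approximates the "available threads"
    available <- parallel::detectCores()
    if (is.na(available)) available <- 1L
    return(min(as.integer(available), cap))
  }
  as.integer(threads)
}

resolve_threads(-1L)  # capped at 8, depends on the machine
resolve_threads(2L)   # an explicit value is used as given
```

Passing an explicit value such as `threads = 1` keeps runs predictable on shared machines.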

Value

an object of class youtokentome, which is a list with the following elements:

model: an Rcpp pointer to the model
model_path: the path to the model
threads: the threads argument
vocab_size: the size of the BPE vocabulary
vocabulary: the BPE vocabulary, which is a data.frame with columns id and subword
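
As a minimal sketch, the elements listed above can be inspected after loading the example model that ships with the package. It assumes tokenizers.bpe is installed; no output is shown because it depends on the bundled model.

```r
library(tokenizers.bpe)

# Load the example model bundled with the package
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path, threads = 1L)

class(model)             # class of the returned object
names(model)             # list elements described above
model$vocab_size         # size of the BPE vocabulary
head(model$vocabulary)   # data.frame with columns id and subword
```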

Examples

library(tokenizers.bpe)

## Reload a model that ships with the package
path  <- system.file(package = "tokenizers.bpe", "extdata", "youtokentome.bpe")
model <- bpe_load_model(path)

## Build a model and load it again
data(belgium_parliament, package = "tokenizers.bpe")
x <- subset(belgium_parliament, language == "french")
## Train on the French text; the result records where the model file was written
model <- bpe(x$text, coverage = 0.999, vocab_size = 5000, threads = 1)
## Reload the trained model from that file
model <- bpe_load_model(model$model_path, threads = 1)

## Remove the model file (Clean up for CRAN)
file.remove(model$model_path)

tokenizers.bpe documentation built on Sept. 16, 2023, 1:06 a.m.