luz_callback_bert_tokenize: BERT Tokenization Callback

View source: R/luz_callbacks.R

BERT Tokenization Callback

Description

Data used in pretrained BERT models must be tokenized in the way the model expects. This luz_callback checks that the incoming data is tokenized properly, and triggers tokenization if necessary. Pass this function to luz::fit.luz_module_generator() or luz::predict.luz_module_fitted() via the callbacks argument; do not call it directly.
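
For illustration, here is a minimal sketch of attaching the callback when fitting. The names model (a luz module assumed to be already configured with luz::setup()) and train_dl (a dataloader of text inputs) are hypothetical placeholders, not part of this package:

fitted <- luz::fit(
  model,
  train_dl,
  epochs = 1,
  callbacks = list(luz_callback_bert_tokenize(n_tokens = 64L))
)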

Usage

luz_callback_bert_tokenize(
  submodel_name = NULL,
  n_tokens = NULL,
  verbose = TRUE
)

Arguments

submodel_name

An optional character scalar identifying a model inside the main torch::nn_module() that was built using model_bert_pretrained(). See vignette("entailment") for an example of a model with a submodel, and the sketch in Details below.

n_tokens

An optional integer scalar indicating the number of tokens to which the data should be tokenized. If present, it must be less than or equal to the max_tokens allowed by the pretrained model.

verbose

A logical scalar indicating whether the callback should report its progress (default TRUE).
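
Details

When the pretrained BERT model is nested inside a larger torch::nn_module(), submodel_name tells the callback where to find it. The sketch below is illustrative, not the package's own code: the module name, the "bert" field, and the choice of "bert_tiny_uncased" weights are all assumptions (see vignette("entailment") for a complete, working model).

classifier <- torch::nn_module(
  "classifier",
  initialize = function() {
    # The pretrained BERT lives inside the larger module under the name "bert".
    self$bert <- model_bert_pretrained("bert_tiny_uncased")
    # ... task-specific layers would go here ...
  },
  forward = function(x) {
    self$bert(x)  # downstream pooling/classification omitted for brevity
  }
)

# Point the callback at the submodel so it can check and trigger tokenization:
luz_callback_bert_tokenize(submodel_name = "bert")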

Examples

if (rlang::is_installed("luz")) {
  luz_callback_bert_tokenize()
  luz_callback_bert_tokenize(n_tokens = 32L)
}
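
The callback can likewise be supplied at prediction time, as described above. In this sketch, fitted and new_dl are hypothetical objects (a fitted luz module and a dataloader of new text):

## Not run:
predict(fitted, new_dl, callbacks = list(luz_callback_bert_tokenize()))
## End(Not run)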

