dataset_bert_pretrained                                R Documentation

Prepare a dataset for pretrained BERT models.
Usage:

    dataset_bert_pretrained(
      x,
      y = NULL,
      bert_type = NULL,
      tokenizer_scheme = NULL,
      n_tokens = NULL
    )
Arguments:

    x: A data.frame with one or more character predictor columns, or
        a list, matrix, or character vector that can be coerced to
        such a data.frame.

    y: A factor of outcomes, or a data.frame with a single factor
        column. Can be NULL (default).

    bert_type: A bert_type from available_berts().

    tokenizer_scheme: A character scalar indicating the vocabulary
        and tokenizer to use.

    n_tokens: An integer scalar indicating the number of tokens in
        the output.
Value:

    An initialized torch::dataset(). If it is not yet tokenized, the
    tokenize() method must be called before the dataset will be
    usable.
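A minimal sketch of that flow. It assumes the torchtransformers
package; the toy data and the "bert_tiny_uncased" identifier are
illustrative assumptions, not documented values (check
available_berts() for real identifiers):

    library(torchtransformers)

    # Toy predictors: one or more character columns in a data.frame.
    x <- data.frame(
      text = c("The weather is nice.", "It is raining heavily.")
    )
    # Toy outcomes as a factor.
    y <- factor(c("go outside", "stay inside"))

    # Assumed bert_type; see available_berts() for actual choices.
    ds <- dataset_bert_pretrained(x, y, bert_type = "bert_tiny_uncased")

    # If the dataset was created untokenized, tokenize it before use.
    ds$tokenize()

    length(ds)  # number of rows of predictors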
Private fields:

    input_data: The input predictors (x) standardized to a
        data.frame of character columns, and the outcome (y)
        standardized to a factor or NULL.

    tokenizer_metadata: A list indicating the tokenizer_scheme and
        n_tokens that have been or will be used to tokenize the
        predictors (x).

    tokenized: A single logical value indicating whether the data
        has been tokenized.
Methods:

    initialize: Initialize this dataset. This method is called when
        the dataset is first created.

    tokenize: Tokenize this dataset.

    untokenize: Remove any tokenization from this dataset.

    .tokenize_for_model: Tokenize this dataset for a particular
        model. Generally superseded by instead calling
        luz_callback_bert_tokenize().

    .getitem: Fetch an individual predictor (and, if available, the
        associated outcome). Generally superseded by instead calling
        .getbatch() (or by letting the luz modeling process handle
        it automatically).

    .getbatch: Fetch specific predictors (and, if available, the
        associated outcomes). This function is called automatically
        by {luz} during the fitting process.

    .length: Determine the length of the dataset (the number of rows
        of predictors). Generally superseded by instead calling
        length().
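A hedged sketch of the {luz} integration described above.
`model_setup` is a hypothetical placeholder for a luz module already
prepared with luz::setup(), and the no-argument callback call is an
assumption, not a documented signature:

    library(torchtransformers)
    library(luz)

    # Untokenized dataset (reusing the toy x and y from the earlier
    # sketch); tokenization is deferred to the callback below.
    ds <- dataset_bert_pretrained(x, y)

    # luz_callback_bert_tokenize() tokenizes the dataset to match the
    # model, superseding a manual .tokenize_for_model() call;
    # .getbatch() is then invoked automatically during fitting.
    fitted <- fit(
      model_setup,  # hypothetical luz module, defined elsewhere
      ds,
      epochs = 1,
      callbacks = list(luz_callback_bert_tokenize())
    )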