dataset_bert_pretrained                R Documentation
Description

Prepare a dataset for pretrained BERT models.
Usage

dataset_bert_pretrained(
  x,
  y = NULL,
  bert_type = NULL,
  tokenizer_scheme = NULL,
  n_tokens = NULL
)
Arguments

x: A data.frame with one or more character predictor columns, or a
list, matrix, or character vector that can be coerced to such a
data.frame.

y: A factor of outcomes, or a data.frame with a single factor column.
Can be NULL (default).

bert_type: A bert_type from available_berts() to use to choose the
other properties. Can be NULL (default).

tokenizer_scheme: A character scalar indicating the vocabulary and
tokenizer to use.

n_tokens: An integer scalar indicating the number of tokens in the
output.
Value

An initialized torch::dataset(). If it is not yet tokenized, the
tokenize() method must be called before the dataset will be usable.
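As a usage illustration, here is a minimal construction sketch. It
assumes this is the {torchtransformers} function of the same name; the
example data and the "bert_tiny_uncased" bert_type are illustrative
assumptions, not values taken from this page.

library(torchtransformers)

# Illustrative predictors (one character column) and a factor outcome.
x <- data.frame(
  text = c("The cat sat on the mat.", "Dogs bark at night."),
  stringsAsFactors = FALSE
)
y <- factor(c("cat", "dog"))

# "bert_tiny_uncased" is an assumed identifier; check available_berts()
# for the bert_types your installed version actually provides.
ds <- dataset_bert_pretrained(
  x = x,
  y = y,
  bert_type = "bert_tiny_uncased",
  n_tokens = 32L
)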
Fields

input_data (private): The input predictors (x) standardized to a
data.frame of character columns, and outcome (y) standardized to a
factor or NULL.

tokenizer_metadata (private): A list indicating the tokenizer_scheme
and n_tokens that have been or will be used to tokenize the
predictors (x).

tokenized (private): A single logical value indicating whether the
data has been tokenized.
Methods

initialize: Initialize this dataset. This method is called when the
dataset is first created.

tokenize: Tokenize this dataset (see the first sketch after this
list).

untokenize: Remove any tokenization from this dataset.
.tokenize_for_model: Tokenize this dataset for a particular model. In
most cases, call luz_callback_bert_tokenize() instead (see the second
sketch after this list).
.getitem: Fetch an individual predictor (and, if available, the
associated outcome). In most cases, call .getbatch() instead, or let
the {luz} fitting process fetch batches automatically.

.getbatch: Fetch specific predictors (and, if available, the
associated outcomes). This function is called automatically by {luz}
during the fitting process.

.length: Determine the length of the dataset (the number of rows of
predictors). In most cases, call length() instead (see the final
sketch after this list).
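A hedged sketch of tokenize/untokenize follows. Zero-argument calls
that reuse the tokenizer_scheme and n_tokens recorded at construction
are an assumption here, not a documented signature.

# Tokenize using the metadata stored when ds was created, then undo it.
ds$tokenize()
ds$untokenize()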
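Next, a hedged sketch of the callback route that supersedes
.tokenize_for_model(). Here model stands in for a luz-ready module
defined elsewhere, and calling luz_callback_bert_tokenize() with its
defaults is an assumption.

library(luz)

# Let {luz} tokenize ds for the model during fitting.
fitted <- fit(
  model,
  ds,
  epochs = 1,
  callbacks = list(luz_callback_bert_tokenize())
)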
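Finally, a brief sketch of the standard accessors; single-bracket
indexing of a dataset is ordinary {torch} behavior, and ds is the
dataset built above.

length(ds)  # number of rows of predictors, via the .length method
ds[1]       # first observation, via .getbatch()/.getitem()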