dataset_bert_pretrained: BERT Pretrained Dataset

dataset_bert_pretrained    R Documentation

BERT Pretrained Dataset

Description

Prepare a dataset for pretrained BERT models.

Usage

dataset_bert_pretrained(
  x,
  y = NULL,
  bert_type = NULL,
  tokenizer_scheme = NULL,
  n_tokens = NULL
)

Arguments

x

A data.frame with one or more character predictor columns, or a list, matrix, or character vector that can be coerced to such a data.frame.

y

A factor of outcomes, or a data.frame with a single factor column. Can be NULL (default).

bert_type

A bert_type from available_berts() used to choose the other properties. If tokenizer_scheme and n_tokens are set, they overrule this setting.

tokenizer_scheme

A character scalar indicating the vocabulary and tokenizer to use.

n_tokens

An integer scalar indicating the number of tokens in the output.

Value

An initialized torch::dataset(). If it is not yet tokenized, the tokenize() method must be called before the dataset will be usable.
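
The following is a minimal usage sketch, not taken from the package's own examples. It assumes that torchtransformers and a working torch installation are available, that any entry returned by available_berts() is a valid bert_type, and it uses invented toy data. It makes no claim about whether supplying bert_type tokenizes the data at construction time.

```r
# Toy inputs (invented for illustration).
x <- data.frame(
  text = c("The movie was great.", "I did not enjoy it."),
  stringsAsFactors = FALSE
)
y <- factor(c("positive", "negative"))

# Only call the package when it and the torch backend are installed.
if (requireNamespace("torchtransformers", quietly = TRUE) &&
    torch::torch_is_installed()) {
  # Assumption: any entry of available_berts() is a valid bert_type.
  bert_type <- torchtransformers::available_berts()[[1]]

  ds <- torchtransformers::dataset_bert_pretrained(
    x = x,
    y = y,
    bert_type = bert_type
  )

  # Number of rows of predictors in the dataset.
  print(length(ds))
}
```

Passing a bert_type keeps the call short; tokenizer_scheme and n_tokens can be given instead when the defaults tied to a bert_type are not wanted.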

Fields

input_data

(private) The input predictors (x) standardized to a data.frame of character columns, and outcome (y) standardized to a factor or NULL.

tokenizer_metadata

(private) A list indicating the tokenizer_scheme and n_tokens that have been or will be used to tokenize the predictors (x).

tokenized

(private) A single logical value indicating whether the data has been tokenized.
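
To make the idea of standardized input_data concrete, here is a base R sketch of how the accepted forms of x could be coerced to a data.frame of character columns. The helper standardize_predictors() is hypothetical and is not part of torchtransformers; the package's actual implementation may differ.

```r
# Hypothetical helper, not part of torchtransformers: coerce supported
# inputs to a data.frame whose columns are all character.
standardize_predictors <- function(x) {
  if (is.character(x) && is.null(dim(x))) {
    # A plain character vector becomes a single-column data.frame.
    x <- data.frame(x = x, stringsAsFactors = FALSE)
  } else {
    # Lists, matrices, and data.frames go through as.data.frame().
    x <- as.data.frame(x, stringsAsFactors = FALSE)
  }
  # Convert every column to character, keeping the column names.
  x[] <- lapply(x, as.character)
  x
}

standardize_predictors(c("first text", "second text"))
standardize_predictors(list(a = c("q1", "q2"), b = c("ctx1", "ctx2")))
```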

Methods

initialize

Initialize this dataset. This method is called when the dataset is first created.

tokenize

Tokenize this dataset.

untokenize

Remove any tokenization from this dataset.

.tokenize_for_model

Tokenize this dataset for a particular model. In most cases, use luz_callback_bert_tokenize() instead.

.getitem

Fetch an individual predictor (and, if available, the associated outcome). In most cases, use .getbatch() instead, or let {luz} fetch the data automatically during fitting.

.getbatch

Fetch specific predictors (and, if available, the associated outcomes). This function is called automatically by {luz} during the fitting process.

.length

Determine the length of the dataset (the number of rows of predictors). In most cases, use length() instead.


macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.