dataset_bert: BERT Dataset

dataset_bert R Documentation

BERT Dataset

Description

Prepare a dataset for BERT-like models.

Usage

dataset_bert(x, y = NULL, tokenizer = tokenize_bert, n_tokens = 128L)

Arguments

x

A data.frame with one or more character predictor columns.

y

A factor of outcomes, or a data.frame with a single factor column. Can be NULL (default).

tokenizer

A tokenization function (signature compatible with tokenize_bert).

n_tokens

Integer scalar; the number of tokens expected for each example.

Value

An initialized torch::dataset().

Methods

initialize

Initialize this dataset. This method is called when the dataset is first created.

.getitem

Fetch an individual predictor (and, if available, the associated outcome). This function is called automatically by {luz} during the fitting process.

.length

Determine the length of the dataset (the number of rows of predictors). In practice, call length() on the dataset rather than calling this method directly.
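
Examples

The following is a minimal sketch of how the arguments and methods described above might fit together; it is not taken from the package's own examples. It assumes that {torch} and its backend are installed and that the default tokenizer, tokenize_bert, can load its vocabulary locally. The data frame, its column name, the factor levels, and the choice of n_tokens = 32L are made up for illustration.

```r
library(torchtransformers)

# Hypothetical toy data: one character predictor column and a factor outcome.
x <- data.frame(
  text = c("The first example sentence.", "A second, shorter one."),
  stringsAsFactors = FALSE
)
y <- factor(c("positive", "negative"))

# Build the dataset with the default tokenizer; each example is expected to
# be represented by 32 tokens (assumption: padded or truncated to that size).
ds <- dataset_bert(x, y, n_tokens = 32L)

# length() reports the number of rows of predictors.
n_rows <- length(ds)

# Indexing dispatches to .getitem; inspect the structure rather than
# assuming a particular layout for the returned item.
item <- ds[1]
str(item, max.level = 2)
```

Passing y = NULL (the default) should give a dataset with predictors only, which may be useful at prediction time.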


macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.