dataset_bert_pretrained: BERT Pretrained Dataset
In macmillancontentscience/torchtransformers: Transformer Models in Torch

dataset_bert_pretrained

R Documentation

BERT Pretrained Dataset

Description

Prepare a dataset for pretrained BERT models.

Usage

dataset_bert_pretrained(
  x,
  y = NULL,
  bert_type = NULL,
  tokenizer_scheme = NULL,
  n_tokens = NULL
)

Arguments

`x`	A data.frame with one or more character predictor columns, or a list, matrix, or character vector that can be coerced to such a data.frame.
`y`	A factor of outcomes, or a data.frame with a single factor column. Can be NULL (default).
`bert_type`	A bert_type from `available_berts()` to use to choose the other properties. If `tokenizer_scheme` and `n_tokens` are set, they overrule this setting.
`tokenizer_scheme`	A character scalar that identifies the vocabulary and tokenizer to use.
`n_tokens`	An integer scalar indicating the number of tokens in the output.
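The argument rules above can be illustrated with a small base-R sketch. It is not the package's internal code: `standardize_x()`, `standardize_y()`, `resolve_tokenizer_settings()` and the `bert_settings` table are hypothetical, and the table's values are placeholders for the properties that a real `bert_type` from `available_berts()` would supply. The sketch also assumes that an explicitly supplied `tokenizer_scheme` or `n_tokens` overrides the corresponding `bert_type` default on its own.

```r
# Hypothetical helpers illustrating the documented argument rules;
# they are NOT torchtransformers functions.

# Coerce x (data.frame, list, matrix, or character vector) to a
# data.frame whose columns are all character.
standardize_x <- function(x) {
  if (is.character(x) && is.null(dim(x))) {
    x <- data.frame(x = x, stringsAsFactors = FALSE)
  } else {
    x <- as.data.frame(x, stringsAsFactors = FALSE)
  }
  x[] <- lapply(x, as.character)
  x
}

# Accept NULL, a factor, or a data.frame with a single factor column.
standardize_y <- function(y) {
  if (is.null(y)) return(NULL)
  if (is.data.frame(y)) {
    if (ncol(y) != 1) stop("y must have exactly one column.")
    y <- y[[1]]
  }
  if (!is.factor(y)) stop("y must be a factor.")
  y
}

# Placeholder table standing in for the properties of a bert_type.
bert_settings <- list(
  my_bert = list(tokenizer_scheme = "my_scheme", n_tokens = 128L)
)

# Fill unset tokenizer settings from bert_type; explicit values win.
resolve_tokenizer_settings <- function(bert_type = NULL,
                                       tokenizer_scheme = NULL,
                                       n_tokens = NULL) {
  if (!is.null(bert_type)) {
    defaults <- bert_settings[[bert_type]]
    if (is.null(defaults)) stop("Unknown bert_type: ", bert_type)
    if (is.null(tokenizer_scheme)) tokenizer_scheme <- defaults$tokenizer_scheme
    if (is.null(n_tokens)) n_tokens <- defaults$n_tokens
  }
  list(tokenizer_scheme = tokenizer_scheme, n_tokens = n_tokens)
}
```

For example, `resolve_tokenizer_settings("my_bert", n_tokens = 64L)` keeps `"my_scheme"` from the placeholder table but uses 64 tokens.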

Value

An initialized torch::dataset(). If the dataset is not yet tokenized, its tokenize() method must be called before the dataset can be used.

Fields

input_data: (private) The input predictors (x), standardized to a data.frame of character columns, and the outcome (y), standardized to a factor or NULL.
tokenizer_metadata: (private) A list indicating the tokenizer_scheme and n_tokens that have been or will be used to tokenize the predictors (x).
tokenized: (private) A single logical value indicating whether the data has been tokenized.

Methods

initialize: Initialize this dataset. This method is called when the dataset is first created.
tokenize: Tokenize this dataset.
untokenize: Remove any tokenization from this dataset.
.tokenize_for_model: Tokenize this dataset for a particular model. Generally superseded by luz_callback_bert_tokenize().
.getitem: Fetch an individual predictor (and, if available, the associated outcome). Generally superseded by .getbatch(), or by letting {luz} fetch batches automatically during fitting.
.getbatch: Fetch specific predictors (and, if available, the associated outcomes). This method is called automatically by {luz} during the fitting process.
.length: Determine the length of the dataset (the number of rows of predictors). Generally superseded by length().
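The fields and methods above describe a lazily tokenized dataset. The base-R sketch below imitates that lifecycle with closures instead of torch::dataset(), so it runs without torch installed. Everything in it is hypothetical: `new_lazy_text_dataset()` and `is_tokenized()` are illustrative names, `x` is simplified to a single character vector, and a whitespace split against a vocabulary built from the data stands in for a real `tokenizer_scheme` (which the sketch only records). Tokenizing at creation when both `tokenizer_scheme` and `n_tokens` are given is an assumption based on the Value section.

```r
# Hypothetical sketch of the documented dataset lifecycle; NOT
# torch::dataset() and NOT the torchtransformers tokenizer.
new_lazy_text_dataset <- function(x, y = NULL,
                                  tokenizer_scheme = NULL, n_tokens = NULL) {
  # Private state mirroring the documented fields
  input_data <- list(x = as.character(x), y = y)
  tokenizer_metadata <- list(tokenizer_scheme = tokenizer_scheme,
                             n_tokens = n_tokens)
  tokenized <- FALSE
  token_ids <- NULL

  tokenize <- function(tokenizer_scheme, n_tokens) {
    words <- strsplit(input_data$x, " ", fixed = TRUE)
    # Toy vocabulary built from the data; ids start at 1, 0 is padding
    vocab <- unique(unlist(words))
    token_ids <<- do.call(rbind, lapply(words, function(w) {
      ids <- head(match(w, vocab), n_tokens)   # truncate to n_tokens
      c(ids, rep(0L, n_tokens - length(ids)))  # pad to n_tokens
    }))
    tokenizer_metadata <<- list(tokenizer_scheme = tokenizer_scheme,
                                n_tokens = n_tokens)
    tokenized <<- TRUE
    invisible(NULL)
  }

  # Drop the token ids but keep the metadata for a later tokenize()
  untokenize <- function() {
    token_ids <<- NULL
    tokenized <<- FALSE
    invisible(NULL)
  }

  # Return a matrix of token ids (one row per index) plus outcomes
  .getbatch <- function(index) {
    if (!tokenized) stop("Dataset is not tokenized; call tokenize() first.")
    list(
      x = token_ids[index, , drop = FALSE],
      y = if (is.null(input_data$y)) NULL else input_data$y[index]
    )
  }

  .getitem <- function(index) .getbatch(index[1])
  .length <- function() length(input_data$x)

  # Assumed: tokenize immediately when both settings are known
  if (!is.null(tokenizer_scheme) && !is.null(n_tokens)) {
    tokenize(tokenizer_scheme, n_tokens)
  }

  list(tokenize = tokenize, untokenize = untokenize,
       .getitem = .getitem, .getbatch = .getbatch, .length = .length,
       is_tokenized = function() tokenized)
}
```

In this sketch, calling `.getbatch()` before `tokenize()` raises an error, mirroring the documented requirement that the dataset be tokenized before use; `untokenize()` returns it to that state.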

macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.