dot-tokenize_bert_single: Tokenize a single vector of text

.tokenize_bert_single    R Documentation

Tokenize a single vector of text

Description

Tokenize a single vector of text

Usage

.tokenize_bert_single(
  text,
  n_tokens = 64L,
  increment_index = TRUE,
  pad_token = "[PAD]",
  cls_token = "[CLS]",
  sep_token = "[SEP]",
  tokenizer = wordpiece::wordpiece_tokenize,
  vocab = wordpiece.data::wordpiece_vocab(),
  tokenizer_options = NULL
)

Arguments

text

A character vector, or a list of length-1 character vectors.

n_tokens

Integer scalar; the number of tokens expected for each example.

increment_index

Logical; if TRUE, add 1L to all token ids to convert from the Python-inspired 0-indexed standard to the torch 1-indexed standard.

pad_token

Character scalar; the token to use for padding. Must be present in the supplied vocabulary.

cls_token

Character scalar or NULL; the token to use at the start of each example. If not NULL, it must be present in the supplied vocabulary.

sep_token

Character scalar or NULL; the token to use at the end of each segment within each example. If not NULL, it must be present in the supplied vocabulary.

tokenizer

The tokenizer function to use to break up the text. It must accept a vocab argument.

vocab

The vocabulary to use to tokenize the text. This vocabulary must include the pad_token, as well as the cls_token and sep_token when those are not NULL.

tokenizer_options

A named list of additional arguments to pass on to the tokenizer.

Value

An object of class "bert_tokens", which is a list containing a matrix of token ids, a matrix of token type ids, and a matrix of token names.
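Examples

A minimal sketch of calling this helper. Because the function is internal (dot-prefixed and not exported), it is accessed here via `:::`; this assumes the torchtransformers, wordpiece, and wordpiece.data packages are installed, and the exact token-id values shown by str() will depend on the wordpiece vocabulary.

```r
## Tokenize two short examples, padding/truncating each to 16 tokens.
## The defaults add [CLS] at the start and [SEP] at the end of each
## example, and shift token ids to torch's 1-indexed convention.
tokens <- torchtransformers:::.tokenize_bert_single(
  c("The dog chased the cat.", "Dogs are the best."),
  n_tokens = 16L
)

class(tokens)  # "bert_tokens"
str(tokens)    # a list of three matrices: token ids, token type ids,
               # and token names, one row (or column) per example
```

The returned matrices share the same dimensions, so the token names matrix can be inspected to verify how each example was split, padded, and wrapped in special tokens.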


macmillancontentscience/torchtransformers documentation built on Aug. 6, 2023, 5:35 a.m.