trainer_wordpiece {tok} | R Documentation
WordPiece tokenizer trainer
Super class: tok::tok_trainer -> tok_trainer_wordpiece
new()
Constructor for the WordPiece tokenizer trainer
trainer_wordpiece$new(
  vocab_size = 30000,
  min_frequency = 0,
  show_progress = FALSE,
  special_tokens = NULL,
  limit_alphabet = NULL,
  initial_alphabet = NULL,
  continuing_subword_prefix = "##",
  end_of_word_suffix = NULL
)
vocab_size
The size of the final vocabulary, including all tokens and the alphabet. Default: 30000.
min_frequency
The minimum frequency a pair must have in order to be merged. Default: 0.
show_progress
Whether to show progress bars while training. Default: FALSE.
special_tokens
A list of special tokens the model should be aware of. Default: NULL.
limit_alphabet
The maximum number of different characters to keep in the alphabet. Default: NULL.
initial_alphabet
A list of characters to include in the initial alphabet, even if not seen in the training dataset. If a string contains more than one character, only the first character is kept. Default: NULL.
continuing_subword_prefix
A prefix to be used for every subword that is not a beginning-of-word. Default: "##".
end_of_word_suffix
A suffix to be used for every subword that is an end-of-word. Default: NULL.
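Below is a minimal sketch of putting these arguments together: it constructs a trainer_wordpiece with a reduced vocabulary and BERT-style special tokens, using only the constructor documented above. The tokenizer and model_wordpiece objects mentioned in the trailing comments are assumptions about the surrounding tok API, not something documented on this page.

library(tok)

# Build a WordPiece trainer with a small vocabulary and BERT-style
# special tokens; every argument used here is documented above.
trainer <- trainer_wordpiece$new(
  vocab_size = 5000,
  min_frequency = 2,
  show_progress = TRUE,
  special_tokens = c("[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"),
  continuing_subword_prefix = "##"
)

# The trainer is then passed to a tokenizer's training method, e.g.
# (names assumed, not documented on this page):
#   tok <- tokenizer$new(model_wordpiece$new(unk_token = "[UNK]"))
#   tok$train(files = "corpus.txt", trainer = trainer)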
clone()
The objects of this class are cloneable with this method.
trainer_wordpiece$clone(deep = FALSE)
deep
Whether to make a deep clone.
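For instance, a brief sketch of standard R6 cloning as inherited by this class: with deep = TRUE, nested R6 objects are copied as well rather than shared.

library(tok)

trainer <- trainer_wordpiece$new(vocab_size = 1000)

# An independent deep copy; modifying it leaves the original trainer
# untouched (standard R6 behaviour).
trainer_copy <- trainer$clone(deep = TRUE)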
Other trainers: tok_trainer, trainer_bpe, trainer_unigram