trainer_bpe | R Documentation
BPE trainer
tok::tok_trainer -> tok_trainer_bpe
new()
Constructor for the BPE trainer.
trainer_bpe$new(vocab_size = NULL, min_frequency = NULL, show_progress = NULL, special_tokens = NULL, limit_alphabet = NULL, initial_alphabet = NULL, continuing_subword_prefix = NULL, end_of_word_suffix = NULL, max_token_length = NULL)
vocab_size
The size of the final vocabulary, including all tokens and the alphabet.
Default: NULL.
min_frequency
The minimum frequency a pair must have in order to be merged.
Default: NULL.
show_progress
Whether to show progress bars while training. Default: TRUE.
special_tokens
A list of special tokens the model should be aware of.
Default: NULL.
limit_alphabet
The maximum number of different characters to keep in the alphabet.
Default: NULL.
initial_alphabet
A list of characters to include in the initial alphabet,
even if they are not seen in the training dataset. Default: NULL.
continuing_subword_prefix
A prefix to be used for every subword that does not begin a word.
Default: NULL.
end_of_word_suffix
A suffix to be used for every subword that ends a word.
Default: NULL.
max_token_length
Prevents creating tokens longer than the specified size.
Default: NULL.
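As an illustration of the arguments above, a trainer might be constructed as follows. This is a minimal sketch, assuming the tok package is installed; the specific values (a 1,000-token vocabulary, a merge threshold of 2, a 16-character cap) are arbitrary choices for the example, not recommendations from the package.

```r
library(tok)

# Build a BPE trainer using a few of the documented arguments.
# All values below are illustrative assumptions, not package defaults.
trainer <- trainer_bpe$new(
  vocab_size = 1000L,       # target size of the final vocabulary
  min_frequency = 2L,       # only merge pairs seen at least twice
  show_progress = FALSE,    # silence progress bars, e.g. in scripts
  max_token_length = 16L    # do not create tokens longer than 16
)
```

Arguments that are left out keep their NULL default. The resulting object is meant to be handed to a tokenizer's training routine, which is not covered on this page.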
clone()
The objects of this class are cloneable with this method.
trainer_bpe$clone(deep = FALSE)
deep
Whether to make a deep clone.
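A short sketch of the call, assuming the tok package is installed and using an arbitrary vocab_size chosen only for the example:

```r
library(tok)

trainer <- trainer_bpe$new(vocab_size = 1000L)

# clone() returns a new R6 object; deep = FALSE is the documented default.
trainer_copy <- trainer$clone()
```

This page does not state whether the copy shares any underlying state with the original, so the sketch makes no assumption about that.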
Other trainer: tok_trainer, trainer_unigram, trainer_wordpiece