| pre_tokenizer_byte_level | R Documentation |
Byte level pre tokenizer
This pre-tokenizer takes care of replacing all bytes of the given string with a corresponding representation, as well as splitting into words.
tok::tok_pre_tokenizer -> tok_pre_tokenizer_byte_level
new(): Initializes the byte-level pre-tokenizer.
pre_tokenizer_byte_level$new(add_prefix_space = TRUE, use_regex = TRUE)
add_prefix_space: Whether to add a space to the first word.
use_regex: Set this to FALSE to prevent this pre-tokenizer from using the GPT-2 specific regexp for splitting on whitespace.
clone(): The objects of this class are cloneable with this method.
pre_tokenizer_byte_level$clone(deep = FALSE)
deep: Whether to make a deep clone.
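A minimal construction sketch, assuming the tok package is installed; the attachment of the pre-tokenizer to a tokenizer object (`tk`) via the `pre_tokenizer` field is shown only as a hypothetical illustration, not a confirmed part of the API:

```r
library(tok)

# Build a byte-level pre-tokenizer. add_prefix_space = TRUE prepends a
# space to the first word so it is split like subsequent words;
# use_regex = TRUE keeps the GPT-2 regexp for whitespace splitting.
pre_tok <- pre_tokenizer_byte_level$new(add_prefix_space = TRUE, use_regex = TRUE)

# Hypothetical: attach it to an existing tokenizer object `tk`
# tk$pre_tokenizer <- pre_tok
```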
Other pre_tokenizer:
pre_tokenizer,
pre_tokenizer_whitespace