Man pages for tok
Fast Text Tokenization

decoder_byte_level: Byte level decoder
encoding: Encoding
model_bpe: BPE model
model_unigram: An implementation of the Unigram algorithm
model_wordpiece: An implementation of the WordPiece algorithm
normalizer_nfc: NFC normalizer
normalizer_nfkc: NFKC normalizer
pre_tokenizer: Generic class for pre-tokenizers
pre_tokenizer_byte_level: Byte level pre tokenizer
pre_tokenizer_whitespace: This pre-tokenizer simply splits using the following regex:...
processor_byte_level: Byte Level post processor
tok_decoder: Generic class for decoders
tokenizer: Tokenizer
tok_model: Generic class for tokenization models
tok_normalizer: Generic class for normalizers
tok-package: tok: Fast Text Tokenization
tok_processor: Generic class for processors
tok_trainer: Generic training class
trainer_bpe: BPE trainer
trainer_unigram: Unigram tokenizer trainer
trainer_wordpiece: WordPiece tokenizer trainer
tok documentation built on Sept. 11, 2024, 5:21 p.m.