Man pages for piecemaker
Tools for Preparing Text for Tokenizers

dot-make_unicode_block_regex    Make Regex for Unicode Blocks
dot-space_regex_selector        Space Text by a Regex Selector
piecemaker-package              piecemaker: Tools for Preparing Text for Tokenizers
prepare_and_tokenize            Split Text on Spaces
prepare_text                    Prepare Text for Tokenization
remove_control_characters       Remove Non-Character Characters
remove_diacritics               Remove Diacritical Marks on Characters
remove_replacement_characters   Remove the Unicode Replacement Character
space_cjk                       Add Spaces Around CJK Ideographs
space_punctuation               Add Spaces Around Punctuation
squish_whitespace               Remove Extra Whitespace
tokenize_space                  Break Text at Spaces
validate_utf8                   Clean Up Text to UTF-8
piecemaker documentation built on June 7, 2023, 5:55 p.m.