Man pages for lmullen/tokenizers
Fast, Consistent Tokenization of Natural Language Text

basic-tokenizers: Basic tokenizers
chunk_text: Chunk text into smaller segments
mobydick: The text of Moby Dick
ngram-tokenizers: N-gram tokenizers
ptb-tokenizer: Penn Treebank Tokenizer
shingle-tokenizers: Character shingle tokenizers
stem-tokenizers: Word stem tokenizer
tokenizers: Tokenizers
word-counting: Count words, sentences, characters
lmullen/tokenizers documentation built on Oct. 26, 2018, 1:34 a.m.