The default of unigram to 7-gram tokenization was chosen because 99 placenames in the geonames gazetteer are <= 7 words long.
1 | gn_tokenizer(text, min = 1, max = 7)
|
text |
A character vector to be tokenized |
min |
The minimum ngram length to create (default = 1) |
max |
The maximum ngram length to create (default = 7) |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.