gn_tokenizer: Tokenize word ngrams (1-7 default) of text


Description

The default range of unigrams to 7-grams was chosen because 99% of placenames in the GeoNames gazetteer are seven or fewer words long.

Usage

gn_tokenizer(text, min = 1, max = 7)

Arguments

text

A character vector to be tokenized.

min

The minimum n-gram length, in words, to create (default = 1).

max

The maximum n-gram length, in words, to create (default = 7).
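To illustrate what `min` and `max` control, here is a minimal base-R sketch of word n-gram tokenization. It is not the us.geonames implementation. `ngram_sketch` is a hypothetical name. The sketch assumes that words are separated by whitespace and that each input string yields its own character vector of n-grams, ordered by n-gram length.

```r
# Hypothetical sketch of word n-gram tokenization; NOT the us.geonames code.
# Assumes words are separated by runs of whitespace.
ngram_sketch <- function(text, min = 1, max = 7) {
  stopifnot(is.character(text), min >= 1, max >= min)
  lapply(text, function(doc) {
    # Split one document into words.
    words <- strsplit(trimws(doc), "\\s+")[[1]]
    n_words <- length(words)
    # A document shorter than `min` words yields no n-grams.
    if (n_words < min) return(character(0))
    # Never ask for n-grams longer than the document itself.
    upper <- base::min(max, n_words)
    # For each length n, slide a window of n words across the document.
    unlist(lapply(min:upper, function(n) {
      starts <- seq_len(n_words - n + 1)
      vapply(starts,
             function(i) paste(words[i:(i + n - 1)], collapse = " "),
             character(1))
    }))
  })
}

ngram_sketch("New York City", min = 1, max = 2)
# returns list(c("New", "York", "City", "New York", "York City"))
```

With the default `max = 7`, a placename of up to seven words appears intact as one of the generated n-grams. That property is what the default is meant to guarantee for matching against the gazetteer.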


jacob-ogre/us.geonames documentation built on May 20, 2019, 6:03 p.m.