This function tokenizes input into n-grams. While the package is under development, it offers
only basic (shingle) n-grams; other n-gram functionality will be added later. Punctuation
and numerals are stripped by this tokenizer, because they are usually not useful in Korean n-grams.
The n-gram function is based on the selective morpheme tokenizer (token_words), but you can
select another tokenizer as well.
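To make the idea of a basic (shingle) n-gram concrete, here is a minimal base-R sketch. It is not the package's implementation: `shingle_ngrams` is a hypothetical helper that splits on whitespace instead of running a morpheme analyzer, and the use of `[[:punct:][:digit:]]` to strip punctuation and numerals is an assumption about what the stripping step does.

```r
# Hypothetical helper, not part of the package: builds overlapping
# (shingle) n-grams from whitespace-separated tokens.
shingle_ngrams <- function(text, n = 3L, delim = " ") {
  # Assumption: punctuation and digits are replaced by spaces before splitting.
  cleaned <- gsub("[[:punct:][:digit:]]+", " ", text)
  tokens <- strsplit(trimws(cleaned), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # With fewer than n tokens, no n-gram can be formed.
  if (length(tokens) < n) return(character())
  starts <- seq_len(length(tokens) - n + 1L)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1L)], collapse = delim),
         character(1))
}

# Each n-gram shares n - 1 tokens with its neighbour.
shingle_ngrams("나는 오늘 학교에 갔다 2023.", n = 2L)
```

A real morpheme tokenizer would split Korean words into smaller units than whitespace does, so the actual output of token_ngrams will generally differ from this sketch.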
token_ngrams(phrase, n = 3L, div = c("morph", "words", "nouns"),
  stopwords = character(), ngram_delim = " ")
| phrase | A character vector or a list of character vectors to be tokenized into morphemes.
If  | 
| n | The number of words in the n-gram. This must be an integer greater than or equal to 1. | 
| div | The token generator definition. The options are "morph", "words", and "nouns". | 
| stopwords | A character vector of stopwords to exclude from the tokens. | 
| ngram_delim | The separator between words in an n-gram. | 
A list of character vectors containing the tokens, with one element in the list.
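The following base-R sketch illustrates how the `n`, `stopwords`, and `ngram_delim` arguments might shape the returned list. `ngrams_from_tokens` is a hypothetical helper working on already-tokenized input, not the package's code. It rests on two assumptions that the documentation does not confirm: stopwords are removed before the n-grams are built, and the result has one list element per input element.

```r
# Hypothetical helper, not the package's code: turns already-tokenized
# input into n-grams and returns a list of character vectors.
ngrams_from_tokens <- function(token_list, n = 3L, stopwords = character(),
                               ngram_delim = " ") {
  stopifnot(n >= 1L)
  lapply(token_list, function(tokens) {
    # Assumption: stopwords are dropped before the n-grams are built.
    tokens <- tokens[!tokens %in% stopwords]
    if (length(tokens) < n) return(character())
    vapply(seq_len(length(tokens) - n + 1L),
           function(i) paste(tokens[i:(i + n - 1L)], collapse = ngram_delim),
           character(1))
  })
}

# Illustrative, hand-written tokens (not output of a real morpheme analyzer).
tokens <- list(c("오늘", "날씨", "가", "좋다"), c("학교", "에", "가다"))
res <- ngrams_from_tokens(tokens, n = 2L, stopwords = c("가", "에"),
                          ngram_delim = "_")
str(res)
```

Removing stopwords first means that n-grams can join tokens that were not adjacent in the original text; filtering after n-gram construction would be the alternative design.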
See the examples on GitHub.
## Not run:
txt <- "아버지가 방에 들어가신다."  # any Korean sentence (illustrative placeholder)
token_ngrams(txt)
token_ngrams(txt, n = 2)
## End(Not run)