token_ngrams: N-gram tokenizer based on mecab-ko
In RmecabKo: An 'Rcpp' Interface for Eunjeon Project


This function tokenizes its input into n-grams. While the package is still under development, it offers only basic n-grams (shingle n-grams); other n-gram functionality will be added later. Punctuation and numerals are stripped by this tokenizer, because they are rarely useful in Korean n-grams. The n-gram function is based on the selective morpheme tokenizer (token_words), but you can select another tokenizer as well (see `div`).
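To make the term "shingle n-gram" concrete, here is a minimal base R sketch of the idea: a window of `n` tokens slides one position at a time over the token sequence. The helper `shingle_ngrams` is hypothetical and is not part of RmecabKo; the ASCII tokens stand in for the morphemes that mecab-ko would produce, and the punctuation and digit stripping only approximates the behaviour described above.

```r
# Hypothetical helper, not part of RmecabKo: build shingle n-grams
# from a vector of already-tokenized units.
shingle_ngrams <- function(tokens, n = 3L, delim = " ") {
  stopifnot(n >= 1L)
  # Strip punctuation and digits, then discard tokens left empty.
  tokens <- gsub("[[:punct:][:digit:]]", "", tokens)
  tokens <- tokens[nzchar(tokens)]
  # Fewer than n tokens means no n-gram can be formed.
  if (length(tokens) < n) return(character())
  # Each n-gram starts one token after the previous one.
  starts <- seq_len(length(tokens) - n + 1L)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1L)], collapse = delim),
    character(1)
  )
}

# Stand-in tokens; a real call would work on Korean morphemes.
tokens <- c("the", "quick", "brown", "fox,", "42", "times")
shingle_ngrams(tokens, n = 3L)
```

In this sketch the token "42" vanishes entirely and "fox," loses its comma, so the window slides over five tokens and yields three 3-grams.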

token_ngrams(phrase, n = 3L, div = c("morph", "words", "nouns"), stopwords = character(), ngram_delim = " ")

`phrase`	A character vector or a list of character vectors to be tokenized into morphemes. If `phrase` is a character vector, it can be of any length, and each element will be tokenized separately. If `phrase` is a list of character vectors, each element of the list should be a one-item vector.
`n`	The number of words in the n-gram. This must be an integer greater than or equal to 1.
`div`	The tokenizer used to generate the tokens that make up the n-grams. The options are "morph", "words", and "nouns".
`stopwords`	A character vector of stopwords to exclude from the tokens.
`ngram_delim`	The separator between words in an n-gram.
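As a rough illustration of how `n`, `stopwords`, and `ngram_delim` interact, the base R sketch below filters a token vector and then joins neighbouring tokens. The helper `ngrams_with_stopwords` is hypothetical, and the order of operations (stopwords removed before the n-grams are assembled) is an assumption rather than documented RmecabKo behaviour.

```r
# Hypothetical sketch, not RmecabKo code. Assumption: stopwords are
# dropped before the n-grams are assembled.
ngrams_with_stopwords <- function(tokens, n = 2L,
                                  stopwords = character(),
                                  ngram_delim = " ") {
  stopifnot(n >= 1L)
  # Remove every token that appears in the stopword set.
  tokens <- tokens[!tokens %in% stopwords]
  if (length(tokens) < n) return(character())
  starts <- seq_len(length(tokens) - n + 1L)
  # Join each run of n tokens with the chosen delimiter.
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1L)], collapse = ngram_delim),
    character(1)
  )
}

tokens <- c("a", "cat", "sat", "on", "a", "mat")
ngrams_with_stopwords(tokens, n = 2L,
                      stopwords = c("a", "on"), ngram_delim = "_")
```

Under that assumption, removing a stopword makes its neighbours adjacent, so "sat" and "mat" end up in the same 2-gram even though they were not adjacent in the input.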

A list of character vectors containing the tokens, with one element in the list for each element of the input.

See the examples on GitHub.

## Not run: 
library(RmecabKo)

# Placeholder input; substitute any Korean sentence.
txt <- "한국어 문장을 여기에 넣으세요."

token_ngrams(txt)         # default n = 3
token_ngrams(txt, n = 2)  # 2-grams

## End(Not run)

RmecabKo documentation built on May 2, 2019, 4:22 a.m.
