token: Morpheme tokenizer based on mecab-ko


Description

These tokenizer functions split Korean text into all morphemes (token_morph), selected morphemes (token_words), or nouns only (token_nouns).

Usage

token_morph(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_words(phrase, strip_punct = FALSE, strip_numeric = FALSE)

token_nouns(phrase, strip_punct = FALSE, strip_numeric = FALSE)

Arguments

phrase

A character vector, or a list of character vectors, to be tokenized into morphemes. If phrase is a character vector, it can be of any length, and each element is tokenized separately. If phrase is a list of character vectors, each element of the list should be a one-item vector.
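
The two accepted input shapes can be illustrated with a small base R check. This is only a sketch: is_valid_phrase is a hypothetical helper written for illustration, not a function exported by RmecabKo, and the package may validate its input differently.

```r
# Hypothetical helper (not part of RmecabKo): checks whether `phrase`
# has one of the two shapes described above.
is_valid_phrase <- function(phrase) {
  if (is.character(phrase)) {
    return(TRUE)  # a character vector of any length is accepted
  }
  if (is.list(phrase)) {
    # every list element must be a one-item character vector
    return(all(vapply(
      phrase,
      function(x) is.character(x) && length(x) == 1L,
      logical(1)
    )))
  }
  FALSE
}

ok_vector <- is_valid_phrase(c("첫 번째 문장입니다.", "두 번째 문장입니다."))
ok_list   <- is_valid_phrase(list("첫 번째 문장입니다.", "두 번째 문장입니다."))
bad_list  <- is_valid_phrase(list(c("a", "b")))  # element has two items
```

Here ok_vector and ok_list are TRUE, while bad_list is FALSE because its single list element holds two strings.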

strip_punct

Logical. Set to TRUE to remove punctuation from the phrase.

strip_numeric

Logical. Set to TRUE to remove numbers from the phrase.
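
As a rough illustration of what the two flags are for, the base R sketch below filters a token vector. It rests on an assumption that is not confirmed by this page: that stripping means dropping tokens made up entirely of punctuation or entirely of digits. The token vector and both helper functions are hypothetical; the package's actual rule may differ.

```r
# Hypothetical token vector, as a tokenizer might return for one sentence.
tokens <- c("2018", "년", "에", "만들", "었", "다", ".")

# Assumed behavior (not taken from the package source):
# drop tokens that consist only of punctuation or only of digits.
drop_punct   <- function(x) x[!grepl("^[[:punct:]]+$", x)]
drop_numeric <- function(x) x[!grepl("^[[:digit:]]+$", x)]

no_punct <- drop_punct(tokens)    # the "." token is removed
no_num   <- drop_numeric(tokens)  # the "2018" token is removed
```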

Value

A list of character vectors containing the tokens, with one list element for each element of phrase.
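
The shape of such a return value can be sketched in base R, assuming the result holds one character vector per input element. toy_tokenize is a hypothetical stand-in that only splits on whitespace; it does not perform morphological analysis and is not how RmecabKo tokenizes.

```r
# Hypothetical stand-in for a tokenizer: splits on whitespace only.
toy_tokenize <- function(phrase) {
  # unlist() lets a list of one-item vectors be handled like a vector
  strsplit(unlist(phrase), "[[:space:]]+")
}

phrases <- c("first sentence here", "second one")
result  <- toy_tokenize(phrases)

is.list(result)                    # the result is a list
length(result) == length(phrases)  # one element per input
lengths(result)                    # number of tokens in each element
```

Individual token vectors can then be pulled out with result[[1]], result[[2]], and so on.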

See the examples on GitHub.

Examples

## Not run: 
library(RmecabKo)

# Placeholder input; any Korean sentence can be used here
txt <- "안녕하세요. 반갑습니다."

token_morph(txt)
token_words(txt, strip_punct = FALSE)
token_nouns(txt, strip_numeric = TRUE)

## End(Not run)

RmecabKo documentation built on Feb. 13, 2018, 5:02 p.m.