Kiwi: Kiwi Class

Kiwi R Documentation

Kiwi Class

Description

The Kiwi class provides methods for Korean morphological analysis.

Methods

Public methods


Method print()

Print method for Kiwi objects.

Usage
Kiwi$print(x, ...)
Arguments
x

self

...

ignored


Method new()

Create a kiwi instance.

Usage
Kiwi$new(
  num_workers = 0,
  model_size = "base",
  integrate_allomorph = TRUE,
  load_default_dict = TRUE
)
Arguments
num_workers

int(optional): number of cores to use for multi-threading. Default is 0, which uses all cores.

model_size

char(optional): Kiwi model to use. Default is "base"; "small" and "large" are also available.

integrate_allomorph

bool(optional): whether to integrate allomorphs. Default is TRUE.

load_default_dict

bool(optional): whether to load the default dictionary. Default is TRUE.
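
As a minimal sketch of how the constructor arguments combine, the snippet below builds an instance with a single worker and the "small" model. The argument values are illustrative choices only, and it assumes the elbird package is installed and that the model files are available (or can be fetched) on first use.

```r
# Illustrative argument values; only the names and defaults come from the Usage above.
kiwi_args <- list(
  num_workers = 1,             # one worker instead of the default 0 (all cores)
  model_size = "small",        # one of "small", "base", "large"
  integrate_allomorph = TRUE,
  load_default_dict = TRUE
)

# Guarded so the snippet does nothing when elbird is not installed.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- do.call(elbird::Kiwi$new, kiwi_args)
  print(kw)
}
```

Keeping the arguments in a named list makes it easy to reuse the same configuration for several instances.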


Method add_user_word()

Add a user word with its POS tag and score.

Usage
Kiwi$add_user_word(word, tag, score, orig_word = "")
Arguments
word

char(required): target word to add.

tag

Tags(required): tag information about word.

score

num(required): score information about word.

orig_word

char(optional): original word. Default is "".
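
A hedged sketch of registering a word and checking its effect on tokenization follows. The word, the score of 0, and the sample sentence are made up for illustration. `Tags$nnp` is an assumption about how the Tags object names the proper-noun tag; consult the Tags help page for the actual names.

```r
new_word <- "엘버드"                      # hypothetical word to register
sample_text <- "엘버드는 형태소 분석기입니다."

# Guarded so the snippet does nothing when elbird is not installed.
if (requireNamespace("elbird", quietly = TRUE)) {
  library(elbird)
  kw <- Kiwi$new()
  # Assumption: Tags$nnp is the proper-noun tag; 0 is an arbitrary score.
  kw$add_user_word(new_word, Tags$nnp, 0)
  # Tokenize a sentence containing the word to see how it is now analyzed.
  print(kw$tokenize(sample_text))
}
```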


Method add_pre_analyzed_words()

TODO

Usage
Kiwi$add_pre_analyzed_words(form, analyzed, score)
Arguments
form

char(required): target form for which to add a pre-analyzed result.

analyzed

data.frame(required): the expected analysis result.

score

num(required): score for the pre-analyzed result.


Method add_rules()

TODO

Usage
Kiwi$add_rules(tag, pattern, replacement, score)
Arguments
tag

Tags(required): target tag to which the rule applies.

pattern

char(required): regular expression to match.

replacement

char(required): replacement text.

score

num(required): score for the rule.


Method load_user_dictionarys()

Add a user dictionary from a text file.

Usage
Kiwi$load_user_dictionarys(user_dict_path)
Arguments
user_dict_path

char(required): path of user dictionary file.


Method extract_words()

Extract noun candidates from texts.

Usage
Kiwi$extract_words(
  input,
  min_cnt,
  max_word_len,
  min_score,
  pos_threshold,
  apply = FALSE
)
Arguments
input

char(required): target text data.

min_cnt

int(required): minimum count of a word in the text.

max_word_len

int(required): maximum word length.

min_score

num(required): minimum score.

pos_threshold

num(required): POS threshold.

apply

bool(optional): whether to add the extracted words to the user dictionary. Default is FALSE.


Method analyze()

Analyze text into tokens with their tags.

Usage
Kiwi$analyze(text, top_n = 3, match_option = Match$ALL, stopwords = FALSE)
Arguments
text

char(required): target text.

top_n

int(optional): number of results to return. Default is 3.

match_option

Match(optional): matching option, a value of Match. Default is Match$ALL.

stopwords

stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.

Returns

A list of results.
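
The following sketch shows one way to call analyze() with a non-default top_n and the embedded stopwords dictionary. The sample text is made up, elbird is assumed to be installed, and the result is inspected with str() instead of assuming its exact structure.

```r
sample_text <- "안녕하세요. 엘버드 패키지입니다."
top_n <- 1  # ask for a single analysis instead of the default 3

# Guarded so the snippet does nothing when elbird is not installed.
if (requireNamespace("elbird", quietly = TRUE)) {
  library(elbird)
  kw <- Kiwi$new()
  # stopwords = TRUE selects the embedded stopwords dictionary.
  res <- kw$analyze(sample_text, top_n = top_n, stopwords = TRUE)
  str(res)
}
```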


Method tokenize()

Analyze text into tokens and POS tags, returning only the top result.

Usage
Kiwi$tokenize(
  text,
  match_option = Match$ALL,
  stopwords = FALSE,
  form = "tibble"
)
Arguments
text

char(required): target text.

match_option

Match(optional): matching option, a value of Match. Default is Match$ALL.

stopwords

stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.

form

char(optional): return form. Default is "tibble"; "list" and "tidytext" are also available.
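
To compare the three return forms side by side, a sketch like the one below can help. The sample sentence is made up and elbird is assumed to be installed; the shape of each result is only inspected here, not assumed.

```r
forms <- c("tibble", "list", "tidytext")  # the values named in the form argument
sample_text <- "형태소 분석을 해봅니다."

# Guarded so the snippet does nothing when elbird is not installed.
if (requireNamespace("elbird", quietly = TRUE)) {
  library(elbird)
  kw <- Kiwi$new()
  # Tokenize the same text once per return form.
  results <- lapply(forms, function(f) kw$tokenize(sample_text, form = f))
  names(results) <- forms
  str(results, max.level = 1)
}
```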


Method split_into_sents()

Some text is not already separated into sentences. split_into_sents splits such text into individual sentences.

Usage
Kiwi$split_into_sents(text, match_option = Match$ALL, return_tokens = FALSE)
Arguments
text

char(required): target text.

match_option

Match(optional): matching option, a value of Match. Default is Match$ALL.

return_tokens

bool(optional): whether to include the tokenized result. Default is FALSE.
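
A short sketch of sentence splitting, with and without tokens, is shown below. The unpunctuated sample text is made up for illustration and elbird is assumed to be installed.

```r
# Made-up text with two sentences and no punctuation between them.
sample_text <- "오늘은 날씨가 좋다 내일은 비가 온다고 한다"

# Guarded so the snippet does nothing when elbird is not installed.
if (requireNamespace("elbird", quietly = TRUE)) {
  library(elbird)
  kw <- Kiwi$new()
  # Sentences only.
  print(kw$split_into_sents(sample_text))
  # Sentences together with their tokenized result.
  print(kw$split_into_sents(sample_text, return_tokens = TRUE))
}
```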


Method get_tidytext_func()

Get a tokenizer function to pass to tidytext's unnest_tokens.

Usage
Kiwi$get_tidytext_func(match_option = Match$ALL, stopwords = FALSE)
Arguments
match_option

Match(optional): matching option, a value of Match. Default is Match$ALL.

stopwords

stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.

Returns

A function.

Examples
## Not run: 
   kw <- Kiwi$new()
   tidytoken <- kw$get_tidytext_func()
   tidytoken("test")
## End(Not run)

Method clone()

The objects of this class are cloneable with this method.

Usage
Kiwi$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

## Not run: 
  kw <- Kiwi$new()
  kw$analyze("test")
  kw$tokenize("test")
  
## End(Not run)


elbird documentation built on Aug. 12, 2022, 5:08 p.m.