posParallel
Returns the part-of-speech (POS) tagged morphemes of each sentence.
posParallel(sentence, join = TRUE, format = c("list", "data.frame"),
  sys_dic = "", user_dic = "")
sentence | A character vector of any length. To analyze multiple sentences, put them in one character vector.
join | A logical value that decides the output format. The default value is TRUE. If FALSE, the function returns morphemes only, and the tags are put in the attribute.
format | The data type of the result. The default value is "list". Set this to "data.frame" to get the result as a data frame.
sys_dic | The location of the system MeCab dictionary. The default value is "".
user_dic | The location of a user-specific MeCab dictionary. The default value is "".
This is a parallelized version of the MeCab part-of-speech tagger. The function takes a character vector of any length and runs a loop inside C++ with Intel TBB to provide faster processing.
Parallelizing over a character vector is not supported by RcppParallel. Thus, this function makes duplicates of the input and the output. Therefore, if your data volume is large, use pos or divide the vector into several sub-vectors.
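The advice about dividing a large vector can be sketched in base R as follows. This is only an illustration under stated assumptions: `split_into_chunks` and `tag_in_chunks` are hypothetical helpers that are not part of RcppMeCab, and the chunk size in the usage comment is an arbitrary value you would tune to your memory budget. The sketch assumes the default `format = "list"` output, so the per-chunk lists can simply be concatenated.

```r
# Hypothetical helper (not part of RcppMeCab): split a character vector
# into consecutive chunks of at most `chunk_size` elements.
split_into_chunks <- function(x, chunk_size) {
  stopifnot(is.character(x), chunk_size >= 1)
  if (length(x) == 0) return(list())
  group <- ceiling(seq_along(x) / chunk_size)
  # unname() so that the chunk labels do not leak into result names later
  unname(split(x, group))
}

# Hypothetical helper: apply a tagger function to each chunk and
# concatenate the per-chunk lists into one list.
# `tagger` is any function that takes a character vector and returns a list,
# for example RcppMeCab::posParallel with its default list output.
tag_in_chunks <- function(x, chunk_size, tagger, ...) {
  chunks <- split_into_chunks(x, chunk_size)
  if (length(chunks) == 0) return(list())
  do.call(c, lapply(chunks, tagger, ...))
}

# Usage (requires RcppMeCab and MeCab; 10000 is an arbitrary chunk size):
# result <- tag_in_chunks(sentence, 10000, RcppMeCab::posParallel)
```

Because each call only duplicates one chunk at a time, peak memory use should be bounded by the chunk size rather than by the full input, at the cost of some per-call overhead.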
You can add a user dictionary with user_dic. It should be compiled by mecab-dict-index. You can find an explanation of how to compile a user dictionary at https://github.com/junhewk/RcppMeCab.
You can also set a system dictionary in sys_dic, which is especially useful if you are using multiple dictionaries (for example, using both the IPA and Juman dictionaries at the same time in Japanese). Using options(mecabSysDic=), you can set your preferred system dictionary for the R session.
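As a small sketch of the options() approach mentioned above: the option name mecabSysDic comes from this page, while the path below is a placeholder (an assumption for illustration) that you would replace with the directory of your own compiled system dictionary.

```r
# Placeholder path (hypothetical): replace with your own MeCab dictionary directory.
options(mecabSysDic = "/path/to/your/mecab/dic")

# Check which system dictionary is currently set for this R session.
getOption("mecabSysDic")
```

Placing the options() call in your .Rprofile would presumably make the setting persist across sessions, since that file is evaluated at R startup.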
If you want to get morphemes only, use join = FALSE to put the tag names in the attribute.
By default, the function returns a list of character vectors with (morpheme)/(tag) elements.
Each character vector holds the POS-tagged morphemes of one sentence in conjoined form. The element names of the list are the original phrases.
## Not run:
sentence <- c("first sentence", "second sentence")  # replace with your own UTF-8 texts
posParallel(sentence)
posParallel(sentence, join = FALSE)
posParallel(sentence, format = "data.frame")
posParallel(sentence, user_dic = "~/user_dic.dic")
# System dictionary example: in case of using mecab-ipadic-NEologd
pos(sentence, sys_dic = "/usr/local/lib/mecab/dic/mecab-ipadic-neologd/")
## End(Not run)