The goal of RmecabKo is to parse Korean phrases with mecab-ko (Eunjeon project), and to provide helper functions for analyzing Korean documents. On Mac OSX and Linux, RmecabKo provides an R wrapper built with Rcpp. On Windows, it wraps the binary build of mecab-ko-msvc through system commands and file I/O.
For instructions in Korean, refer to readme.rmd.
First, install mecab-ko from the Bitbucket repository.
You can download the mecab-ko source from the Download page.
In Mac OSX terminal:
$ tar zxfv mecab-ko-XX.tar.gz
$ cd mecab-ko-XX
$ ./configure
$ make
$ make check
$ sudo make install
In Linux terminal:
$ tar zxfv mecab-ko-XX.tar.gz
$ cd mecab-ko-XX
$ ./configure
$ make
$ make check
$ su
# make install
After installing mecab-ko, you can install RmecabKo from CRAN or GitHub with:
# from CRAN
install.packages("RmecabKo")
# or, from GitHub (requires devtools)
# install.packages("devtools")
devtools::install_github("junhewk/RmecabKo")
You also need to install mecab-ko-dic. Refer to the Bitbucket page. The installation procedure is the same as for mecab-ko.
(In the GitHub version, the install_dic function has been added to support this. You can install mecab-ko-dic with install_dic(). A custom dictionary feature is in progress, and it requires mecab-ko-dic to be installed by this function.)
On Windows, the install_mecab function is provided. You need to specify the installation path of mecab-ko-dic in the function parameter:
install.packages("RmecabKo")
# install.packages("devtools")
devtools::install_github("junhewk/RmecabKo")
# load the package so that install_mecab is available
library(RmecabKo)
install_mecab("D:/Rlibs/mecab")
The basic usage is to pass a character vector as the phrase parameter of pos(phrase) or nouns(phrase). The loop over phrases runs in the C++ binary, so you can analyze many phrases quickly.
pos("Hello. This is R wrapper of Korean morpheme analyzer mecab-ko.")
The output of pos is a list. Each element of the list contains the segmented morphemes and their inferred part-of-speech (POS) tags, separated by "/". The name of each element is the original phrase.
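To work with the "morpheme/POS" strings, it can help to split them into two columns. The sketch below uses base R only and a hand-written list that imitates the structure described above. The phrases, morphemes, and tags are illustrative placeholders, not actual output from pos, and split_pos is a hypothetical helper that is not part of RmecabKo.

```r
# Hand-written stand-in for the list returned by pos(); NOT real analyzer output.
# Each element is a character vector of "morpheme/POS" strings,
# and the element name is the original phrase.
sample_pos <- list(
  "phrase one" = c("phrase/NNG", "one/NR"),
  "a/b test"   = c("a/b/SL", "test/NNG")
)

# Hypothetical helper: split each "morpheme/POS" string at the LAST "/",
# so that a morpheme containing "/" itself is kept intact.
split_pos <- function(tagged) {
  data.frame(
    morpheme = sub("/[^/]*$", "", tagged),  # drop the final "/TAG"
    pos      = sub("^.*/", "", tagged),     # keep only what follows the last "/"
    stringsAsFactors = FALSE
  )
}

# One data frame per phrase, still named by the original phrase.
parsed <- lapply(sample_pos, split_pos)
parsed[["phrase one"]]
```

Splitting at the last "/" is a design choice made for this sketch. If the real output never contains "/" inside a morpheme, a plain strsplit(tagged, "/", fixed = TRUE) would work just as well.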
The output of nouns is also a list. Each element of the list contains the nouns extracted from one phrase. The name of each element is the original phrase.
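A common next step with a list of this shape is counting how often each noun appears across all phrases. The following is a minimal base R sketch. It assumes a hand-written list that imitates the structure described above; the phrases and nouns are placeholders, not actual output from nouns.

```r
# Hand-written stand-in for the list returned by nouns(); NOT real analyzer output.
# Each element holds the nouns of one phrase, named by the original phrase.
sample_nouns <- list(
  "first phrase"  = c("apple", "tree"),
  "second phrase" = c("apple", "river")
)

# Flatten the per-phrase vectors into one character vector,
# dropping the phrase names so they do not leak into the result.
all_nouns <- unlist(sample_nouns, use.names = FALSE)

# Count each noun and sort from most to least frequent.
noun_freq <- sort(table(all_nouns), decreasing = TRUE)
noun_freq
```

Because the element names are the original phrases, per-phrase counts are also available with lengths(sample_nouns).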
Tokenizer functions have been added, including tokens_ngram. Please refer to the help page of each function.
More examples will be provided on the GitHub wiki.
Junhewk Kim ([email protected])