
rjavacmecab

GitHub last commit | Lifecycle: superseded | R-CMD-check | Codecov test coverage

rJava Interface to CMeCab

rjavacmecab is an rJava interface to takscape/cmecab-java, a Java binding for MeCab.

The goal of this package is to provide a simpler way to use ‘MeCab’ from R than the alternatives (RMeCab and RcppMeCab).

rjavacmecab is slower than those packages, but it should be easier to use because:

  1. There is no need to build from C/C++ source.
  2. It returns all features of each node that are accessible via cmecab-java.

System Requirements

rjavacmecab requires ‘MeCab’ (mecab, libmecab-dev and mecab-ipadic-utf8) and a JDK. Make sure they are installed and available before you use rjavacmecab.

On Windows, the libmecab build must match the bitness of your R and JDK: use a 32-bit build of libmecab with 32-bit R and JDK, or a 64-bit build with 64-bit R and JDK.
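As a rough pre-flight check, the sketch below looks for the `mecab` and `java` executables on the PATH using base R only. It assumes both tools are exposed under those command names, which may not hold on every system, and finding them on the PATH does not by itself prove that the JVM can load libmecab.

```r
# Look up the executables on PATH.
# Sys.which() returns "" for any program it cannot find.
tools <- Sys.which(c("mecab", "java"))

missing_tools <- names(tools)[tools == ""]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("Found: ", paste(tools, collapse = ", "))
}
```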

Usage

Installation

remotes::install_github("paithiov909/rjavacmecab")

Call Tagger

To make the cmecab tagger available, call rebuild_tagger() first.

rjavacmecab::rebuild_tagger()

res <- rjavacmecab::cmecab(c("長期的自己実現で福楽は得られない", "幸せは刹那の中にあり"))
str(res)
#> tibble [18 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ doc_id : int [1:18] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ token  : chr [1:18] "長期" "的" "自己" "実現" ...
#>  $ feature: chr [1:18] "名詞,一般,*,*,*,*,長期,チョウキ,チョーキ" "名詞,接尾,形容動詞語幹,*,*,*,的,テキ,テキ" "名詞,一般,*,*,*,*,自己,ジコ,ジコ" "名詞,サ変接続,*,*,*,*,実現,ジツゲン,ジツゲン" ...

Prettify Output

res <- rjavacmecab::prettify(res)
str(res)
#> tibble [18 × 11] (S3: tbl_df/tbl/data.frame)
#>  $ doc_id     : int [1:18] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ token      : chr [1:18] "長期" "的" "自己" "実現" ...
#>  $ POS1       : chr [1:18] "名詞" "名詞" "名詞" "名詞" ...
#>  $ POS2       : chr [1:18] "一般" "接尾" "一般" "サ変接続" ...
#>  $ POS3       : chr [1:18] NA "形容動詞語幹" NA NA ...
#>  $ POS4       : chr [1:18] NA NA NA NA ...
#>  $ X5StageUse1: chr [1:18] NA NA NA NA ...
#>  $ X5StageUse2: chr [1:18] NA NA NA NA ...
#>  $ Original   : chr [1:18] "長期" "的" "自己" "実現" ...
#>  $ Yomi1      : chr [1:18] "チョウキ" "テキ" "ジコ" "ジツゲン" ...
#>  $ Yomi2      : chr [1:18] "チョーキ" "テキ" "ジコ" "ジツゲン" ...

If you use an IPA-style dictionary, the output has the columns shown above.
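To illustrate how the comma-separated feature strings map onto these columns, here is a minimal base R sketch. It is not the implementation of prettify(); it assumes every feature string has exactly nine fields, as in the IPA-style output above, and it treats `*` as a missing value.

```r
# Two feature strings copied from the cmecab() output above.
feature <- c(
  "名詞,一般,*,*,*,*,長期,チョウキ,チョーキ",
  "名詞,接尾,形容動詞語幹,*,*,*,的,テキ,テキ"
)

# Column names taken from the prettify() output above.
col_names <- c(
  "POS1", "POS2", "POS3", "POS4",
  "X5StageUse1", "X5StageUse2",
  "Original", "Yomi1", "Yomi2"
)

# Split each string on commas and stack the pieces into a character matrix.
parts <- strsplit(feature, ",", fixed = TRUE)
mat <- do.call(rbind, parts)

# MeCab marks empty fields with "*"; replace them with NA.
mat[mat == "*"] <- NA_character_

features_df <- as.data.frame(mat, stringsAsFactors = FALSE)
names(features_df) <- col_names
str(features_df)
```

Feature strings with fewer fields would need padding before the `rbind` step.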

Pack Output

res <- rjavacmecab::pack(res)
print(res)
#>   doc_id                                       text
#> 1      1 長期 的 自己 実現 で 福 楽 は 得 られ ない
#> 2      2                 幸せ は 刹那 の 中 に あり
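The packed form is simply the tokens of each document joined with spaces. The base R sketch below reproduces that idea on a small hand-made data frame; it is an illustration under that assumption, not the code behind pack(), and the `tokens` data frame is hypothetical.

```r
# A small hand-made token table with the same doc_id/token layout as above.
tokens <- data.frame(
  doc_id = c(1L, 1L, 1L, 2L, 2L),
  token = c("長期", "的", "自己", "幸せ", "は"),
  stringsAsFactors = FALSE
)

# Join the tokens of each document with single spaces.
text <- tapply(tokens$token, tokens$doc_id, paste, collapse = " ")

packed <- data.frame(
  doc_id = as.integer(names(text)),
  text = as.vector(text),
  stringsAsFactors = FALSE
)
print(packed)
```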

Use Igo

Igo is a pure Java port of MeCab. rjavacmecab also provides a wrapper function for it.

res <- rjavacmecab::igo("お前がそう思うんならそうなんだろう、お前ん中ではな")
str(res)
#> tibble [18 × 3] (S3: tbl_df/tbl/data.frame)
#>  $ doc_id : int [1:18] 1 1 1 1 1 1 1 1 1 1 ...
#>  $ token  : chr [1:18] "お前" "が" "そう" "思う" ...
#>  $ feature: chr [1:18] "名詞,代名詞,一般,*,*,*,お前,オマエ,オマエ" "助詞,格助詞,一般,*,*,*,が,ガ,ガ" "副詞,助詞類接続,*,*,*,*,そう,ソウ,ソー" "動詞,自立,*,*,五段・ワ行促音便,基本形,思う,オモウ,オモウ" ...

License

BSD 3-clause License.

This software includes works that are distributed in the public domain and under the New BSD License. See https://github.com/takscape/cmecab-java/blob/master/README.txt for more details.

Icons made by Vectors Market from Flaticon.



paithiov909/rjavacmecab documentation built on Feb. 1, 2023, 4 a.m.