knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = "styler",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
stopifnot(
  require(RcppMeCab),
  require(RcppKagome)
)
## Initial loads of dynamic libraries ---
pos("test load")
posParallel("test load")
kagome("test load")
RcppKagome is an R interface to ikawaha/kagome, a self-contained Japanese morphological analyzer written in pure Go.
remotes::install_github(
  "paithiov909/RcppKagome"
  # , INSTALL_opts = "--no-multiarch" # for Windows users
)
Note that installing RcppKagome from the source package requires ikawaha/kagome (v2 or later).
By default, the package uses a static library generated with Cgo, which contains the Japanese IPA dictionary.
However, you can also choose another dictionary to be bundled before building and installing the package. To use another dictionary, set the RCPPKAGOME_DIC environment variable.
Sys.setenv(RCPPKAGOME_DIC = "uni") # for using UniDic
# Or, for using mecab-ko-dic:
# Sys.setenv(RCPPKAGOME_DIC = "ko")
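Because the dictionary is chosen when the package is built, the variable has to be set before (re)installing. The sketch below wraps that step in a small helper. `set_kagome_dic()` is hypothetical and not part of RcppKagome; it assumes that leaving `RCPPKAGOME_DIC` unset selects the default IPA dictionary, as described above.

```r
# Hypothetical helper, not part of RcppKagome: set RCPPKAGOME_DIC before
# building from source. "uni" and "ko" are the values documented above;
# "ipa" unsets the variable, assuming that falls back to the default
# IPA dictionary.
set_kagome_dic <- function(dic = c("ipa", "uni", "ko")) {
  dic <- match.arg(dic)
  if (dic == "ipa") {
    Sys.unsetenv("RCPPKAGOME_DIC")
  } else {
    Sys.setenv(RCPPKAGOME_DIC = dic)
  }
  # Report the effective choice ("ipa" when the variable is unset)
  invisible(Sys.getenv("RCPPKAGOME_DIC", unset = "ipa"))
}

set_kagome_dic("uni")

# Then rebuild from source so the chosen dictionary is bundled;
# force = TRUE reinstalls even if this version is already installed.
# remotes::install_github("paithiov909/RcppKagome", force = TRUE)
```

Unknown values such as `set_kagome_dic("foo")` fail early through `match.arg()` instead of surfacing later as a build error.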
res <- RcppKagome::kagome("雨にも負けず 風にも負けず")
str(res)
res <- RcppKagome::kagome(c(
  "陽が照って鳥が啼き あちこちの楢の林も、けむるとき",
  "ぎちぎちと鳴る 汚い掌を、おれはこれからもつことになる"
))
res <- RcppKagome::prettify(res)
str(res)
When the IPA dictionary is used, the prettified output has these columns.
Here we use the whole text of 'Wagahai Wa Neko Dearu' (I Am a Cat) by Natsume Souseki. The text originally comes from Aozora Bunko.
sentences <- readLines("inst/NekoText.gz", encoding = "UTF-8")
dplyr::glimpse(sentences)
# Tokenize a single sentence (the 30th line) 500 times with each function
tm <- microbenchmark::microbenchmark(
  pos = RcppMeCab::pos(sentences[30]),
  posParallel = RcppMeCab::posParallel(sentences[30]),
  kagome = RcppKagome::kagome(sentences[30]),
  times = 500L
)
summary(tm)
ggplot2::autoplot(tm)
# Tokenize the whole text 10 times with each function
tm <- microbenchmark::microbenchmark(
  pos = RcppMeCab::pos(sentences),
  posParallel = RcppMeCab::posParallel(sentences),
  kagome = RcppKagome::kagome(sentences),
  times = 10L
)
summary(tm)
ggplot2::autoplot(tm)
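To read the benchmarks as speed ratios rather than absolute times, each expression's median can be divided by the fastest one. `relative_medians()` below is a hypothetical helper, not part of microbenchmark; it assumes the `expr` and `median` columns that `summary()` returns for a microbenchmark object.

```r
# Hypothetical helper: express each benchmark's median time as a multiple
# of the fastest expression's median (fastest = 1).
relative_medians <- function(smry) {
  stopifnot(all(c("expr", "median") %in% names(smry)))
  out <- smry[, c("expr", "median")]
  out$relative <- out$median / min(out$median)
  # Sort from fastest to slowest
  out <- out[order(out$relative), , drop = FALSE]
  rownames(out) <- NULL
  out
}
```

After either benchmark above, `relative_medians(summary(tm))` gives the ratios. microbenchmark's own `summary()` also accepts `unit = "relative"`, which should give a similar view across all statistics.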
MIT license.
Icons made by Freepik from www.flaticon.com.