knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of moranajp is a tool of morphological analysis for Japanese.
moranajpは,日本語形態素解析をするためのものです.
You can install the released version of moranajp from [GitHub] ( https://github.com/matutosi/moranajp ). You need install MeCab ( https://taku910.github.io/mecab/ ).
最新バージョンは,[GitHub] ( https://github.com/matutosi/moranajp ) でダウンロードできます. MeCab ( https://taku910.github.io/mecab/ ) を別途インストールする必要があります.
# CRAN install.packages("moranajp") # development # install.packages("remotes") remotes::install_github("matutosi/moranajp")
library(moranajp) data(neko) neko <- neko |> dplyr::mutate(text = stringi::stri_unescape_unicode(text)) |> tibble::rownames_to_column("cols") neko # First part of 'I Am a Cat' by Soseki Natsume # MeCab (Need install MeCab) # MeCabをインストールする必要あり bin_dir <- "d:/pf/mecab/bin" # set your environment MeCabをインストールしたフォルダを指定 # bin_dir <- "/opt/local/mecab/bin/" # Example for Mac or Linux iconv <- "CP932_UTF-8" # maybe need in Windows Windowsで必要な場合あり # 文字化けする場合は,引数 iconv を使ってください. # `iconv = "CP932_UTF-8"` or `iconv = "EUC_UTF-8"` neko |> moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |> print(n=30) # chamame (Do not need install, but use web service) # 別途ツールのインストールのなし(Web茶まめを使用) neko |> head(3) |> moranajp_all(method = "chamame", text_col = "text") |> print(n=30)
library(moranajp) data(synonym) synonym <- unescape_utf(synonym) data(neko_mecab) neko_mecab <- neko_mecab |> unescape_utf() |> add_sentence_no() |> clean_up(use_common_data = TRUE, synonym_df = synonym) bigram_neko <- neko_mecab |> draw_bigram_network() add_stop_words <- c("\\u3042\\u308b", "\\u3059\\u308b", "\\u3066\\u308b", "\\u3044\\u308b","\\u306e", "\\u306a\\u308b", "\\u304a\\u308b", "\\u3093", "\\u308c\\u308b", "*") |> unescape_utf() data(review_chamame) bigram_review <- review_chamame |> dplyr::slice(1:2000) |> unescape_utf() |> add_sentence_no() |> clean_up(add_stop_words = add_stop_words) |> draw_bigram_network() data(review_ginza) bigram_review_ginza <- review_ginza |> unescape_utf() |> add_sentence_no() |> clean_up(add_depend = TRUE) |> draw_bigram_network(depend = TRUE)
Line breaks in the text will be removed to avoid lag text id. If you want to remain line breaks, please change them into other character.
文字列内の改行コード(\r\n, \n)は,削除されます(改行コードでずれるのを防ぐため). 改行コードに意味がある場合は,事前に改行コードを別の文字列に変更するなどの対応をしてください.
Toshikazu Matsumura (2021) Morphological analysis for Japanese with R. https://github.com/matutosi/moranajp/.
松村 俊和 (2021) Rによる日本語形態素解析. https://github.com/matutosi/moranajp/.
download file (mecab-0.996.tar.gz, mecab-ipadic-2.7.0-20070801.tar.gz)
ファイルのダウンロード(mecab-0.996.tar.gz, mecab-ipadic-2.7.0-20070801.tar.gz)
http://taku910.github.io/mecab/#download
tar xvf mecab-0.996.tar.gz cd mecab-0.996 ./configure --enable-utf8-only --prefix=/opt/local/mecab make sudo make install # install directory # 辞書のインストール tar xvf mecab-ipadic-2.7.0-20070801.tar.gz cd mecab-ipadic-2.7.0-20070801 ./configure --with-mecab-config=/opt/local/mecab/bin/mecab-config --with-charset=utf8 --prefix=/opt/local/mecab make sudo make install # add path # パスの追加 echo 'export PATH=/opt/local/mecab/bin:$PATH' >> ~/.bash_profile source ~/.bash_profile # run mecab # mecabの実行 mecab
ref (in Japanese) 参考 https://qiita.com/nkjm/items/913584c00af199794257
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.