moranajp_all | R Documentation |
Use 'MeCab' for morphological analysis while keeping the other columns of the data frame.
moranajp_all(
tbl,
bin_dir = "",
method = "mecab",
text_col = "text",
option = "",
iconv = "",
col_lang = "jp"
)
moranajp(tbl, bin_dir, method, text_col, option = "", iconv = "", col_lang)
remove_linebreaks(tbl, text_col)
separate_cols_ginza(tbl, col_lang)
make_input(tbl, text_col, iconv, brk = "BPMJP ")
make_cmd(method, bin_dir, option = "")
make_cmd_mecab(option = "")
out_cols_mecab(col_lang = "jp")
out_cols_ginza(col_lang = "jp")
out_cols_sudachi(col_lang = "jp")
out_cols_jp()
out_cols_en()
out_cols()
mecab_all(tbl, text_col = "text", bin_dir = "")
mecab(tbl, bin_dir)
tbl |
A tibble or data.frame. |
bin_dir |
A string. Directory containing the MeCab executable (for Sudachi, the Sudachi directory). |
method |
A string. Method to use: "mecab", "ginza", "sudachi_a", "sudachi_b", "sudachi_c", or "chamame". For Sudachi, "a", "b" and "c" specify the split mode: "a" splits text into the shortest units, "b" into middle-length units and "c" into the longest units. See https://github.com/WorksApplications/Sudachi for details. "chamame" uses the web service https://chamame.ninjal.ac.jp/ via rvest. |
text_col |
A string. Name of the column containing the text to be analyzed. |
option |
A string. Additional options for MeCab. The "-b" option is already set by moranajp. To list the available options, run "mecab -h" in the Command Prompt (Windows) or Terminal (Mac). |
iconv |
A string. Encoding conversion applied to the MeCab output. Default (""): no conversion. "CP932_UTF-8": iconv(output, from = "Shift-JIS", to = "UTF-8"). "EUC_UTF-8": iconv(output, from = "eucjp", to = "UTF-8"). iconv is also used to convert the input text before running MeCab; for "CP932_UTF-8": iconv(input, from = "UTF-8", to = "Shift-JIS"). |
col_lang |
A string. Language of the output column names: "jp" or "en". |
brk |
A string used as a break-point marker between texts. |
A tibble containing the output of the morphological analysis, with an added column "text_id".
A string
A string
A string
A character vector
A character vector
A character vector
A character vector
A character vector
A data.frame
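The `brk` marker and the "text_id" column point to a common batching pattern: join all texts into one input with a marker between them, then number the output tokens by counting markers. The base-R sketch below illustrates that general technique only. It is an assumption, not moranajp's actual implementation, and it uses a whitespace split as a stand-in for the real analyzer.

```r
# Hypothetical illustration of a break-point marker; NOT moranajp's code.
brk <- "BPMJP "
texts <- c("first text", "second text", "third text")

# 1. Join the texts into one input, inserting the marker between them
joined <- paste(texts, collapse = paste0(" ", brk))
# joined: "first text BPMJP second text BPMJP third text"

# 2. Stand-in for the analyzer (assumption): split on runs of spaces
tokens <- strsplit(joined, " +")[[1]]

# 3. Each marker token starts a new text, so a running count gives text_id
is_brk <- tokens == trimws(brk)
text_id <- cumsum(is_brk) + 1

# 4. Drop the marker tokens and keep one row per real token
result <- data.frame(text_id = text_id[!is_brk], token = tokens[!is_brk])
print(result)
```

A marker like "BPMJP" is chosen because it is unlikely to occur in ordinary Japanese text, so it can be recognized reliably in the analyzer output.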
# sample data of Japanese sentences
data(neko)
neko <-
neko |>
unescape_utf()
# chamame
neko |>
moranajp_all(method = "chamame") |>
print(n=100)
## Not run:
# 'mecab', 'ginza', or 'sudachi' must be installed on the local PC
# mecab
bin_dir <- "d:/pf/mecab/bin"
iconv <- "CP932_UTF-8"
neko |>
moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |>
print(n=100)
# ginza
neko |>
moranajp_all(text_col = "text", method = "ginza") |>
print(n=100)
# sudachi
bin_dir <- "d:/pf/sudachi"
iconv <- "CP932_UTF-8"
neko |>
moranajp_all(text_col = "text", bin_dir = bin_dir,
method = "sudachi_a", iconv = iconv) |>
print(n=100)
## End(Not run)
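The "CP932_UTF-8" conversion described for the `iconv` argument can be reproduced with base R's `iconv()`, as a rough sketch of what happens to the text on each side of MeCab. This calls base R directly, not moranajp. It uses the encoding name "CP932" (the Windows variant of Shift-JIS that the option name refers to) and assumes your platform's iconv supports that name.

```r
# Sketch of the "CP932_UTF-8" round trip using base R only (not moranajp).
# Assumption: the platform iconv recognizes the encoding name "CP932".
# "I am a cat" in Japanese, written with \u escapes so the file stays ASCII
text_utf8 <- "\u543e\u8f29\u306f\u732b\u3067\u3042\u308b"

# Input side: UTF-8 -> CP932 before the text is passed to MeCab
text_sjis <- iconv(text_utf8, from = "UTF-8", to = "CP932")

# Output side: CP932 -> UTF-8 after reading MeCab's output
text_back <- iconv(text_sjis, from = "CP932", to = "UTF-8")

# Each of these 7 characters takes 3 bytes in UTF-8 but 2 bytes in CP932
nchar(text_utf8, type = "bytes")
nchar(text_sjis, type = "bytes")
identical(text_back, text_utf8)
```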