make_groups | R Documentation |
Morphological analysis using 'MeCab'. Other columns in the data frame are kept.
make_groups(
  tbl,
  text_col = "text",
  length = 8000,
  tmp_group = "tmp_group",
  str_length = "str_length"
)
make_groups_sub(tbl, text_col, n_group, tmp_group, str_length)
max_sum_str_length(tbl, tmp_group, str_length)
tbl | A tibble or data.frame. |
text_col | A string. Name of the column containing the text for morphological analysis. |
length | A numeric. |
tmp_group, str_length | Strings. Names of columns used temporarily. |
n_group | A numeric. |
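This page does not describe how rows are assigned to groups. Judging only from the argument names, one plausible reading is that rows are grouped so that the total number of characters per group stays within `length`. The base R sketch below illustrates that idea under this assumption. `group_by_length` is a hypothetical helper written for illustration, not a function of this package, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of the package): label rows with a group id
# so that the summed character count of each group stays at or below `length`.
# A single row longer than `length` still gets a group of its own.
group_by_length <- function(tbl, text_col = "text", length = 8000,
                            tmp_group = "tmp_group",
                            str_length = "str_length") {
  # Character count of each text, stored in a temporary column
  tbl[[str_length]] <- nchar(tbl[[text_col]])
  group <- integer(nrow(tbl))
  current <- 1L
  total <- 0
  for (i in seq_len(nrow(tbl))) {
    n <- tbl[[str_length]][i]
    # Start a new group when adding this row would exceed the limit
    if (total > 0 && total + n > length) {
      current <- current + 1L
      total <- 0
    }
    total <- total + n
    group[i] <- current
  }
  tbl[[tmp_group]] <- group
  tbl
}

df <- data.frame(text = c("aaaa", "bbb", "cc", "dddddd"),
                 stringsAsFactors = FALSE)
res <- group_by_length(df, length = 8)
res$tmp_group  # 1 1 2 2
```

With a limit of 8 characters, the first two rows (4 + 3 characters) share group 1 and the last two (2 + 6 characters) share group 2. The greedy, order-preserving assignment is a design choice of this sketch; it keeps neighbouring sentences together but does not minimise the number of groups.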
A tibble. Output of morphological analysis with an added column "text_id".
A string
A character vector
A data.frame
# sample data of Japanese sentences
data(neko)
neko <-
  neko |>
  unescape_utf()
# chamame
neko |>
  moranajp_all(method = "chamame") |>
  print(n = 100)
## Not run:
# Requires 'mecab', 'ginza', or 'sudachi' to be installed on the local PC
# mecab
bin_dir <- "d:/pf/mecab/bin"
iconv <- "CP932_UTF-8"
neko |>
  moranajp_all(text_col = "text", bin_dir = bin_dir, iconv = iconv) |>
  print(n = 100)
# ginza
neko |>
  moranajp_all(text_col = "text", method = "ginza") |>
  print(n = 100)
# sudachi
bin_dir <- "d:/pf/sudachi"
iconv <- "CP932_UTF-8"
neko |>
  moranajp_all(text_col = "text", bin_dir = bin_dir,
               method = "sudachi_a", iconv = iconv) |>
  print(n = 100)
## End(Not run)