clean_up: Clean up a data frame of morphological analysis results
In matutosi/moranajp: Morphological Analysis for Japanese

clean_up

R Documentation

Clean up a data frame of morphological analysis results

Description

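Clean up a data frame of morphological analysis results. The helper functions listed under Usage cover part-of-speech filtering, stop-word removal, and synonym replacement.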

Usage

clean_up(df, add_depend = FALSE, ...)

pos_filter(df)

add_depend_ginza(df)

delete_stop_words(df, use_common_data = TRUE, add_stop_words = NULL, ...)

replace_words(
  df,
  synonym_df = tibble::tibble(),
  synonym_from = "",
  synonym_to = "",
  ...
)

term_lemma(df)

term_pos_0(df)

term_pos_1(df)

Arguments

`df`	A data frame containing the results of morphological analysis.
`add_depend`	A logical. Available for ginza results.
`...`	Extra arguments passed to internal functions.
`use_common_data`	A logical. If TRUE, the stop words in data(stop_words) are used.
`add_stop_words`	A character vector of words to add to the stop words. When use_common_data is TRUE and add_stop_words is given, both are used as stop words.
`synonym_df`	A data.frame of synonym word pairs. The first column holds the words to replace, the second their replacements.
`synonym_from`, `synonym_to`	Character vectors. synonym_from and synonym_to should have the same length. When both synonym_df and synonym pairs (synonym_from and synonym_to) are given, both are used as synonyms.
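
The way `use_common_data` and `add_stop_words` combine can be illustrated with a minimal base-R sketch. It does not call moranajp and is not the package's implementation: the helper `drop_stop_words()`, the column name `term`, the romanized sample words, and the stand-in list `common_stop_words` are all hypothetical.

```r
# Hypothetical stand-in for data(stop_words); not the package's actual list.
common_stop_words <- c("suru", "aru")

# Sketch only: drop rows whose `term` appears in the combined stop-word list.
drop_stop_words <- function(df, use_common_data = TRUE, add_stop_words = NULL) {
  # Both sources are concatenated when both are supplied.
  stop_words <- c(if (use_common_data) common_stop_words, add_stop_words)
  df[!(df$term %in% stop_words), , drop = FALSE]
}

df <- data.frame(
  term = c("neko", "suru", "namae", "aru", "wagahai"),
  stringsAsFactors = FALSE
)

# Common list plus one user-supplied word.
both <- drop_stop_words(df, add_stop_words = "wagahai")

# User-supplied word only.
only_added <- drop_stop_words(df, use_common_data = FALSE,
                              add_stop_words = "wagahai")
```

Under these assumptions, `both` keeps only the rows for "neko" and "namae", while `only_added` drops just the "wagahai" row.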

Value

A data.frame.

Examples

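# Load the bundled example data used below
data(neko_mecab)
data(review_ginza)
data(review_sudachi_c)
data(synonym)
# Bundled data is unescaped with unescape_utf() before use
synonym <- 
  synonym |> unescape_utf()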

neko_mecab <- 
  neko_mecab |>
  unescape_utf() |>
  print()

neko_mecab |>
  clean_up(use_common_data = TRUE, synonym_df = synonym)

# add_depend = TRUE is available for ginza results
review_ginza |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(add_depend = TRUE, use_common_data = TRUE, synonym_df = synonym)

review_sudachi_c |>
  unescape_utf() |>
  add_sentence_no() |>
  clean_up(use_common_data = TRUE, synonym_df = synonym)
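
The synonym arguments described under Arguments (`synonym_df` together with `synonym_from` and `synonym_to`) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the helper `replace_terms()`, the column name `term`, and the romanized sample words are hypothetical, and empty character vectors are used as defaults instead of `""` to keep the sketch short.

```r
# Sketch only: replace terms using pairs from a data frame and from vectors.
replace_terms <- function(df,
                          synonym_df = data.frame(),
                          synonym_from = character(0),
                          synonym_to = character(0)) {
  # The vector pairs must line up one-to-one.
  stopifnot(length(synonym_from) == length(synonym_to))
  has_df <- ncol(synonym_df) >= 2
  # First column: replace from. Second column: replace to.
  from <- c(if (has_df) as.character(synonym_df[[1]]), synonym_from)
  to   <- c(if (has_df) as.character(synonym_df[[2]]), synonym_to)
  idx <- match(df$term, from)            # NA where no synonym applies
  df$term <- ifelse(is.na(idx), df$term, to[idx])
  df
}

df <- data.frame(term = c("neko", "nyanko", "inu", "wanko"),
                 stringsAsFactors = FALSE)
pairs <- data.frame(from = "nyanko", to = "neko", stringsAsFactors = FALSE)

# Pairs from the data frame and from the vectors are applied together.
replaced <- replace_terms(df, synonym_df = pairs,
                          synonym_from = "wanko", synonym_to = "inu")
```

Here `replaced$term` becomes "neko", "neko", "inu", "inu", and calling `replace_terms(df)` with no synonyms leaves the terms unchanged.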

matutosi/moranajp documentation built on July 31, 2024, midnight