Apply regular-expression-based text normalization to files


Applies a set of regular-expression-based text normalization rules to one or more files. By default, changes are shown on the console only, without actually modifying any files. Set run_dry = FALSE to apply the changes.


Usage

str_normalize_file(
  path,
  rules = yay::regex_text_normalization,
  run_dry = TRUE,
  process_line_by_line = FALSE,
  n_context_chrs = 20L,
  verbose = TRUE
)



Arguments

path
Paths to the text files. A character vector.


rules
A tibble of regular expression patterns and replacements. It must have the columns pattern and replacement. pattern may optionally be a list column, which allows several patterns to share the same replacement. Patterns are interpreted as regular expressions as described in the stringi::stringi-search-regex help topic. Replacements are interpreted as-is, except that references of the form \1, \2, etc. are replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are applied in the order given, so earlier rows take effect before later ones.
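The shape of such a rule set can be sketched as follows. The rules and the apply_rules() helper are hypothetical and only illustrate the semantics described above (list-column patterns, \1-style references, row-by-row order). The helper is not the package's implementation, and it assumes that stringr::str_replace_all() is a fair stand-in for how replacements are resolved.

```r
library(tibble)
library(stringr)

# Hypothetical rule set (not shipped with the package)
custom_rules <- tibble(
  pattern = list(
    c("\\t", " {2,}"),        # two patterns sharing one replacement
    "(\\d+)\\s*-\\s*(\\d+)"   # two capture groups, referenced below
  ),
  replacement = c(" ", "\\1 to \\2")
)

# Hypothetical helper: apply each rule row in order, and within a row
# each pattern in order, always with that row's replacement
apply_rules <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    for (p in rules$pattern[[i]]) {
      x <- str_replace_all(x, p, rules$replacement[i])
    }
  }
  x
}

apply_rules("pages\t10 -  12", custom_rules)
# "pages 10 to 12"
```

A tibble built this way could then be passed as rules = custom_rules.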


run_dry
Whether or not to show replacements on the console only, without actually modifying any files. Implies verbose = TRUE.


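process_line_by_line
Whether each line in a file should be treated as a separate string or the whole file as one single string. While the latter is more performant, you probably want the former if you're using "^" or "$" in your patterns, since in a single string these anchors refer to the start and end of the whole file rather than of each line.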
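The difference can be seen with stringi directly. This is a minimal sketch that assumes the file content is either kept as one string per line or joined into a single string with "\n"; it does not call the package itself.

```r
library(stringi)

file_lines <- c("alpha  ", "beta  ", "gamma  ")
whole_file <- paste(file_lines, collapse = "\n")

# Whole file as one string: "$" only anchors at the very end,
# so trailing whitespace on the first two lines survives
as_one_string <- stri_replace_all_regex(whole_file, "[ \\t]+$", "")
as_one_string
# "alpha  \nbeta  \ngamma"

# One string per line: "$" anchors at the end of every line
per_line <- stri_replace_all_regex(file_lines, "[ \\t]+$", "")
per_line
# "alpha" "beta" "gamma"
```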


n_context_chrs
The (maximum) number of characters displayed around the actual string and its replacement. The number refers to a single side of the string/replacement, so the total number of context characters is at most 2 * n_context_chrs. Only relevant if verbose = TRUE.
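To make the "per side" meaning concrete, here is a small base R sketch. show_context() is a hypothetical helper and says nothing about how the package formats its console output; it only shows that up to n_context_chrs characters are taken on each side of a match, and fewer when the match sits near the start or end of the string.

```r
# Hypothetical helper: return a match together with up to
# `n_context_chrs` characters on each side of it
show_context <- function(x, start, end, n_context_chrs = 20L) {
  before <- substr(x, max(1L, start - n_context_chrs), start - 1L)
  after <- substr(x, end + 1L, min(nchar(x), end + n_context_chrs))
  paste0(before, "[", substr(x, start, end), "]", after)
}

x <- "The quick brown fox jumps over the lazy dog"
m <- regexpr("fox", x, fixed = TRUE)
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L

show_context(x, start, end, n_context_chrs = 5L)
# "rown [fox] jump"
```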


verbose
Whether or not to display replacements on the console.


Value

path, invisibly.

See Also

Regular expression rules: regex_text_normalization, regex_file_normalization

Other string functions: str_normalize(), str_replace_file(), str_replace_verbose()


Examples

# Use POSIX-related file normalization rule(s) included in this package
temp_file <- tempfile()
# The URL is elided here; supply the address of the text file to download
download.file(url = paste0(""),
              destfile = temp_file,
              quiet = TRUE,
              mode = "wb")

yay::regex_file_normalization |>
  dplyr::filter(category == "posix") |>
  yay::str_normalize_file(path = temp_file)
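A typical preview-then-apply workflow might look like the sketch below. It uses only the arguments documented above and the default rules. The sample file content is made up for illustration, and the calls are guarded so that the block also runs when the yay package is not installed.

```r
# Hypothetical sample file for illustration
sample_file <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), sample_file)

if (requireNamespace("yay", quietly = TRUE)) {
  # Default run_dry = TRUE: replacements are only shown on the console
  yay::str_normalize_file(path = sample_file)

  # Once the preview looks right, write the changes to the file
  yay::str_normalize_file(path = sample_file, run_dry = FALSE)
}

# Inspect the (possibly modified) file content
readLines(sample_file)
```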

