str_normalize: Apply regular-expression-based text normalization to strings

str_normalize    R Documentation

Description

Applies a set of regular-expression-based text normalization rules to one or more strings. All performed replacements are displayed on the console by default (verbose = TRUE).

Usage

str_normalize(
  string,
  rules = yay::regex_text_normalization,
  n_context_chrs = 20L,
  verbose = TRUE
)

Arguments

string

Input vector. Either a character vector, or something coercible to one.

rules

A data frame of regular expression patterns and replacements. The pattern column can optionally be a list column, which condenses multiple patterns into a single replacement rule. Patterns are interpreted as regular expressions as described in stringi::stringi-search-regex. Replacements are interpreted as-is, except that references of the form \1, \2, etc. are replaced with the contents of the respective matched group (created in patterns using ()). Pattern-replacement pairs are processed in the order given, so pairs listed first are applied before those listed later.
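The ordered, rule-by-rule processing described above can be sketched in base R. This is only an illustration, not the package's implementation: apply_rules() is a hypothetical helper, the column name replacement is an assumption, and base gsub() uses the PCRE engine rather than the ICU engine behind stringi, so pattern syntax may differ in details.

```r
# Hypothetical rule set. The column name `replacement` is assumed here;
# `pattern` is a list column so that one rule can carry several patterns.
rules <- data.frame(replacement = c(" ", "\\1"), stringsAsFactors = FALSE)
rules$pattern <- list(
  "\\s+",                       # rule 1: collapse runs of whitespace
  c(" ([,.;])", " ([!?])")      # rule 2: drop the space before punctuation
)

# Apply the rules top to bottom; each rule sees the output of the previous one.
apply_rules <- function(string, rules) {
  for (i in seq_len(nrow(rules))) {
    for (p in rules$pattern[[i]]) {
      # `\\1` in the replacement refers to the first capture group of `p`
      string <- gsub(p, rules$replacement[i], string, perl = TRUE)
    }
  }
  string
}

result <- apply_rules("Hello   ,  world  !", rules)
print(result)
```

Order matters in this sketch: the second rule only matches a single space before punctuation, so it relies on the first rule having collapsed the whitespace already.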

n_context_chrs

The (maximum) number of characters displayed on each side of the matched string and its replacement. Because the number refers to a single side, the total number of context characters is at most 2 * n_context_chrs. Fewer are shown when the match lies close to the start or end of the string. Only relevant if verbose = TRUE.
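To make the per-side counting concrete, here is a minimal base R sketch. show_context() is a hypothetical helper written for this illustration; it does not reproduce the package's actual console output, only the idea of clipping up to n_context_chrs characters on each side of a match.

```r
# Hypothetical helper: show the first match of `pattern` in `string`,
# bracketed, with at most `n_context_chrs` characters on either side.
show_context <- function(string, pattern, n_context_chrs = 20L) {
  m <- regexpr(pattern, string, perl = TRUE)
  if (m == -1L) return(NA_character_)   # no match, nothing to show
  start <- as.integer(m)
  end <- start + attr(m, "match.length") - 1L
  # Clip the context windows at the string boundaries
  before <- substr(string, max(1L, start - n_context_chrs), start - 1L)
  after <- substr(string, end + 1L, min(nchar(string), end + n_context_chrs))
  paste0(before, "[", substr(string, start, end), "]", after)
}

ctx <- show_context("The quick brown fox jumps over the lazy dog", "fox",
                    n_context_chrs = 5L)
print(ctx)
```

With n_context_chrs = 5L, at most 10 context characters surround the match; a match at the very start of the string would show context on the right side only.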

verbose

Whether or not to display replacements on the console.

Value

A character vector containing the normalized strings.

See Also

Regular expression rules: regex_text_normalization, regex_file_normalization

Other string functions: str_normalize_file(), str_replace_file(), str_replace_verbose()

Examples

"This kind of “text normalization” is e.g. useful to apply before feeding stuff to ‘Pandoc’" |>
  yay::str_normalize()

salim-b/yay documentation built on Jan. 3, 2025, 6:16 p.m.