normalize_state: Normalize CSI and OSC Sequences

View source: R/normalize.R

normalize_stateR Documentation

Normalize CSI and OSC Sequences

Description

Re-encodes SGR and OSC encoded URL sequences into a unique decomposed form. Strings containing semantically identical SGR and OSC sequences that are encoded differently should compare equal after normalization.

Usage

normalize_state(
  x,
  warn = getOption("fansi.warn", TRUE),
  term.cap = getOption("fansi.term.cap", dflt_term_cap()),
  carry = getOption("fansi.carry", FALSE)
)

Arguments

x

a character vector or object that can be coerced to such.

warn

TRUE (default) or FALSE, whether to warn when potentially problematic Control Sequences are encountered. These could cause the assumptions fansi makes about how strings are rendered on your display to be incorrect, for example by moving the cursor (see ?fansi). At most one warning will be issued per element in each input vector. Will also warn about some badly encoded UTF-8 strings, but a lack of UTF-8 warnings is not a guarantee of correct encoding (use validUTF8 for that).

term.cap

character a vector of the capabilities of the terminal, can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes starting with "38;5" or "48;5"), "truecolor" (SGR codes starting with "38;2" or "48;2"), and "all". "all" behaves as it does for the ctl parameter: "all" combined with any other value means all terminal capabilities except that one. fansi will warn if it encounters SGR codes that exceed the terminal capabilities specified (see term_cap_test for details). In versions prior to 1.0, fansi would also skip exceeding SGRs entirely instead of interpreting them. You may add the string "old" to any otherwise valid term.cap spec to restore the pre 1.0 behavior. "old" will not interact with "all" the way other valid values for this parameter do.

carry

TRUE, FALSE (default), or a scalar string, controls whether to interpret the character vector as a "single document" (TRUE or string) or as independent elements (FALSE). In "single document" mode, active state at the end of an input element is considered active at the beginning of the next vector element, simulating what happens with a document with active state at the end of a line. If FALSE each vector element is interpreted as if there were no active state when it begins. If character, then the active state at the end of the carry string is carried into the first element of x (see "Replacement Functions" for differences there). The carried state is injected in the interstice between an imaginary zeroeth character and the first character of a vector element. See the "Position Semantics" section of substr_ctl and the "State Interactions" section of ?fansi for details. Except for strwrap_ctl where NA is treated as the string "NA", carry will cause NAs in inputs to propagate through the remaining vector elements.

Details

Each compound SGR sequence is broken up into individual tokens, superfluous tokens are removed, and the SGR reset sequence "ESC[0m" (or "ESC[m") is replaced by the closing codes for whatever SGR styles are active at the point in the string in which it appears.

Unrecognized SGR codes will be dropped from the output with a warning. The specific order of SGR codes associated with any given SGR sequence is not guaranteed to remain the same across different versions of fansi, but should remain unchanged except for the addition of previously uninterpreted codes to the list of interpretable codes. There is no special significance to the order the SGR codes are emitted in other than it should be consistent for any given SGR state. URLs adjacent to SGR codes are always emitted after the SGR codes irrespective of what side they were on originally.

OSC encoded URL sequences are always terminated by "ESC]\", and those between abutting URLs are omitted. Identical abutting URLs are merged. In order for URLs to be considered identical both the URL and the "id" parameter must be specified and be the same. OSC URL parameters other than "id" are dropped with a warning.

The underlying assumption is that each element in the vector is unaffected by SGR or OSC URLs in any other element or elsewhere. This may lead to surprising outcomes if these assumptions are untrue (see examples). You may adjust this assumption with the carry parameter.

Normalization was implemented primarily for better compatibility with crayon which emits SGR codes individually and assumes that each opening code is paired up with its specific closing code, but it can also be used to reduce the probability that strings processed with future versions of fansi will produce different results than the current version.

Value

x, with all SGRs normalized.

See Also

?fansi for details on how Control Sequences are interpreted, particularly if you are getting unexpected results, unhandled_ctl for detecting bad control sequences.

Examples

normalize_state("hello\033[42;33m world")
normalize_state("hello\033[42;33m world\033[m")
normalize_state("\033[4mhello\033[42;33m world\033[m")

## Superflous codes removed
normalize_state("\033[31;32mhello\033[m")      # only last color prevails
normalize_state("\033[31\033[32mhello\033[m")  # only last color prevails
normalize_state("\033[31mhe\033[49mllo\033[m") # unused closing

## Equivalent normalized sequences compare identical
identical(
  normalize_state("\033[31;32mhello\033[m"),
  normalize_state("\033[31mhe\033[49mllo\033[m")
)
## External SGR will defeat normalization, unless we `carry` it
red <- "\033[41m"
writeLines(
  c(
    paste(red, "he\033[0mllo", "\033[0m"),
    paste(red, normalize_state("he\033[0mllo"), "\033[0m"),
    paste(red, normalize_state("he\033[0mllo", carry=red), "\033[0m")
) )

fansi documentation built on Oct. 9, 2023, 1:07 a.m.