strsplit_ctl: ANSI Control Sequence Aware Version of strsplit


View source: R/strsplit.R

Description

A drop-in replacement for base::strsplit. It is noticeably slower, but should otherwise behave the same way, apart from being aware of Control Sequences.

Usage

strsplit_ctl(
  x,
  split,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap"),
  ctl = "all"
)

strsplit_sgr(
  x,
  split,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE,
  warn = getOption("fansi.warn"),
  term.cap = getOption("fansi.term.cap")
)

Arguments

x

a character vector, or, unlike base::strsplit, an object that can be coerced to character.

split

character vector (or object which can be coerced to such) containing regular expression(s) (unless fixed = TRUE) to use for splitting. If empty matches occur, in particular if split has length 0, x is split into single characters. If split has length greater than 1, it is recycled along x.
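As a small illustration of the recycling described above, the sketch below pairs each element of x with its own split pattern. The input strings are made up for the example, and strip_ctl is used only to make the text of the pieces easy to compare.

```r
library(fansi)

# Two styled strings, each with a different separator (illustrative inputs).
x <- c("\033[31ma b\033[0m", "\033[32mc,d\033[0m")

# `split` has length 2, so its elements are used in turn along `x`:
# " " for the first string and "," for the second.
res <- strsplit_ctl(x, c(" ", ","), fixed = TRUE)

# Remove the Control Sequences to look at the text of each piece.
plain <- lapply(res, strip_ctl)
plain
```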

fixed

logical. If TRUE match split exactly, otherwise use regular expressions. Has priority over perl.

perl

logical. Should Perl-compatible regexps be used?

useBytes

logical. If TRUE the matching is done byte-by-byte rather than character-by-character, and inputs with marked encodings are not converted. This is forced (with a warning) if any input is found which is marked as "bytes" (see Encoding).

warn

TRUE (default) or FALSE, whether to warn when potentially problematic Control Sequences are encountered. These could cause the assumptions fansi makes about how strings are rendered on your display to be incorrect, for example by moving the cursor (see fansi).

term.cap

character, a vector of the capabilities of the terminal, can be any combination of "bright" (SGR codes 90-97, 100-107), "256" (SGR codes starting with "38;5" or "48;5"), and "truecolor" (SGR codes starting with "38;2" or "48;2"). Changing this parameter changes how fansi interprets escape sequences, so you should ensure that it matches your terminal capabilities. See term_cap_test for details.

ctl

character, which Control Sequences should be treated specially. See the "_ctl vs. _sgr" section for details.

  • "nl": newlines.

  • "c0": all other "C0" control characters (i.e. 0x01-0x1F, 0x7F), except for newlines and the actual ESC (0x1B) character.

  • "sgr": ANSI CSI SGR sequences.

  • "csi": all non-SGR ANSI CSI sequences.

  • "esc": all other escape sequences.

  • "all": all of the above, except when used in combination with any of the above, in which case it means "all but".

Details

This function works by computing the position of the split points after removing Control Sequences, and uses those positions in conjunction with substr_ctl to extract the pieces. This concept is borrowed from crayon::col_strsplit. An important implication of this is that you cannot split by Control Sequences that are being treated as Control Sequences. You can, however, limit which Control Sequences are treated specially via the ctl parameter (see examples).
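The approach described above can be approximated in base R. This is only a rough sketch of the idea, not fansi's implementation: the pattern sgr_pat is a simplified, assumed regular expression that matches basic SGR sequences only, and the final step extracts from the stripped text, whereas fansi extracts from the original string with substr_ctl so that styles are preserved.

```r
x <- "\033[31mhello\033[42m world!"

# 1. Remove the Control Sequences. `sgr_pat` is a simplified assumption:
#    ESC, "[", any digits or semicolons, then "m".
sgr_pat <- "\033\\[[0-9;]*m"
stripped <- gsub(sgr_pat, "", x)

# 2. Locate the split points in the stripped text.
m <- gregexpr(" ", stripped, fixed = TRUE)[[1]]
starts <- c(1L, m + attr(m, "match.length"))
ends <- c(m - 1L, nchar(stripped))

# 3. Extract the pieces at those positions. fansi would apply the
#    positions to the original `x` instead, keeping the styling.
pieces <- substring(stripped, starts, ends)
pieces
```

Because the sequences are removed before the pattern is matched, a pattern that targets one of them has nothing left to match, which is why you cannot split by a Control Sequence that is being treated specially.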

Value

list, see base::strsplit.

_ctl vs. _sgr

The *_ctl versions of the functions treat all Control Sequences specially by default. Special treatment is context dependent, and may include detecting them and/or computing their display/character width as zero. For the SGR subset of the ANSI CSI sequences, fansi will also parse, interpret, and reapply the text styles they encode if needed. You can modify whether a Control Sequence is treated specially with the ctl parameter. You can exclude a type of Control Sequence from special treatment by combining "all" with that type of sequence (e.g. ctl=c("all", "nl") for special treatment of all Control Sequences but newlines). The *_sgr versions only treat ANSI CSI SGR sequences specially, and are equivalent to the *_ctl versions with the ctl parameter set to "sgr".
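The relationship between the two variants can be checked directly. The sketch below uses only the functions documented on this page plus strip_sgr, and assumes the default fansi options are in effect.

```r
library(fansi)

x <- "\033[31mhello\033[42m\nworld!"

# strsplit_sgr() should match strsplit_ctl() restricted to SGR sequences.
via_sgr <- strsplit_sgr(x, "\n")
via_ctl <- strsplit_ctl(x, "\n", ctl = "sgr")
identical(via_sgr, via_ctl)

# "all" combined with "nl" means "all but newlines", so "\n" is
# available as a split pattern again.
all_but_nl <- strsplit_ctl(x, "\n", ctl = c("all", "nl"))

# Strip the styling to compare just the text of each piece.
plain <- strip_sgr(all_but_nl[[1]])
plain
```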

Note

Non-ASCII strings are converted to and returned in UTF-8 encoding. The split positions are computed after both x and split are converted to UTF-8.

See Also

fansi for details on how Control Sequences are interpreted, particularly if you are getting unexpected results; base::strsplit for details on the splitting.

Examples

strsplit_sgr("\033[31mhello\033[42m world!", " ")

## The next two examples allow splitting by newlines. With the default
## ctl = "all" this does not work, as newlines are treated as _Control Sequences_
strsplit_sgr("\033[31mhello\033[42m\nworld!", "\n")
strsplit_ctl("\033[31mhello\033[42m\nworld!", "\n", ctl=c("all", "nl"))

fansi documentation built on May 25, 2021, 9:06 a.m.