modifiers: Control matching behaviour with modifier functions
In hadley/stringr: Simple, Consistent Wrappers for Common String Operations

modifiers

R Documentation

Description

Modifier functions control the meaning of the pattern argument to stringr functions:

boundary(): Match boundaries between things.
coll(): Compare strings using standard Unicode collation rules.
fixed(): Compare literal bytes.
regex() (the default): Match using ICU regular expressions.
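
As a rough illustration of the list above, the sketch below checks that a bare string pattern is treated the same way as one wrapped in regex(), and contrasts it with fixed() and boundary(). The vector `x` and the variable names are made up for the example.

```r
library(stringr)

# Hypothetical input, chosen so that "." matters
x <- c("a.b", "axb", "A.B")

# A bare string is interpreted as a regular expression, i.e. like regex()
bare    <- str_detect(x, "a.b")
wrapped <- str_detect(x, regex("a.b"))
identical(bare, wrapped)

# fixed() compares literally, so "." no longer means "any character"
literal <- str_detect(x, fixed("a.b"))
literal

# boundary() takes a boundary type rather than a pattern string
n_words <- str_count("one two", boundary("word"))
n_words
```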

Usage

fixed(pattern, ignore_case = FALSE)

coll(pattern, ignore_case = FALSE, locale = "en", ...)

regex(
  pattern,
  ignore_case = FALSE,
  multiline = FALSE,
  comments = FALSE,
  dotall = FALSE,
  ...
)

boundary(
  type = c("character", "line_break", "sentence", "word"),
  skip_word_none = NA,
  ...
)

Arguments

`pattern`	Pattern whose matching behaviour is to be modified.
`ignore_case`	Should case differences be ignored in the match? For `fixed()`, this uses a simple algorithm which assumes a one-to-one mapping between upper and lower case letters.
`locale`	Locale to use for comparisons. See `stringi::stri_locale_list()` for all possible options. Defaults to "en" (English) to ensure that default behaviour is consistent across platforms.
`...`	Other less frequently used arguments passed on to `stringi::stri_opts_collator()`, `stringi::stri_opts_regex()`, or `stringi::stri_opts_brkiter()`.
`multiline`	If `TRUE`, `^` and `$` match the beginning and end of each line. If `FALSE`, the default, they only match the start and end of the input.
`comments`	If `TRUE`, white space and comments beginning with `#` are ignored. Escape literal spaces with `\\ `.
`dotall`	If `TRUE`, `.` will also match line terminators.
`type`	Boundary type to detect:
	`character`: Every character is a boundary.
	`line_break`: Boundaries are places where it is acceptable to have a line break in the current locale.
	`sentence`: The beginnings and ends of sentences are boundaries, using intelligent rules to avoid counting abbreviations.
	`word`: The beginnings and ends of words are boundaries.
`skip_word_none`	Ignore "words" that don't contain any letters or numbers, i.e. punctuation and white space. The default, `NA`, skips such "words" only when splitting on `word` boundaries.
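
The `comments` and `skip_word_none` arguments are not exercised in the Examples, so here is a minimal sketch of how they might be used. The pattern, the input strings and all variable names are invented for illustration.

```r
library(stringr)

# comments = TRUE: white space and "#" comments in the pattern are ignored,
# so the pattern can be laid out over several annotated lines.
phone <- regex("
  \\d{3}  # first three digits
  -       # literal hyphen
  \\d{4}  # last four digits
", comments = TRUE)
phone_match <- str_extract("call 555-1234 now", phone)
phone_match

# An unescaped space is dropped from the pattern; an escaped one is literal.
unescaped <- str_detect(c("ab", "a b"), regex("a b", comments = TRUE))
escaped   <- str_detect(c("ab", "a b"), regex("a\\ b", comments = TRUE))

# skip_word_none: with the default (NA), splitting on word boundaries drops
# pieces that are only punctuation or white space; FALSE keeps them.
s <- "Hi, there!"
words_only <- str_split(s, boundary("word"))[[1]]
all_pieces <- str_split(s, boundary("word", skip_word_none = FALSE))[[1]]
words_only
all_pieces
```

With `skip_word_none = FALSE` the pieces should paste back together into the original string, which is a quick way to check that nothing was dropped.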

Value

A stringr modifier object, i.e. a character vector with parent S3 class stringr_pattern.
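
A small sketch of what this means in practice, assuming a stringr version whose modifier objects carry the `stringr_pattern` class named above (older releases may use different class names). The variable names are hypothetical.

```r
library(stringr)

# Build a modifier once and keep it in a variable
m <- regex("a.b", ignore_case = TRUE)

# The object is still a character vector holding the pattern text
is_chr <- is.character(m)

# Assumption: the installed stringr matches this documentation, so the
# object inherits from "stringr_pattern"
is_pattern <- inherits(m, "stringr_pattern")

# The stored modifier can be reused wherever a pattern is accepted
hits <- str_detect(c("A-B", "ab"), m)
hits
```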

Examples

library(stringr)

# "." is a regex wildcard by default; fixed() and coll() treat it literally
pattern <- "a.b"
strings <- c("abb", "a.b")
str_detect(strings, pattern)
str_detect(strings, fixed(pattern))
str_detect(strings, coll(pattern))

# coll() is useful for locale-aware case-insensitive matching
i <- c("I", "\u0130", "i")
i
str_detect(i, fixed("i", TRUE))
str_detect(i, coll("i", TRUE))
str_detect(i, coll("i", TRUE, locale = "tr"))

# Word boundaries
words <- c("These are   some words.")
str_count(words, boundary("word"))
str_split(words, " ")[[1]]
str_split(words, boundary("word"))[[1]]

# Regular expression variations
str_extract_all("The Cat in the Hat", "[a-z]+")
str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))

str_extract_all("a\nb\nc", "^.")
str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))

str_extract_all("a\nb\nc", "a.")
str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))

hadley/stringr documentation built on Aug. 21, 2024, 5:13 a.m.