
gsub: Replace Pattern Occurrences


sub2 replaces the first pattern occurrence in each string with a given replacement string. gsub2 replaces all (i.e., 'globally') pattern matches.


sub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

gsub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

sub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)

gsub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)


| Argument | Description |
|----|----|
| x | character vector with strings whose chunks are to be modified |
| pattern | character vector of nonempty search patterns |
| replacement | character vector with the corresponding replacement strings; in sub2 and gsub2, back-references (whenever fixed=FALSE) are indicated by $0..$99 and $<name>, whereas the base-R compatible sub and gsub only allow \1..\9 |
| ... | further arguments to stri_replace_first or stri_replace_all, e.g., locale, dotall |
| ignore_case, ignore.case | single logical value; indicates whether matching should be case-insensitive |
| fixed | single logical value; FALSE for matching with regular expressions (see about_search_regex); TRUE for fixed pattern matching (about_search_fixed); NA for the Unicode collation algorithm (about_search_coll) |
| perl, useBytes | not used (with a warning if attempting to do so) [DEPRECATED] |
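As a rough illustration of how the fixed and ignore_case arguments change matching, the sketch below runs the same calls in regex and fixed modes. It assumes the stringx package is installed; the input strings and variable names are made up for the example.

```r
library(stringx)  # assumption: stringx (and its dependency stringi) is installed

# Regex mode (the default, fixed=FALSE): "." matches any single character
regex_out <- gsub2("a.b.c", ".", "-")

# Fixed mode: "." is treated as a literal dot
fixed_out <- gsub2("a.b.c", ".", "-", fixed=TRUE)

# Case-insensitive regex matching of the first occurrence only
nocase_out <- sub2("Hello", "h", "J", ignore_case=TRUE)

print(c(regex_out, fixed_out, nocase_out))
## [1] "-----" "a-b-c" "Jello"
```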


Not to be confused with substr.

These functions are fully vectorised with respect to x, pattern, and replacement.

gsub2 uses vectorise_all=TRUE because of the attribute preservation rules; call stri_replace_all directly if different behaviour is needed.
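A minimal sketch of what vectorisation over x, pattern, and replacement looks like in practice, assuming stringx is installed. The inputs and variable names are illustrative only.

```r
library(stringx)  # assumption: stringx is installed

# Element-wise pairing: the i-th pattern and replacement are applied
# to the i-th string only
paired <- gsub2(c("aaa", "bbb"), c("a", "b"), c("X", "Y"))
print(paired)
## [1] "XXX" "YYY"

# Shorter arguments are recycled: one string, two patterns, one replacement
recycled <- sub2("banana", c("a", "n"), "_")
print(recycled)
## [1] "b_nana" "ba_ana"
```

Note that in the first call "a" is never searched for in "bbb"; applying every pattern to every string is the kind of behaviour for which stri_replace_all would have to be called directly.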

The [DEPRECATED] sub and [DEPRECATED] gsub simply call sub2 and gsub2, which have a cleaned-up argument list. Additionally, if fixed=FALSE, the back-references in replacement strings are converted to those accepted by the ICU regex engine.
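The two back-reference notations can be compared side by side. The sketch below assumes stringx is installed and calls the compatibility wrapper as stringx::sub to avoid any ambiguity with base::sub; the sample strings are invented for the example.

```r
library(stringx)  # assumption: stringx is installed

# Base-R style back-references (\1..\9) through the compatibility wrapper;
# note the base-R argument order: pattern, replacement, x
compat <- stringx::sub("(\\w+) (\\w+)", "\\2 \\1", "hello world")

# ICU style back-references ($1, $2, ...) through sub2;
# here x comes first
icu <- sub2("hello world", "(\\w+) (\\w+)", "$2 $1")

# $0 refers to the whole match
wrapped <- gsub2("abc", "[a-c]", "<$0>")

print(c(compat, icu, wrapped))
## [1] "world hello" "world hello" "<a><b><c>"
```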


Both functions return a character vector. They preserve the attributes of the longest inputs (unless they are dropped due to coercion).
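One way to check the attribute preservation rule is with a named vector. This is only a sketch under the assumption that stringx is installed; the names first and second are hypothetical, and the output is printed for inspection rather than asserted here.

```r
library(stringx)  # assumption: stringx is installed

# x is the longest input, so its attributes (here: names) are the ones
# the rule above refers to
x <- c(first="foo", second="boo")
res <- gsub2(x, "o", "0")

print(res)         # the replaced strings
print(names(res))  # inspect whether the names of x were carried over
```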

Differences from Base R

Replacements for base sub and gsub implemented with stri_replace_first and stri_replace_all, respectively.


Marek Gagolewski

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): paste, nchar, grepl2, gregexpr2, gregextr2, strsplit, gsubstr

trimws for removing whitespaces (amongst others) from the start or end of strings


"change \U0001f602 me \U0001f603" |> gsub2("\\p{L}+", "O_O")
## [1] "O_O 😂 O_O 😃"
x <- c("mario", "Mario", "M\u00E1rio", "M\u00C1RIO", "Mar\u00EDa", "Rosario", NA)
sub2(x, "mario", "M\u00E1rio", fixed=NA, strength=1L)
## [1] "Mário"   "Mário"   "Mário"   "Mário"   "María"   "Rosario" NA
sub2(x, "mario", "Mario", fixed=NA, strength=2L)
## [1] "Mario"   "Mario"   "Mário"   "MÁRIO"   "María"   "Rosario" NA
x <- "abcdefghijklmnopqrstuvwxyz"
p <- "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"
base::sub(p, "\\1\\9", x)
## [1] "ainopqrstuvwxyz"
base::gsub(p, "\\1\\9", x)
## [1] "ainv"
base::gsub(p, "\\1\\9", x, perl=TRUE)
## [1] "ainv"
base::gsub(p, "\\1\\13", x)
## [1] "aa3nn3"
sub2(x, p, "$1$13")
## [1] "amnopqrstuvwxyz"
gsub2(x, p, "$1$13")
## [1] "amnz"

gagolews/stringx documentation built on Jan. 15, 2025, 9:46 p.m.