stringx: Replacements for Base String Functions Powered by 'stringi'

gsub: Replace Pattern Occurrences

sub2 replaces the first pattern occurrence in each string with a given replacement string. gsub2 replaces all (i.e., \'globally\') pattern matches.

sub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

gsub2(x, pattern, replacement, ..., ignore_case = FALSE, fixed = FALSE)

sub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)

gsub(
  pattern,
  replacement,
  x,
  ...,
  ignore.case = FALSE,
  fixed = FALSE,
  perl = FALSE,
  useBytes = FALSE
)

| | | |----|----| | x | character vector with strings whose chunks are to be modified | | pattern | character vector of nonempty search patterns | | replacement | character vector with the corresponding replacement strings; in sub2 and gsub2, back-references (whenever fixed=FALSE) are indicated by $0..$99 and $<name>, whereas the base-R compatible sub and gsub, only allow \1..\9 | | ... | further arguments to stri_replace_first or stri_replace_all, e.g., locale, dotall | | ignore_case, ignore.case | single logical value; indicates whether matching should be case-insensitive | | fixed | single logical value; FALSE for matching with regular expressions (see about_search_regex); TRUE for fixed pattern matching (about_search_fixed); NA for the Unicode collation algorithm (about_search_coll) | | perl, useBytes | not used (with a warning if attempting to do so) [DEPRECATED] |

Not to be confused with substr.

These functions are fully vectorised with respect to x, pattern, and replacement.

gsub2 uses vectorise_all=TRUE because of the attribute preservation rules, stri_replace_all should be called directly if different behaviour is needed.

The [DEPRECATED] sub and [DEPRECATED] gsub simply call sub2 and gsub2 which have a cleaned-up argument list. Additionally, if fixed=FALSE, the back-references in replacement strings are converted to these accepted by the ICU regex engine.

Both functions return a character vector. They preserve the attributes of the longest inputs (unless they are dropped due to coercion).

Replacements for base sub and gsub implemented with stri_replace_first and stri_replace_all, respectively.

there are inconsistencies between the argument order and naming in grepl, strsplit, and startsWith (amongst others); e.g., where the needle can precede the haystack, the use of the forward pipe operator, |>, is less convenient [fixed here]
base R implementation is not portable as it is based on the system PCRE or TRE library (e.g., some Unicode classes may not be available or matching thereof can depend on the current LC_CTYPE category [fixed here]
not suitable for natural language processing [fixed here -- use fixed=NA]
two different regular expression libraries are used (and historically, ERE was used in place of TRE) [here, ICU Java-like regular expression engine is only available, hence the perl argument has no meaning]
not vectorised w.r.t. pattern and replacement [fixed here]
only 9 (unnamed) back-references can be referred to in the replacement strings [fixed in sub2 and gsub2]
perl=TRUE supports \U, \L, and \E in the replacement strings [not available here]

Marek Gagolewski

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): paste, nchar, grepl2, gregexpr2, gregextr2 strsplit, gsubstr

trimws for removing whitespaces (amongst others) from the start or end of strings

"change \U0001f602 me \U0001f603" |> gsub2("\\p{L}+", "O_O")

## [1] "O_O 😂 O_O 😃"

x <- c("mario", "Mario", "M\u00E1rio", "M\u00C1RIO", "Mar\u00EDa", "Rosario", NA)
sub2(x, "mario", "M\u00E1rio", fixed=NA, strength=1L)

## [1] "Mário"   "Mário"   "Mário"   "Mário"   "María"   "Rosario" NA

sub2(x, "mario", "Mario", fixed=NA, strength=2L)

## [1] "Mario"   "Mario"   "Mário"   "MÁRIO"   "María"   "Rosario" NA

x <- "abcdefghijklmnopqrstuvwxyz"
p <- "(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)"
base::sub(p, "\\1\\9", x)

## [1] "ainopqrstuvwxyz"

base::gsub(p, "\\1\\9", x)

## [1] "ainv"

base::gsub(p, "\\1\\9", x, perl=TRUE)

## [1] "ainv"

base::gsub(p, "\\1\\13", x)

## [1] "aa3nn3"

sub2(x, p, "$1$13")

## [1] "amnopqrstuvwxyz"

gsub2(x, p, "$1$13")

## [1] "amnz"

gagolews/stringx documentation built on Jan. 15, 2025, 9:46 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

gagolews/stringx
Replacements for Base String Functions Powered by 'stringi'

.devel/sphinx/rapi/gsub.md
In gagolews/stringx: Replacements for Base String Functions Powered by 'stringi'

gsub: Replace Pattern Occurrences

Description

Usage

Arguments

Details

Value

Differences from Base R

Author(s)

See Also

Examples

R Package Documentation

Browse R Packages

We want your feedback!

gagolews/stringx Replacements for Base String Functions Powered by 'stringi'

.devel/sphinx/rapi/gsub.md In gagolews/stringx: Replacements for Base String Functions Powered by 'stringi'

gsub: Replace Pattern Occurrences

Description

Usage

Arguments

Details

Value

Differences from Base R

Author(s)

See Also

Examples

R Package Documentation

Browse R Packages

We want your feedback!

gagolews/stringx
Replacements for Base String Functions Powered by 'stringi'

.devel/sphinx/rapi/gsub.md
In gagolews/stringx: Replacements for Base String Functions Powered by 'stringi'