str_search: 'stringi' Pattern Search Operators

str_search    R Documentation

Description

The x %s{}% p operator checks for every string in character vector x if the pattern defined in p is present.
When supplying a list on the right hand side (see s_pattern), one can optionally include the list element at = "start" or at = "end":

  • Supplying at = "start" will check if the pattern appears at the start of a string (like stri_startswith).

  • Supplying at = "end" will check if the pattern appears at the end of a string (like stri_endswith).

The x %s!{}% p operator is the same as x %s{}% p, except it checks for absence of the pattern, rather than presence.

For string (in)equality operators, see %s==% from the 'stringi' package.

strfind() and strfind()<- locate, extract, or replace found patterns.
They complement the other string-related operators, and use the same s_pattern API.
They function as follows:

  • strfind() finds all pattern matches, and returns the extractions of the findings in a list, just like stri_extract_all.

  • strfind(..., i = "all") locates all pattern matches, like stri_locate_all.

  • strfind(..., i = i), where i is an integer vector, locates the i-th occurrence of a pattern in each string, and reports the locations in a matrix, just like stri_locate_ith.

  • strfind() <- value finds pattern matches in variable x, replaces the pattern matches with the character vector specified in value, and assigns the transformed character vector back to x.
    This is somewhat similar to stri_replace, though the replacement is done in-place.

Usage

x %s{}% p

x %s!{}% p

strfind(x, p, ..., i, rt)

strfind(x, p, ..., i, rt) <- value

Arguments

x

a string or character vector.
For strfind()<-, x must be the variable containing the character vector or string, since strfind()<- performs assignment in-place.

p

either a list with 'stringi' arguments (see s_pattern), or else a character vector with regular expressions.
See also the Details section.
Supported pattern types: regex, fixed, coll, charclass.

...

additional arguments to be specified.

i

One of the following can be given for i:

  • if i is not given or NULL, strfind() extracts all found pattern occurrences.

  • if i is the string "all", strfind() locates all found pattern occurrences.

  • if i is an integer vector, strfind() locates the i-th pattern occurrence in each string.
    See the i argument in stri_locate_ith for details.

For strfind() <- value, i must not be specified.

rt

the Replacement Type that strfind()<- should perform.
One of the following can be given for rt:

  • if rt is not given, NULL, or "vec", strfind()<- performs regular, vectorized replacement of all occurrences.

  • if rt = "dict", strfind()<- performs dictionary replacement of all occurrences.

  • if rt = "first", strfind()<- replaces only the first occurrence in each string.

  • if rt = "last", strfind()<- replaces only the last occurrence in each string.

Note: rt = "first" and rt = "last" only exist for convenience; for more specific locational replacement, use stri_locate_ith or strfind(..., i) with numeric i (see the Examples section).
For strfind(), rt must not be specified.

value

a character vector giving the replacement values.

Details

Right-hand Side List for the %s{}% and %s!{}% Operators
When supplying a list to the right-hand side of the %s{}% and %s!{}% operators, one can add the argument at.
If at = "start", the operators will check if the pattern is present/absent at the start of the string.
If at = "end", the operators will check if the pattern is present/absent at the end of the string.
Unlike stri_startswith or stri_endswith, regex is supported by the %s{}% and %s!{}% operators.
See examples below.

Vectorized Replacement vs Dictionary Replacement

  • Vectorized replacement:
    x, p, and value are of the same length (or recycled to become the same length).
    All occurrences of pattern p[j] in x[j] are replaced with value[j], for every j.

  • Dictionary replacement:
    p and value are of the same length, and their length is independent of the length of x.
    For every single string in x, all occurrences of pattern p[1] are replaced with value[1],
    all occurrences of pattern p[2] are replaced with value[2], etc.

Notice that for single replacement, i.e. rt = "first" or rt = "last", it makes no sense to distinguish between vectorized or dictionary replacement, since then only a single occurrence is being replaced per string.
See examples below.

Value

For the x %s{}% p and x %s!{}% p operators:
Returns a logical vector.

For strfind():
Returns a list with extractions of all found patterns.

For strfind(..., i = "all"):
Returns a list with all found pattern locations.

For strfind(..., i = i) with integer vector i:
Returns an integer matrix with two columns, giving the start and end positions of the i-th matches. A row holds two NAs if no match is found, and also two NAs if the corresponding string in x is NA.

For strfind() <- value:
Returns nothing, but performs in-place replacement (using R's default in-place semantics) of the found patterns in variable x.

Note

⁠strfind()<-⁠ performs in-place replacement.
Therefore, the character vector or string on which to perform replacement must already exist as a variable.
Take, for example, the following code:

strfind("hello", p = "e") <- "a" # this does not work: "hello" is not a variable

y <- "hello"
strfind(y, p = "e") <- "a" # this works fine

In the above code, the first ⁠strfind()<-⁠ call does not work, because the string needs to exist as a variable.

See Also

tinycodet_strings

Examples


# example of %s{}% and %s!{}% ====

x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
x %s{}% "a"
x %s!{}% "a"
which(x %s{}% "a")
which(x %s!{}% "a")
x[x %s{}% "a"]
x[x %s!{}% "a"]
x[x %s{}% "a"] <- 1
x[x %s!{}% "a"] <- 1
print(x)

x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
x %s{}% "1"
x %s!{}% "1"
which(x %s{}% "1")
which(x %s!{}% "1")
x[x %s{}% "1"]
x[x %s!{}% "1"]
x[x %s{}% "1"] <- "a"
x[x %s!{}% "1"] <- "a"
print(x)
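
# Sketch (not part of the original examples): regex vs fixed patterns ====
# A plain character vector on the right-hand side is treated as a regular
# expression, so "." matches any character. Wrapping the pattern in s_fixed()
# (see s_pattern) requests a literal match instead.

library(tinycodet) # harmless if already attached
x <- c("a.b", "ab")
x %s{}% "."            # regex: does the string contain any character at all?
x %s{}% s_fixed(".")   # fixed: does the string contain a literal dot?
x %s!{}% s_fixed(".")  # fixed: is a literal dot absent?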

#############################################################################


# Example of %s{}% and %s!{}% with "at" argument ====

x <- c(paste0(letters, collapse = ""),
       paste0(rev(letters), collapse = ""), NA)
p <- s_fixed("abc", at = "start")
x %s{}% p
stringi::stri_startswith(x, fixed = "abc") # same as above

p <- s_fixed("xyz", at = "end")
x %s{}% p
stringi::stri_endswith(x, fixed = "xyz") # same as above

p <- s_fixed("cba", at = "end")
x %s{}% p
stringi::stri_endswith(x, fixed = "cba") # same as above

p <- s_fixed("zyx", at = "start")
x %s{}% p
stringi::stri_startswith(x, fixed = "zyx") # same as above
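
# Sketch (not part of the original examples): regex combined with "at" ====
# The Details section states that, unlike stri_startswith() and
# stri_endswith(), the operators support regex together with `at`.
# This assumes s_regex() accepts `at` in the same way s_fixed() does above.

library(tinycodet) # harmless if already attached
x <- c(paste0(letters, collapse = ""),
       paste0(rev(letters), collapse = ""), NA)
p <- s_regex("[a-c]+", at = "start")
starts_abc <- x %s{}% p # does the string start with a run of a, b or c?
print(starts_abc)

p <- s_regex("[x-z]+", at = "end")
ends_xyz <- x %s{}% p   # does the string end with a run of x, y or z?
print(ends_xyz)
x %s!{}% p              # the negated check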



#############################################################################


# Example of transforming ith occurrence ====

# new character vector:
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)

# report ith (second and second-last) vowel locations:
p <- s_regex( # vowels
  rep("A|E|I|O|U", 2),
  case_insensitive = TRUE
)
loc <- strfind(x, p, i = c(2, -2))
print(loc)

# extract ith vowels:
extr <- stringi::stri_sub(x, from = loc)
print(extr)

# replace ith vowels with numbers:
repl <- chartr("aeiou", "12345", extr) # transformation
stringi::stri_sub(x, loc) <- repl
print(x)


#############################################################################


# Example of strfind for regular vectorized replacement ====

x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW',  'BLACK', 'BEAR')
x %s{}% p
strfind(x, p)
strfind(x, p) <- rp
print(x)
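
# Sketch (not part of the original examples): locating all matches ====
# With i = "all", strfind() reports match locations rather than the matched
# text (compare stringi::stri_locate_all).

x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- c('quick', 'brown', 'fox')
loc_all <- tinycodet::strfind(x, p, i = "all")
print(loc_all) # a list with one set of locations per string in x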

#############################################################################


# Example of strfind for dictionary replacement ====

x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW',  'BLACK', 'BEAR')
# thus dictionary is:
# quick => SLOW; brown => BLACK; fox => BEAR
strfind(x, p, rt = "dict") <- rp
print(x)
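
# Sketch (not part of the original examples): the same distinction in 'stringi' ====
# stringi::stri_replace_all_fixed() draws the same line through its
# vectorize_all argument. That strfind()<- maps onto these exact calls is an
# assumption made for illustration only.

x <- rep("quick brown fox", 3)
p <- c("quick", "brown", "fox")
rp <- c("SLOW", "BLACK", "BEAR")
# vectorized: p[j] is replaced with rp[j] in x[j] only
vec <- stringi::stri_replace_all_fixed(x, p, rp)
# dictionary: every p/rp pair is applied to every string in x
dict <- stringi::stri_replace_all_fixed(x, p, rp, vectorize_all = FALSE)
print(vec)
print(dict)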


#############################################################################


# Example of strfind for first and last replacement ====

x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "One"
strfind(x, p, rt = "first") <- rp
print(x)

x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "Some Other"
strfind(x, p, rt = "last") <- rp
print(x)
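

#############################################################################


# Sketch (not part of the original examples): why strfind()<- needs a variable ====
# R rewrites a call of the form f(y) <- value into y <- `f<-`(y, value = value),
# so the left-hand side must be an existing variable. The replacement function
# `first_char<-` below is hypothetical and exists only for this illustration.

`first_char<-` <- function(x, value) {
  substr(x, 1, 1) <- value # overwrite the first character
  x                        # a replacement function returns the modified object
}
y <- "hello"
first_char(y) <- "j" # same as: y <- `first_char<-`(y, value = "j")
print(y)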





tinycodet documentation built on Sept. 12, 2024, 7:03 a.m.