View source: R/stri_locate_ith.R
stri_locate_ith | R Documentation |
i^{th} Pattern Occurrence or Text Boundary

The stri_locate_ith() function locates the i^{th} occurrence of a pattern in each string of some character vector.
The stri_locate_ith_boundaries() function locates the i^{th} text boundary (like character, word, line, or sentence boundaries).
stri_locate_ith(str, i, ..., regex, fixed, coll, charclass)
stri_locate_ith_regex(str, pattern, i, ..., opts_regex = NULL)
stri_locate_ith_fixed(str, pattern, i, ..., opts_fixed = NULL)
stri_locate_ith_coll(str, pattern, i, ..., opts_collator = NULL)
stri_locate_ith_charclass(str, pattern, i, merge = TRUE, ...)
stri_locate_ith_boundaries(str, i, ..., opts_brkiter = NULL)
The 'stringi' functions only support operations on the first, last, or all occurrences of a pattern.
The stri_locate_ith() function allows locating the i^{th} occurrence of a pattern.
This allows for several workflows for operating on the i^{th} pattern occurrence.
See also the examples section.
Extract i^{th} Occurrence of a Pattern
For extracting the i^{th} pattern occurrence:
Locate the i^{th} occurrence using stri_locate_ith(), and then extract it using, for example, stri_sub.
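As a minimal sketch of this locate-then-extract workflow (the input strings and the pattern below are made up for illustration, and the 'tinycodet' and 'stringi' packages are assumed to be installed):

```r
library(tinycodet)

# hypothetical input: words separated by dashes
x <- c("one-two-three-four", "alpha-beta-gamma")

# locate the 2nd run of lowercase letters in each string
loc <- stri_locate_ith(x, i = 2, regex = "[a-z]+")

# extract the located occurrence; stri_sub accepts the
# two-column location matrix as its 'from' argument
second <- stringi::stri_sub(x, from = loc)
print(second)
```

Passing the whole location matrix to stri_sub avoids splitting it into separate start and end vectors.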
Replace/Transform i^{th} Occurrence of a Pattern
For replacing/transforming the i^{th} pattern occurrence:
Locate the i^{th} occurrence using stri_locate_ith().
Extract the occurrence using stri_sub.
Transform or replace the extracted sub-strings.
Assign the transformed/replaced sub-strings back, again using stri_sub.
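The four steps can be sketched roughly as follows. The input and the choice of toupper() as the transformation are illustrative assumptions; any vectorised transformation of the extracted sub-strings could take its place.

```r
library(tinycodet)

# hypothetical input: words separated by dashes
x <- c("one-two-three-four", "alpha-beta-gamma")

# step 1: locate the last run of lowercase letters (i = -1 counts from the right)
loc <- stri_locate_ith(x, i = -1, regex = "[a-z]+")

# step 2: extract the located occurrence
extr <- stringi::stri_sub(x, from = loc)

# step 3: transform the extracted sub-strings (here: upper-case them)
repl <- toupper(extr)

# step 4: assign the transformed sub-strings back in place
stringi::stri_sub(x, from = loc) <- repl
print(x)
```

The replacement form of stri_sub modifies x itself; stringi::stri_sub_replace() is an alternative when the original vector should stay untouched.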
Capture Groups of i^{th} Occurrence of a Pattern
The capture_groups argument for regex is not supported within stri_locate_ith().
To capture the groups of the i^{th} occurrences:
Use stri_locate_ith() to locate the i^{th} occurrences without group capture.
Extract the occurrence using stri_sub.
Get the matched group capture on the extracted occurrences using stri_match.
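A small sketch of this group-capture workflow is given here. The key=value input is hypothetical, and stringi::stri_match_first_regex() is used as one concrete member of the stri_match family.

```r
library(tinycodet)

# hypothetical input: key=value pairs separated by semicolons
x <- c("a=1;b=2;c=3", "k=10;m=20")
p <- "(\\w+)=(\\w+)"

# locate the last key=value pair, without group capture
loc <- stri_locate_ith(x, i = -1, regex = p)

# extract the located pair
extr <- stringi::stri_sub(x, from = loc)

# re-match the extracted pair to obtain its capture groups:
# column 1 is the full match, columns 2 and 3 are the groups
groups <- stringi::stri_match_first_regex(extr, p)
print(groups)
```

Because each extracted sub-string contains exactly one occurrence of the pattern, matching the first occurrence is enough to recover its groups.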
The stri_locate_ith() function returns an integer matrix with two columns, giving the start and end positions of the i^{th} matches, two NAs if no matches are found, and also two NAs if str is NA.
If an empty string or empty pattern is supplied, a warning is given and a matrix with 0 rows is returned.
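The shape of the returned matrix can be checked with a short sketch such as the one below; the input vector is an illustrative assumption chosen to cover a match, a non-match, and a missing string.

```r
library(tinycodet)

# hypothetical input: one string with matches, one without, one missing
x <- c("banana", "xyz", NA)

# locate the 2nd occurrence of "an" in each string
loc <- stri_locate_ith(x, i = 2, regex = "an")
print(loc)

# one row per string, two columns (start and end)
print(dim(loc))

# rows without a match, or with a missing input string, are filled with NA
print(is.na(loc))
```

Rows of NA pass through stringi::stri_sub as NA, so downstream extraction does not need a separate check for missing matches.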
Long Vectors
The stri_locate_ith functions do not support long vectors (i.e. character vectors with more than 2^31 - 1 strings).
Performance
The performance of stri_locate_ith() is close to that of stri_locate_all.
tinycodet_strings
#############################################################################
# the examples assume the package is attached:
library(tinycodet)
# practical example: transform regex pattern ====
# input character vector:
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
# locate ith (second and second-last) vowel locations:
p <- rep("A|E|I|O|U", 2) # vowels
loc <- stri_locate_ith(x, c(2, -2), regex = p, case_insensitive = TRUE)
print(loc)
# extract ith vowels:
extr <- stringi::stri_sub(x, loc)
print(extr)
# transform & replace ith vowels with numbers:
repl <- chartr("aeiou", "12345", extr)
stringi::stri_sub(x, loc) <- repl
# result (notice ith vowels are now numbers):
print(x)
#############################################################################
# practical example: group-capture regex pattern ====
# input character:
# first group: c(breakfast=eggs, breakfast=bacon)
# second group: c(lunch=pizza, lunch=spaghetti)
x <- c('breakfast=eggs;lunch=pizza',
       'breakfast=bacon;lunch=spaghetti',
       'no food here') # no group here
print(x)
# locate ith=2nd group:
p <- '(\\w+)=(\\w+)'
loc <- stri_locate_ith(x, i = 2, regex = p)
print(loc)
# extract ith=2nd group:
extr <- stringi::stri_sub(x, loc)
print(extr)
# capture ith=2nd group:
stringi::stri_match(extr, regex = p)
#############################################################################
# practical example: replace words using boundaries ====
# input character vector:
x <- c("good morning and good night",
       "hello ladies and gentlemen")
print(x)
# report ith word locations:
loc <- stri_locate_ith_boundaries(x, c(-3, 3), type = "word")
print(loc)
# extract ith words:
extr <- stringi::stri_sub(x, from = loc)
print(extr)
# transform and replace words (notice ith words have inverted case):
tf <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
stringi::stri_sub(x, loc) <- tf
# result:
print(x)
#############################################################################
# find pattern ====
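# note: x was already modified in place above, so this toggles the case
# of the same words again and returns a new vector (x is left unchanged)
extr <- stringi::stri_sub(x, from = loc)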
repl <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
stringi::stri_sub_replace(x, loc, replacement=repl)
#############################################################################
# simple pattern ====
x <- rep(paste0(1:10, collapse = ""), 10)
print(x)
out <- stri_locate_ith(x, 1:10, regex = as.character(1:10))
cbind(1:10, out)
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
p <- rep("a|e|i|o|u", 2)
out <- stri_locate_ith(x, c(-1, 1), regex = p)
print(out)
substr(x, out[, 1], out[, 2])
#############################################################################
# ignore case pattern ====
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
p <- rep("A|E|I|O|U", 2)
out <- stri_locate_ith(x, c(1, -1), regex = p, case_insensitive = TRUE)
substr(x, out[, 1], out[, 2])
#############################################################################
# multi-character pattern ====
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
# multi-character pattern:
p <- rep("AB", 2)
out <- stri_locate_ith(x, c(1, -1), regex = p, case_insensitive = TRUE)
print(out)
substr(x, out[, 1], out[, 2])
#############################################################################
# Replacement transformation using stringi ====
x <- c("hello world", "goodbye world")
loc <- stri_locate_ith(x, c(1, -1), regex = "a|e|i|o|u")
extr <- stringi::stri_sub(x, from = loc)
repl <- chartr(extr, old = "a-zA-Z", new = "A-Za-z")
stringi::stri_sub_replace(x, loc, replacement = repl)
#############################################################################
# Boundaries ====
test <- c(
paste0("The\u00a0above-mentioned features are very useful. ",
"Spam, spam, eggs, bacon, and spam. 123 456 789"),
"good morning, good evening, and good night"
)
loc <- stri_locate_ith_boundaries(test, i = c(1, -1), type = "word")
stringi::stri_sub(test, from = loc)