aaa3_tinycodet_strings: Overview of the 'tinycodet' Extension of 'stringi'

aaa3_tinycodet_strings    R Documentation

Overview of the 'tinycodet' Extension of 'stringi'

Description

R's numerical functions are generally very fast. R's native string functions, however, are somewhat slow, lack a unified naming scheme, and are less comprehensive than its numerical functions.
The primary R package that addresses this is 'stringi', on which many, if not most, string-related packages depend (see the list of reverse dependencies on CRAN).
Because string manipulation matters in any programming language, even one primarily focused on mathematics, 'tinycodet' adds a little new functionality to 'stringi'.

'tinycodet' adds the following functions to extend 'stringi':

  • Find the i-th pattern occurrence (stri_locate_ith), or the i-th text boundary (stri_locate_ith_boundaries).

  • Concatenate a character matrix row- or column-wise.
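
To illustrate what "i-th occurrence" means, the following minimal sketch mimics the idea with plain 'stringi' calls. The helper locate_ith_sketch is hypothetical and is not how 'tinycodet' implements stri_locate_ith. Two behaviours are assumptions of this sketch: a negative i counts backwards from the last match (as in the i = -2 call in the Examples), and an out-of-range i is clamped to the nearest existing match.

```r
library(stringi)

# Hypothetical helper (NOT part of 'tinycodet'): return the start/end
# positions of the i-th regex match in every string.
locate_ith_sketch <- function(str, i, regex) {
  # one two-column (start, end) matrix per string, one row per match
  all_locs <- stri_locate_all_regex(str, regex)
  picked <- lapply(all_locs, function(m) {
    n <- nrow(m)
    # assumption of this sketch: negative i counts from the last match
    idx <- if (i < 0) n + i + 1L else i
    # assumption of this sketch: clamp out-of-range indices
    idx <- min(max(idx, 1L), n)
    m[idx, , drop = FALSE]
  })
  do.call(rbind, picked)
}

x <- c("3rd 1st 2nd", "5th 4th 6th")

# second-last digit of every string:
loc <- locate_ith_sketch(x, i = -2, regex = "\\d")
second_last <- stri_sub(x, from = loc)
print(second_last)
```

In practice stri_locate_ith should be preferred; the sketch only shows the kind of result such a function returns, namely a two-column matrix that can be passed on to stringi::stri_sub.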

'tinycodet' adds the following operators, to complement the already existing 'stringi' operators:

  • Infix operators for string arithmetic.

  • Infix operators for string sub-setting, which get or remove the first and/or last n characters from strings.

  • Infix operators for detecting patterns, and strfind()<- for locating/extracting/replacing found patterns.
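
A brief, non-exhaustive sketch of these operator families is given below. It assumes that 'tinycodet' is installed and attached, and that a plain character pattern on the right-hand side is interpreted as a regular expression; see the help pages of the individual operators for the authoritative details.

```r
library(tinycodet)

x <- c("3rd 1st 2nd", "5th 4th 6th")

# string arithmetic:
no_digits <- x %s-% "\\d"   # remove the pattern from each string
n_digits  <- x %s/% "\\d"   # count pattern occurrences in each string

# string sub-setting; the right-hand side gives the number of
# first and last characters, respectively:
ends   <- x %sget% c(3, 3)   # get the first 3 and last 3 characters
middle <- x %strim% c(4, 4)  # remove the first 4 and last 4 characters

# pattern detection:
has_digit <- x %s{}% "\\d"

print(n_digits)
print(has_digit)
```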

And finally, 'tinycodet' adds the somewhat separate strcut_-functions, to cut strings into pieces without removing the delimiters.
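
The difference between splitting a string and cutting it can be sketched with plain 'stringi' calls. This only illustrates the idea of keeping the delimiters; it is not the strcut_ implementation, and the variable names are chosen for this example.

```r
library(stringi)

x <- c("3rd 1st 2nd", "5th 4th 6th")

# Splitting on a delimiter pattern drops the delimiters (here, the spaces):
split_pieces <- stri_split_regex(x, " ")

# Cutting at word boundaries keeps every character, spaces included:
cut_pieces <- stri_split_boundaries(x, type = "word")

# Because nothing was removed, pasting the pieces back restores the input:
restored <- vapply(cut_pieces, paste0, character(1), collapse = "")
print(identical(restored, x))
```

Keeping the delimiters means the pieces can be rearranged or modified and then joined again without losing any part of the original string.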

Regarding Vector Recycling in the 'stringi'-based Functions

Generally speaking, vector recycling is supported, because 'stringi' itself supports it.
There are, however, a few exceptions.
First, matrix inputs (like in strcut_loc and string sub-setting operators) will generally not be recycled.
Second, the i argument in stri_locate_ith does not support vector recycling.
Scalar recycling is virtually always supported.
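
The recycling behaviour inherited from 'stringi' can be sketched with stringi::stri_detect_regex. The example uses only 'stringi' and does not demonstrate the 'tinycodet' exceptions listed above.

```r
library(stringi)

x <- c("3rd 1st 2nd", "5th 4th 6th", "no digits")

# scalar recycling: the single pattern is reused for every string
scalar_rec <- stri_detect_regex(x, "\\d")

# element-wise matching: the first pattern is applied to the first string,
# the second pattern to the second string
elementwise <- stri_detect_regex(c("abc", "123"), c("[a-z]", "\\d"))

# recycling in the other direction: one string, several patterns
string_rec <- stri_detect_regex("a1", c("[a-z]", "\\d", "x"))

print(scalar_rec)
print(elementwise)
print(string_rec)
```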

References

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1–59, doi:10.18637/jss.v103.i02

See Also

tinycodet_help, s_pattern

Examples


# attach the package; its functions and operators are used below:
library(tinycodet)

# character vector:
x <- c("3rd 1st 2nd", "5th 4th 6th")
print(x)

# detect if there are digits:
x %s{}% "\\d"

# find second last digit:
loc <- stri_locate_ith(x, i = -2, regex = "\\d")
stringi::stri_sub(x, from = loc)

# cut x into matrix of individual words:
mat <- strcut_brk(x, "word")

# sort rows of matrix using the fast %row~% operator:
rank <- stringi::stri_rank(as.vector(mat)) |> matrix(ncol = ncol(mat))
sorted <- mat %row~% rank
sorted[is.na(sorted)] <- ""

# join elements of every row into a single character vector:
stri_c_mat(sorted, margin = 1, sep = " ")


tinycodet documentation built on April 12, 2025, 1:39 a.m.