strcut_loc | R Documentation |
The strcut_loc() function cuts every string in a character vector around a location range loc, such that every string is cut into the following parts:
the sub-string before loc;
the sub-string at loc itself;
the sub-string after loc.
The location range loc is usually a matrix with 2 columns, giving the start and end points of some pattern match.
The strcut_brk() function (a wrapper around stri_split_boundaries(..., tokens_only = FALSE)) cuts every string into individual text breaks (like character, word, line, or sentence boundaries).
strcut_loc(str, loc)
strcut_brk(str, type = "character", tolist = FALSE, n = -1L, ...)
str | a string or character vector. |
loc | Either one of the following: |
type | Either one of the following: |
tolist | logical, indicating if |
n | see stri_split_boundaries. |
... | additional arguments to be passed to stri_split_boundaries. |
The strcut_ functions provide a concise way to cut strings into pieces without removing the delimiters, an operation that lies at the core of virtually all boundaries-operations in 'stringi'.
The main difference between the strcut_ functions and stri_split / strsplit is that the latter generally remove the delimiter patterns in a string when cutting, while the strcut_ functions do not attempt to remove parts of the string by default; they only cut the strings into separate pieces.
Moreover, the strcut_ functions return a matrix by default.
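The difference can be illustrated without the package. The following is a minimal base R sketch, not the strcut_ implementation: it assumes a single fixed delimiter and uses regexpr() and substr() to cut around the first match, next to strsplit() on the same input.

```r
x <- "spam, eggs, bacon"

# strsplit() drops the delimiter ", " from the result
split_pieces <- strsplit(x, ", ", fixed = TRUE)[[1]]

# Cutting around the first delimiter keeps it as a piece of its own
m <- regexpr(", ", x, fixed = TRUE)
start <- as.integer(m)
end <- start + attr(m, "match.length") - 1L
cut_pieces <- c(
  substr(x, 1L, start - 1L),      # before the match
  substr(x, start, end),          # the match itself
  substr(x, end + 1L, nchar(x))   # after the match
)

# Only the cut pieces reassemble to the original string
paste0(cut_pieces, collapse = "") == x
paste0(split_pieces, collapse = "") == x
```

Because nothing is removed, pasting the cut pieces back together recovers the input, which is not the case for the split pieces.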
For strcut_loc():
A character matrix with length(str) rows and 3 columns, where for every row i the following holds:
the first column contains the sub-string before loc[i,], or NA if loc[i,] contains NA;
the second column contains the sub-string at loc[i,], or the uncut string if loc[i,] contains NA;
the third and last column contains the sub-string after loc[i,], or NA if loc[i,] contains NA.
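The three-column rule above, including the NA case, can be mimicked in base R. cut_around() below is a hypothetical helper written for illustration only; it is not part of tinycodet, and it takes separate start and end vectors rather than a loc matrix.

```r
# Hypothetical helper (not from tinycodet): cut each string around [start, end]
cut_around <- function(str, start, end) {
  out <- cbind(
    before = substr(str, 1L, start - 1L),
    at     = substr(str, start, end),
    after  = substr(str, end + 1L, nchar(str))
  )
  # Where the location is missing: NA, uncut string, NA
  missing_loc <- is.na(start) | is.na(end)
  out[missing_loc, "before"] <- NA_character_
  out[missing_loc, "at"] <- str[missing_loc]
  out[missing_loc, "after"] <- NA_character_
  out
}

str <- c("abcdef", "abcdef")
res <- cut_around(str, start = c(3L, NA), end = c(4L, NA))
print(res)
```

The first row is cut into "ab", "cd" and "ef"; the second row has no usable location, so the uncut string sits in the middle column between two NA values.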
For strcut_brk(..., tolist = FALSE):
A character matrix with length(str) rows and a number of columns equal to the maximum number of pieces any string in str was cut into.
Empty places are filled with NA.
For strcut_brk(..., tolist = TRUE):
A list with length(str) elements, where each element is a character vector containing the cut string.
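The relation between the two return shapes can be sketched in base R. Here strsplit(x, "") merely stands in for character-level cutting of plain ASCII input (an assumption for illustration; strcut_brk() relies on stri_split_boundaries instead), and the resulting list is then padded with NA into a matrix.

```r
x <- c("abc", "de")

# List shape: one character vector of pieces per input string
pieces <- strsplit(x, "")

# Matrix shape: one row per string, padded with NA up to the longest row
n_col <- max(lengths(pieces))
mat <- do.call(rbind, lapply(pieces, function(p) {
  length(p) <- n_col  # extending a vector fills the new slots with NA
  p
}))
print(mat)
```

The matrix form is convenient for column-wise work, while the list form avoids the NA padding when strings are cut into very different numbers of pieces.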
tinycodet_strings
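library(tinycodet) # provides strcut_loc(), strcut_brk(), stri_locate_ith(), and %s+%

# strcut_loc(): cut around a location range
x <- rep(paste0(1:10, collapse = ""), 10)
print(x)
# one location range (start, end) per string, from stri_locate_ith()
loc <- stri_locate_ith(x, 1:10, fixed = as.character(1:10))
strcut_loc(x, loc)
# loc given as a length-2 vector
strcut_loc(x, c(5, 5))
# loc containing NA: see the NA rules in the return value description
strcut_loc(x, c(NA, NA))
strcut_loc(x, c(5, NA))
strcut_loc(x, c(NA, 5))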
# strcut_brk(): cut at text boundaries
test <- "The\u00a0above-mentioned features are very useful. " %s+%
  "Spam, spam, eggs, bacon, and spam. 123 456 789"
strcut_brk(test, "line")
strcut_brk(test, "word")
strcut_brk(test, "sentence")
strcut_brk(test) # default: type = "character"
strcut_brk(test, n = 1)
# return a list instead of a matrix
strcut_brk(test, "line", tolist = TRUE)
strcut_brk(test, "word", tolist = TRUE)
strcut_brk(test, "sentence", tolist = TRUE)
# pass the result of stri_opts_brkiter() as type
brk <- stringi::stri_opts_brkiter(type = "line")
strcut_brk(test, brk)