These functions compare strings in any locale using the Unicode Collation Algorithm.
```r
strcoll(
  e1,
  e2,
  locale = NULL,
  strength = 3L,
  alternate_shifted = FALSE,
  french = FALSE,
  uppercase_first = NA,
  case_level = FALSE,
  normalisation = FALSE,
  numeric = FALSE
)

e1 %x<% e2
e1 %x<=% e2
e1 %x==% e2
e1 %x!=% e2
e1 %x>% e2
e1 %x>=% e2
```
| Argument | Description |
|----|----|
| `e1`, `e2` | character vectors whose corresponding elements are to be compared |
| `locale` | `NULL` or `""` for the default locale (see `stri_locale_get`), or a single string with a locale identifier (see `stri_locale_list`) |
| `strength` | see `stri_opts_collator` |
| `alternate_shifted` | see `stri_opts_collator` |
| `french` | see `stri_opts_collator` |
| `uppercase_first` | see `stri_opts_collator` |
| `case_level` | see `stri_opts_collator` |
| `normalisation` | see `stri_opts_collator` |
| `numeric` | see `stri_opts_collator` |
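The collator options in the table above are forwarded to ICU via `stri_opts_collator`. The following is a minimal sketch of how `strength` and `uppercase_first` affect case differences, assuming the `stringx` package is installed. The explicit `locale = "en_US"` is an illustrative choice so that the results do not depend on the system default (some locales, such as Danish, order uppercase first by default).

```r
library(stringx)

# Tertiary strength (the default, 3L): case is significant, and
# lowercase sorts before uppercase in the en_US locale
strcoll("a", "A", locale = "en_US")  # -1

# Primary strength (1L): only base letters are compared, so case
# and accent differences are ignored
strcoll("a", "A", locale = "en_US", strength = 1L)       # 0
strcoll("\u00E9", "e", locale = "en_US", strength = 1L)  # 0

# uppercase_first = TRUE asks ICU to order "A" before "a"
strcoll("a", "A", locale = "en_US", uppercase_first = TRUE)
```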
These functions are fully vectorised with respect to both arguments.
For a locale-insensitive behaviour like that of `strcmp` from the standard C library, call `strcoll(e1, e2, locale="C", strength=4L, normalisation=FALSE)`. However, some normalisation will still be performed.
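To see the difference, the same vector can be compared under the default locale and under the `strcmp`-like settings quoted above. This is a sketch assuming `stringx` is installed; the vector `x` is an arbitrary illustration. No outputs are shown because the default locale depends on the system. Note that with C `strcmp`, `"Banana"` precedes `"apple"` because uppercase ASCII letters have smaller codes; whether ICU reproduces this exactly under `locale="C"` is an assumption worth checking on your system.

```r
library(stringx)

x <- c("apple", "Banana", "cherry")

# Default locale (locale = NULL): locale-aware ICU collation;
# the length-1 second argument is recycled against x
res_default <- strcoll(x, "Banana")
res_default

# Settings suggested for strcmp-like, locale-insensitive comparison
res_c <- strcoll(x, "Banana", locale = "C", strength = 4L, normalisation = FALSE)
res_c
```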
`strcoll` returns an integer vector of comparison results: -1 if a string in `e1` is smaller than the corresponding string in `e2`, 0 if the two are canonically equivalent, and 1 if the former is greater than the latter.
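A short sketch of these return codes, assuming `stringx` is installed. The second call relies on canonical equivalence: the precomposed letter U+00E9 and the sequence "e" followed by the combining acute accent U+0301 are expected to compare equal at the default tertiary strength.

```r
library(stringx)

# Elementwise comparison; one of -1, 0 or 1 per pair
strcoll(c("a", "b", "b"), c("b", "b", "a"))  # -1 0 1

# Canonically equivalent spellings of the same letter compare equal
strcoll("\u00E9", "e\u0301")  # 0

# The shorter argument is recycled, as with base comparison operators
strcoll("b", c("a", "b", "c"))  # 1 0 -1
```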
The binary operators call `strcoll` with default arguments and return logical vectors.
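Since the operators are documented as calling `strcoll` with its default arguments, each one can be read as a sign test on the integer result. A minimal sketch checking this correspondence, assuming `stringx` is installed (the vectors `x` and `y` are arbitrary illustrations):

```r
library(stringx)

x <- c("hladn\u00FD", "chladn\u00FD", "zebra")
y <- c("chladn\u00FD", "hladn\u00FD", "zebra")

# Default arguments, as used by the binary operators
cmp <- strcoll(x, y)

# Each operator should agree elementwise with the sign of cmp
lt <- x %x<% y
eq <- x %x==% y
ge <- x %x>=% y
stopifnot(all(lt == (cmp < 0)), all(eq == (cmp == 0)), all(ge == (cmp >= 0)))
```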
These are replacements for the base R Comparison operators, implemented with `stri_cmp`:

- collation in different locales is difficult and non-portable across platforms [fixed here: using services provided by ICU]
- overloading `<.character` has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. We could have replaced the generic `<` with one that calls `UseMethod`, but that feels like too intrusive a solution [fixed by introducing the `%x<%` operator]
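The dispatch limitation described above can be observed with base R alone. In this sketch, the method `<.character` is hypothetical and defined only for the demonstration. Internal generics such as `<` dispatch only on objects for which `is.object()` is true, and plain strings carry only an implicit class, so the method is silently ignored:

```r
# A method that would fail loudly if it were ever dispatched
`<.character` <- function(e1, e2) stop("`<.character` was dispatched")

# Plain strings have no class attribute, so is.object() is FALSE
# and the internal `<` never looks up S3 methods for them
is.object("a")    # FALSE
res <- "a" < "b"  # TRUE, computed by base R; no error is raised
res

rm(`<.character`)  # clean up the demonstration method
```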
See also the official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): xtfrm
```r
# lexicographic vs. numeric sort
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"))
## [1] 1 1 -1 -1 0 -1 -1
strcoll("100", c("1", "10", "11", "99", "100", "101", "1000"), numeric=TRUE)
## [1] 1 1 1 1 0 -1 -1
strcoll("hladn\u00FD", "chladn\u00FD", locale="sk_SK")
## [1] -1
```