The sort method for objects of class character (sort.character) uses the locale-sensitive Unicode collation algorithm to arrange the strings in a vector with respect to a chosen lexicographic order.
xtfrm2 and [DEPRECATED] xtfrm generate an integer vector that sorts in the same way as its input, and hence can be used in conjunction with order or rank.
xtfrm2(x, ...)
## Default S3 method:
xtfrm2(x, ...)
## S3 method for class 'character'
xtfrm2(
x,
...,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
xtfrm(x)
## Default S3 method:
xtfrm(x)
## S3 method for class 'character'
xtfrm(x)
## S3 method for class 'character'
sort(
x,
...,
decreasing = FALSE,
na.last = NA,
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalisation = FALSE,
numeric = FALSE
)
| Argument | Description |
|----|----|
| x | character vector whose elements are to be sorted |
| ... | further arguments passed to other methods |
| locale | NULL or "" for the default locale (see stri_locale_get) or a single string with a locale identifier, see stri_locale_list |
| strength | see stri_opts_collator |
| alternate_shifted | see stri_opts_collator |
| french | see stri_opts_collator |
| uppercase_first | see stri_opts_collator |
| case_level | see stri_opts_collator |
| normalisation | see stri_opts_collator |
| numeric | see stri_opts_collator |
| decreasing | single logical value; if FALSE, the ordering is nondecreasing (weakly increasing) |
| na.last | single logical value; if TRUE, then missing values are placed at the end; if FALSE, they are put at the beginning; if NA, then they are removed from the output altogether |
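To make the collator arguments above more concrete, here is a minimal sketch that assumes the stringx package is installed and that the ICU build in use ships Polish and Slovak collation data. The variable names are illustrative only, and results may differ on ICU builds with reduced locale data.

```r
library(stringx)  # provides the sort method for character vectors

# locale: Slovak treats "ch" as a separate letter that sorts after "h",
# whereas Polish compares it letter by letter
words <- c("hladny", "chladny")
sorted_pl <- sort(words, locale="pl_PL")
sorted_sk <- sort(words, locale="sk_SK")

# uppercase_first=TRUE puts upper case before lower case
# among letters that are otherwise equal
sorted_upper <- sort(c("b", "A", "a", "B"), uppercase_first=TRUE)

# numeric=TRUE compares runs of digits by their numeric value
sorted_num <- sort(c("a10", "a9", "a100"), numeric=TRUE)

print(list(sorted_pl, sorted_sk, sorted_upper, sorted_num))
```

The remaining options (strength, alternate_shifted, french, case_level, normalisation) are passed in the same way; their meaning is described under stri_opts_collator.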
What 'xtfrm' stands for the current author does not know, but would appreciate someone's enlightening him.
sort.character returns a character vector, with only the names attribute preserved. Note that the output vector may be shorter than the input one.
xtfrm2.character and xtfrm.character return an integer vector; most attributes are preserved.
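The following sketch illustrates the return values described above, assuming stringx is installed. The named vector x is made up for illustration. It shows how the default na.last=NA shortens the output, while na.last=TRUE keeps every element.

```r
library(stringx)

x <- c(first="b", second=NA, third="a")

dropped <- sort(x)                # na.last=NA (default): missing values are removed
kept    <- sort(x, na.last=TRUE)  # missing values are moved to the end

print(length(dropped))  # shorter than the input
print(length(kept))     # same length as the input
print(names(dropped))   # names, if preserved as documented, follow their elements

codes <- xtfrm2(x)      # sort keys; the missing element stays missing
print(codes)
```

Because the length can change, code that relies on positional alignment with the input should set na.last explicitly or work with order(xtfrm2(x)) instead.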
Replacements for the default S3 methods sort and xtfrm for character vectors, implemented with stri_sort and stri_rank.
Collation in different locales is difficult and non-portable across platforms [fixed here -- using services provided by ICU]
Overloading xtfrm.character has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. Thus, we needed to replace the generic xtfrm with one that calls UseMethod [fixed here]
xtfrm does not support customisation of the linear ordering relation it is based upon [fixed by introducing the ... argument to the new generic, xtfrm2]
Neither order, rank, nor sort.list is a generic, so they would have to be rewritten from scratch to allow the inclusion of our patches; interestingly, order even calls xtfrm, but only for classed objects [not fixed here -- see Examples for a workaround]
xtfrm for objects of type character does not preserve the names attribute (but does so for numeric) [fixed here]
sort seems to preserve only the names attribute, which makes sense if na.last is NA, because the resulting vector might be shorter [not fixed here as it would break compatibility with other sorting methods]
Note that sort by default removes missing values altogether, whereas order has na.last=TRUE [not fixed here as it would break compatibility with other sorting methods]
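Two of the base R behaviours listed above can be checked without stringx. The sketch below uses only base functions on a made-up vector and is meant as a quick illustration, not a full test of each item.

```r
x <- c(first="b", second=NA, third="a")

# sort drops missing values by default (na.last=NA) ...
n_sorted <- length(base::sort(x))

# ... whereas order keeps them and places them last (na.last=TRUE)
perm <- base::order(x)

# base xtfrm keeps names for numeric input but not for character input
names_num <- names(base::xtfrm(c(a=1, b=2)))
names_chr <- names(base::xtfrm(c(a="x", b="y")))

print(n_sorted)
print(perm)
print(names_num)
print(names_chr)
```

As a result, x[order(x)] and sort(x) can differ in length whenever x contains missing values, unless na.last is set to the same value in both calls.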
The official online manual of stringx at https://stringx.gagolewski.com/
Related function(s): strcoll
x <- c("a1", "a100", "a101", "a1000", "a10", "a10", "a11", "a99", "a10", "a1")
base::sort.default(x) # lexicographic sort
## [1] "a1" "a1" "a10" "a10" "a10" "a100" "a1000" "a101" "a11"
## [10] "a99"
sort(x, numeric=TRUE) # calls stringx:::sort.character
## [1] "a1" "a1" "a10" "a10" "a10" "a11" "a99" "a100" "a101"
## [10] "a1000"
xtfrm2(x, numeric=TRUE) # calls stringx:::xtfrm2.character
## [1] 1 8 9 10 3 3 6 7 3 1
rank(xtfrm2(x, numeric=TRUE), ties.method="average") # ranks with averaged ties
## [1] 1.5 8.0 9.0 10.0 4.0 4.0 6.0 7.0 4.0 1.5
order(xtfrm2(x, numeric=TRUE)) # ordering permutation
## [1] 1 10 5 6 9 7 8 2 3 4
x[order(xtfrm2(x, numeric=TRUE))] # equivalent to sort()
## [1] "a1" "a1" "a10" "a10" "a10" "a11" "a99" "a100" "a101"
## [10] "a1000"
# order a data frame w.r.t. decreasing ids and increasing vals
d <- data.frame(vals=round(runif(length(x)), 1), ids=x)
d[order(-xtfrm2(d[["ids"]], numeric=TRUE), d[["vals"]]), ]
## vals ids
## 4 0.9 a1000
## 3 0.4 a101
## 2 0.8 a100
## 8 0.9 a99
## 7 0.5 a11
## 6 0.0 a10
## 9 0.6 a10
## 5 0.9 a10
## 1 0.3 a1
## 10 0.5 a1