mixedsort | R Documentation |
These functions sort or order character strings containing embedded numbers so that the numbers are sorted numerically rather than by character value. For example, "Aspirin 50mg" will come before "Aspirin 100mg". In addition, the case of character strings is ignored, so that "a" will come before "B" and "C".
mixedsort(
x,
decreasing = FALSE,
na.last = TRUE,
blank.last = FALSE,
numeric.type = c("decimal", "roman"),
roman.case = c("upper", "lower", "both"),
scientific = TRUE
)
mixedorder(
x,
decreasing = FALSE,
na.last = TRUE,
blank.last = FALSE,
numeric.type = c("decimal", "roman"),
roman.case = c("upper", "lower", "both"),
scientific = TRUE
)
x |
Vector to be sorted. |
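decreasing |
logical. Should the sort be increasing or decreasing? Note that decreasing=TRUE reverses the meanings of na.last and blank.last. |
na.last |
for controlling the treatment of NA values. If TRUE (the default), NA values are put last; if FALSE, they are put first. |
blank.last |
for controlling the treatment of blank values. If TRUE, blank values are put last; if FALSE (the default), they are put first. |
numeric.type |
either "decimal" (default) or "roman". Are numeric values represented as decimal numbers (numeric.type="decimal") or as Roman numerals (numeric.type="roman")? |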
roman.case |
one of "upper", "lower", or "both". Are Roman numerals represented using only capital letters ('IX'), only lower-case letters ('ix'), or both? |
scientific |
logical. Should exponential notation be allowed for numeric values? |
I often have character vectors (e.g. factor labels), such as compound and dose, that contain both text and numeric data. This function is useful for sorting these character vectors into a logical order.
It does so by splitting each character string into a sequence of character and numeric sections, and then sorting along these sections, with numbers being sorted by numeric value (e.g. "50" comes before "100"), followed by character strings sorted by character value (e.g. "A" comes before "B"), ignoring case (e.g. 'A' has the same sort order as 'a').
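The splitting idea can be sketched in base R. This is a simplified illustration, not the implementation used by mixedsort: the helper split_sections is hypothetical, it recognises only plain decimal numbers, and the ordering step assumes every string has the layout "text, number, rest".

```r
# Hypothetical helper (not part of gtools): split a string into alternating
# numeric and non-numeric sections.
split_sections <- function(s) {
  regmatches(s, gregexpr("[0-9]+(\\.[0-9]+)?|[^0-9]+", s, perl = TRUE))[[1]]
}

split_sections("Aspirin 100mg/day")
# "Aspirin " "100" "mg/day"

doses <- c(
  "Aspirin 100mg/day", "aspirin 50mg/day",
  "Acetomycin 1000mg/day", "Aspirin 10mg/day"
)
parts <- lapply(doses, split_sections)

# First section: text, compared without regard to case
text_key <- tolower(vapply(parts, `[`, character(1), 1))
# Second section: number, compared by numeric value
num_key <- as.numeric(vapply(parts, `[`, character(1), 2))

sorted <- doses[order(text_key, num_key)]
sorted
# "Acetomycin 1000mg/day" "Aspirin 10mg/day" "aspirin 50mg/day" "Aspirin 100mg/day"
```

A plain sort() of the same vector would compare "100" and "50" character by character and place the 100mg entry before the 50mg entry.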
By default, sort order is ascending, empty strings are sorted to the front, and NA values to the end. Setting decreasing=TRUE changes the sort order to descending and reverses the meanings of na.last and blank.last.
Parsing looks for decimal numbers unless numeric.type="roman", in which case parsing looks for Roman numerals, with character case specified by roman.case.
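To see why Roman numerals need their own parsing mode, the sketch below orders chapter labels by the integer value of their leading numeral, using only base R (utils::as.roman). It is an illustration of the idea under the assumption that each label starts with an upper-case numeral followed by a period; it is not how mixedsort itself parses its input.

```r
chapters <- c(
  "V. Non Sequiturs", "II. More Nonsense",
  "I. Nonsense", "IV. Nonesensical Citations",
  "III. Utter Nonsense"
)

# Keep only the text before the first period: "V", "II", "I", "IV", "III"
numerals <- sub("\\..*$", "", chapters)

# Convert the Roman numerals to integers: 5, 2, 1, 4, 3
values <- as.integer(utils::as.roman(numerals))

by_value <- chapters[order(values)]
by_value
```

Sorting the labels as ordinary text would place "IV. ..." before "V. ..." only by coincidence of spelling; larger numerals such as "IX" and "V" would come out in the wrong order.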
mixedorder returns a vector giving the sort order of the input elements. mixedsort returns the sorted vector.
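Because mixedorder returns positions rather than values, it can be used to reorder other objects alongside the labels, such as the rows of a data frame. The sketch below assumes the gtools package is installed; the data frame and its n column are made-up values for illustration only.

```r
library(gtools)

# Illustrative data only: the counts in `n` are hypothetical
doses <- data.frame(
  treatment = c("Aspirin 100mg/day", "Aspirin 10mg/day", "Aspirin 50mg/day"),
  n = c(12, 15, 9)
)

# Positions that put the labels in mixed order
ord <- mixedorder(doses$treatment)

# Reorder whole rows so each count stays with its label
doses_sorted <- doses[ord, ]
doses_sorted
```

Indexing the original vector with the result of mixedorder gives the same result as calling mixedsort on it directly.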
Gregory R. Warnes greg@warnes.net
sort, order
## compound & dose labels
Treatment <- c(
"Control", "Aspirin 10mg/day", "Aspirin 50mg/day",
"Aspirin 100mg/day", "Acetomycin 100mg/day",
"Acetomycin 1000mg/day"
)
## ordinary sort puts the dosages in the wrong order
sort(Treatment)
## but mixedsort does the 'right' thing
mixedsort(Treatment)
## Here is a more complex example
x <- rev(c(
"AA 0.50 ml", "AA 1.5 ml", "AA 500 ml", "AA 1500 ml",
"EXP 1", "AA 1e3 ml", "A A A", "1 2 3 A", "NA", NA, "1e2",
"", "-", "1A", "1 A", "100", "100A", "Inf"
))
mixedorder(x)
mixedsort(x) # Notice that plain numbers, including 'Inf', show up
# before strings, NAs at the end, and blanks at the
# beginning.
mixedsort(x, na.last = TRUE) # default
mixedsort(x, na.last = FALSE) # push NAs to the front
mixedsort(x, blank.last = FALSE) # default
mixedsort(x, blank.last = TRUE) # push blanks to the end
mixedsort(x, decreasing = FALSE) # default
mixedsort(x, decreasing = TRUE) # reverse sort order
## Roman numerals
chapters <- c(
"V. Non Sequiturs", "II. More Nonsense",
"I. Nonsense", "IV. Nonesensical Citations",
"III. Utter Nonsense"
)
mixedsort(chapters, numeric.type = "roman")
## Lower-case Roman numerals
vals <- c(
"xix", "xii", "mcv", "iii", "iv", "dcclxxii", "cdxcii",
"dcxcviii", "dcvi", "cci"
)
(ordered <- mixedsort(vals, numeric.type = "roman", roman.case = "lower"))
roman2int(ordered)
## Control scientific notation for number matching:
vals <- c("3E1", "2E3", "4e0")
mixedsort(vals) # With scientific notation
mixedsort(vals, scientific = FALSE) # Without scientific notation