Description Usage Arguments Details Value Warning Author(s) See Also Examples
Sorts a data frame using one or more variables.
sortFrame(x, ..., alphabetical = TRUE)
x | Data frame to be sorted
... | A list of sort terms (see below)
alphabetical | Should character vectors be sorted alphabetically?
The simplest use of this function is to sort a data frame x in terms of one or more of the variables it contains. If, for instance, the data frame x contains two variables a and b, then the command sortFrame(x,a,b) sorts by variable a, breaking ties using variable b. Numeric variables are sorted in ascending order: to sort in descending order of a and then ascending order of b, use the command sortFrame(x,-a,b). Factors are treated as numeric variables and are sorted by their internal codes (i.e., the first factor level equals 1, the second factor level equals 2, and so on). Character vectors are sorted in alphabetical order, which differs from the ordering used by the sort function; to use the default 'ascii' ordering instead, specify alphabetical=FALSE. Minus signs can be used with character vectors to sort in reverse alphabetical order: if c represents a character variable, then sortFrame(x,c) sorts in alphabetical order, whereas sortFrame(x,-c) sorts in reverse alphabetical order.
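The ordering rules described above can be mimicked in base R with order(), which may help to picture what each sort term asks for. This is only a sketch and not the sortFrame implementation; the data frame x, its columns a and b, and the factor f are made up for illustration.

```r
# Hypothetical data frame, used only for illustration
x <- data.frame(a = c(2, 1, 2, 1), b = c(4, 3, 1, 2))

# Roughly what sortFrame(x, a, b) asks for: ascending a, ties broken by b
asc <- x[order(x$a, x$b), ]
asc

# Roughly what sortFrame(x, -a, b) asks for: descending a, then ascending b
desc <- x[order(-x$a, x$b), ]
desc

# Factors sort by their internal integer codes, not by their labels
f <- factor(c("low", "high", "mid"), levels = c("low", "mid", "high"))
codes <- as.integer(f)   # codes follow the order of the levels
f_order <- order(f)      # positions of low, mid, high in that order
```

Here the minus sign simply negates the numeric key, so larger values of a come first while b still breaks ties in ascending order.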
It is also possible to specify more complicated sort terms by including expressions that use multiple variables within a single term, but care is required. For instance, it is possible to sort the data frame by the sum of two variables, using the command sortFrame(x, a+b). For numeric variables, expressions of this kind should work in the expected manner, but this is not always the case for non-numeric variables: sortFrame uses the xtfrm function to provide, for every variable referred to in the list of sort terms (...), a numeric vector that sorts in the same order as the original variable. This reliance is what makes reverse alphabetical order (e.g., sortFrame(x,-c)) work. However, it also means that it is possible to specify somewhat nonsensical sort terms for character vectors by abusing the numerical coding (e.g., sortFrame(x,(c-3)^2); see the examples section). It also means that sorting by string operation functions (e.g., nchar) does not work as expected (again, see the examples section). Future versions of sortFrame will (hopefully) address this, possibly by allowing the user to "switch off" the internal use of xtfrm, or else by allowing AsIs expressions to be used in sort terms.
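The effect of the numeric coding can be seen by calling xtfrm directly in base R. The sketch below assumes that a sort term is evaluated on the xtfrm codes rather than on the original strings, as the paragraph above describes; it does not call sortFrame itself, and the variable names are illustrative.

```r
txt <- c("bob", "Clare", "clare", "bob", "eve", "eve")

# xtfrm() replaces each string with a number that sorts in the same order
codes <- xtfrm(txt)
codes                      # exact values depend on the collation locale

# Negating the codes reverses the order, which is why a term like -txt can work
reversed <- txt[order(-codes)]
reversed

# A string function applied to the codes sees numbers, not the original text
code_lengths <- nchar(codes)   # every code is a single digit, so all keys tie
text_lengths <- nchar(txt)     # the lengths that were presumably intended
```

Because every key in code_lengths is identical, a sort on it leaves the rows in their original order.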
The sorted data frame
This package is under development, and has been released only due to teaching constraints. Until this notice disappears from the help files, you should assume that everything in the package is subject to change. Backwards compatibility is NOT guaranteed. Functions may be deleted in future versions and new syntax may be inconsistent with earlier versions. For the moment at least, this package should be treated with extreme caution.
Daniel Navarro
# first, create a data frame to be sorted...
txt <- c("bob","Clare","clare","bob","eve","eve")
num1 <- c(3,1,2,0,0,2)
num2 <- c(1,1,3,0,3,2)
etc <- c("not","used","as","a","sort","term")
dataset <- data.frame( txt, num1, num2, etc, stringsAsFactors=FALSE )
#     txt num1 num2  etc
# 1   bob    3    1  not
# 2 Clare    1    1 used
# 3 clare    2    3   as
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 6   eve    2    2 term
#### Sorting by numeric variables ####
# Simplest use of the function is to sort the data frame by a single
# numeric variable, with the results to be returned in increasing
# order, and ties to be kept in their original order:
sortFrame( dataset, num1 )
#     txt num1 num2  etc
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 2 Clare    1    1 used
# 3 clare    2    3   as
# 6   eve    2    2 term
# 1   bob    3    1  not
# Specifying a second sort term will break ties using the second
# term. For instance, we can sort by 'num1' (ascending order),
# breaking ties by 'num2' (ascending order):
sortFrame( dataset, num1, num2 )
#     txt num1 num2  etc
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 2 Clare    1    1 used
# 6   eve    2    2 term
# 3 clare    2    3   as
# 1   bob    3    1  not
# Prefixing a sort term with a minus sign sorts in descending order.
# For instance, to sort by 'num1' in descending order, but break
# ties by 'num2' in ascending order, use the following:
sortFrame( dataset, -num1, num2 )
#     txt num1 num2  etc
# 1   bob    3    1  not
# 6   eve    2    2 term
# 3 clare    2    3   as
# 2 Clare    1    1 used
# 4   bob    0    0    a
# 5   eve    0    3 sort
#### Sorting by character variables ####
# When sorting by character variables (but not factors - see
# below), the default is to sort alphabetically, with upper
# case preceding lower case when the words are otherwise the
# same. For example:
sortFrame( dataset, txt )
#     txt num1 num2  etc
# 1   bob    3    1  not
# 4   bob    0    0    a
# 2 Clare    1    1 used
# 3 clare    2    3   as
# 5   eve    0    3 sort
# 6   eve    2    2 term
# Sorting into reverse alphabetical order is possible using
# minus signs. For example:
sortFrame( dataset, -txt )
#     txt num1 num2  etc
# 5   eve    0    3 sort
# 6   eve    2    2 term
# 3 clare    2    3   as
# 2 Clare    1    1 used
# 1   bob    3    1  not
# 4   bob    0    0    a
#### Other examples #####
# If alphabetical order is not desired for character variables
# it is possible to force sortFrame to use the default 'ASCII'
# ordering, in which all upper case letters precede all lower
# case ones, by specifying alphabetical = FALSE, as follows:
sortFrame( dataset, txt, alphabetical = FALSE )
#     txt num1 num2  etc
# 2 Clare    1    1 used
# 1   bob    3    1  not
# 4   bob    0    0    a
# 3 clare    2    3   as
# 5   eve    0    3 sort
# 6   eve    2    2 term
# Note that factors are treated differently to character vectors.
# They are not sorted alphabetically: instead they are sorted in
# factor level order. For example, if we construct a data frame in
# which 'txt' is a factor, the results follow the order of the levels:
dataset2 <- dataset
dataset2$txt <- factor( dataset2$txt, levels = c('bob','eve','clare','Clare'))
sortFrame( dataset2, txt )
#     txt num1 num2  etc
# 1   bob    3    1  not
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 6   eve    2    2 term
# 3 clare    2    3   as
# 2 Clare    1    1 used
# Sorting by functions of multiple variables is also possible.
# Note that this functionality is intended to be applied to numeric
# variables only. It does work for non-numeric variables because of
# the internal reliance on the xtfrm function, but the results may
# be unexpected.
sortFrame( dataset, num1+num2 )
#     txt num1 num2  etc
# 4   bob    0    0    a
# 2 Clare    1    1 used
# 5   eve    0    3 sort
# 1   bob    3    1  not
# 6   eve    2    2 term
# 3 clare    2    3   as
# An example of a nonsensical application of mathematical operations
# when sorting by a character vector. It works, since the character
# vector is treated as a numeric equivalent internally, but the
# output does not make a great deal of sense:
sortFrame( dataset, (txt-3)^2 )
#     txt num1 num2  etc
# 2 Clare    1    1 used
# 3 clare    2    3   as
# 1   bob    3    1  not
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 6   eve    2    2 term
# An example where sorting by a text processing operation fails because
# the xtfrm function converts the character vector to a numeric vector
# before the text processing operation is applied:
sortFrame( dataset, nchar(txt) )
#     txt num1 num2  etc
# 1   bob    3    1  not
# 2 Clare    1    1 used
# 3 clare    2    3   as
# 4   bob    0    0    a
# 5   eve    0    3 sort
# 6   eve    2    2 term
# ... no sorting has occurred here. Future releases may allow "as is"
# terms to be included, which would allow something along the following
# lines: sortFrame( dataset, nchar(I(txt)) ), and would produce the
# desired output where the longer strings are sorted to the bottom of the
# data frame. However, no such functionality currently exists.
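Until something along those lines exists, one possible workaround is to compute the sort key outside sortFrame. The following is a minimal base R sketch that uses order() directly rather than any sortFrame feature; the name by_length is illustrative, and the data frame is rebuilt so that the snippet runs on its own.

```r
# Rebuild the example data frame so this snippet is self-contained
dataset <- data.frame(
  txt  = c("bob", "Clare", "clare", "bob", "eve", "eve"),
  num1 = c(3, 1, 2, 0, 0, 2),
  num2 = c(1, 1, 3, 0, 3, 2),
  etc  = c("not", "used", "as", "a", "sort", "term"),
  stringsAsFactors = FALSE
)

# Sort by string length using base R; order() leaves ties in original order,
# so the longer strings end up at the bottom of the data frame
by_length <- dataset[order(nchar(dataset$txt)), ]
by_length
```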