setorder: Fast reordering of a data.table by reference


Description

Note that in data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other data.table operator that modifies input by reference is :=. See the See Also section below for the other set* functions that data.table provides.

setcolorder rearranges the columns of a data.table, by reference, to the new order provided.

setorder() sorts or rearranges the rows of a data.table by reference, based on the columns provided. It can sort in both ascending and descending order. The functionality is identical to using ?order on a data.frame, except that setorder is much faster, is very memory efficient and is much more user-friendly.

Usage

setcolorder(x, neworder)
setorder(x, ...)
setorderv(x, cols, order=1L)

Arguments

x

A data.table.

neworder

Character vector of the new column name ordering. May also be column numbers.

...

The columns to sort by. Do not quote column names. If ... is missing (e.g., setorder(x)), x is rearranged based on all columns, in ascending order. To sort by a column in descending order, prefix it with "-", e.g., setorder(x, a, -b, c). The -b works when b is of type character as well.

cols

A character vector of column names of x by which to order. Do not add "-" here; use the order argument instead.

order

An integer vector with only possible values of 1 and -1, corresponding to ascending and descending order. The length of order must be either 1 or equal to that of cols. If length(order) == 1, it's recycled to length(cols).
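As an illustration of how cols and order interact, here is a minimal sketch. The table DT and its columns a, b and v are made up for this example and are not part of the package.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(a = c(2L, 1L, 2L, 1L),
                 b = c("x", "y", "z", "w"),
                 v = 1:4)

# One order value per column: ascending a, descending b
setorderv(DT, cols = c("a", "b"), order = c(1L, -1L))
print(DT)

# A single order value is recycled to length(cols):
# both a and b are now sorted in descending order
setorderv(DT, cols = c("a", "b"), order = -1L)
print(DT)
```

Because setorderv takes column names as a character vector, it is the form to reach for when the sort columns are only known at run time.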

Details

When it's required to reorder the columns of a data.table, the idiomatic way is to use setcolorder(x, neworder), instead of doing x <- x[, neworder, with=FALSE]. This is because the latter makes an entire copy of the data.table, which may be unnecessary in most situations. setcolorder also allows column numbers instead of column names for the neworder argument, although it isn't good programming practice to use column numbers. We recommend using column names.
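The difference between the two approaches can be sketched as follows. The table DT is invented for the example, and address() (exported by data.table) is used here only as a rough way to see whether a new object was created.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(A = 1:3, B = letters[1:3], C = c(2.5, 1.5, 3.5))
before <- address(DT)

# Reorder columns in place; DT itself is modified
setcolorder(DT, c("C", "A", "B"))
print(names(DT))
print(identical(address(DT), before))        # same object if no copy was made

# Subsetting with with=FALSE builds a new data.table instead
DT2 <- DT[, c("B", "A", "C"), with = FALSE]
print(identical(address(DT2), address(DT)))  # a different object
```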

data.table internally implements extremely fast radix-based ordering, which will soon be exported and available as forder. However, in versions <= 1.9.2, fast ordering was only capable of increasing (ascending) order. In versions > 1.9.2, data.table's internal fast order is also capable of sorting in decreasing order. Note that setkey still requires, and will only sort in, ascending order, and is not related to setorder.

By implementing forder to handle decreasing order as well, we no longer have to rely on base::order. It is now possible to reorder the rows of a data.table based on columns by reference, e.g., setorder(x, a, -b, c). Note that -b also works with columns of type character, unlike base::order, which requires -xtfrm(b) instead (and is slow).
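To make the comparison with base R concrete, here is a small sketch. DF and DT are made-up objects holding the same data, with a character column b sorted in descending order.

```r
library(data.table)

# Hypothetical example data
DF <- data.frame(a = c(1L, 2L, 1L, 2L),
                 b = c("p", "q", "r", "s"),
                 stringsAsFactors = FALSE)
DT <- as.data.table(DF)

# Base R: a character column cannot be negated directly,
# so it has to be converted with xtfrm() first. This returns a copy.
DF_sorted <- DF[order(DF$a, -xtfrm(DF$b)), ]

# data.table: -b works on the character column, and DT is
# reordered by reference
setorder(DT, a, -b)

print(DF_sorted)
print(DT)
```

For data like this, both approaches should give the same row order. The two methods can disagree on strings whose ordering depends on the locale, since base R's order follows the session's collation rules.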

Note that if setorder results in reordering of the rows of a keyed data.table, then its key will be set to NULL.
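A short sketch of the effect on a key, using a made-up table DT keyed on column a:

```r
library(data.table)

# Hypothetical example data
DT <- data.table(a = c(3L, 1L, 2L), b = c("x", "y", "z"))

setkey(DT, a)        # sorts by a (ascending) and marks DT as keyed
key_before <- key(DT)
print(key_before)

setorder(DT, -a)     # rows are actually reordered, so the key is dropped
key_after <- key(DT)
print(key_after)
```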

Value

The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]. If you require a copy, take a copy first (using DT2=copy(DT)). See ?copy.
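The compound-statement pattern and the copy-first pattern can be combined, as in this sketch. DT, DT2 and res are hypothetical names chosen for the example.

```r
library(data.table)

# Hypothetical example data
DT <- data.table(a = c(2L, 1L, 2L, 1L),
                 b = c("x", "y", "x", "y"),
                 c = 1:4)

# Take a copy first so that the original row order of DT is preserved
DT2 <- copy(DT)

# setorder returns its (modified) input invisibly, so a query can be
# chained directly onto the call
res <- setorder(DT2, a, -b)[, cumsum(c), by = list(a, b)]

print(res)
print(DT)    # untouched, because only the copy was reordered
```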

See Also

setkey, setattr, setnames, set, :=, setDT, copy


Examples

library(data.table)
set.seed(45L)
DT = data.table(A=sample(3, 10, TRUE), 
         B=sample(letters[1:3], 10, TRUE), C=sample(10))

# setorder
setorder(DT, A, -B)
# same as above but using 'setorderv'
# setorderv(DT, c("A", "B"), c(1,-1))

# setcolorder
setcolorder(DT, c("C", "A", "B"))

data.table documentation built on May 2, 2019, 4:57 p.m.