sortnut: Searching and other uses of sorting for 64bit integers

View source: R/sortuse64.R

sortnutR Documentation

Searching and other uses of sorting for 64bit integers

Description

This is roughly an implementation of hash functionality but based on sorting instead on a hasmap. Since sorting is more informative than hashingwe can do some more interesting things.

Usage

sortnut(sorted, ...)
ordernut(table, order, ...)
sortfin(sorted, x, ...)
orderfin(table, order, x, ...)
orderpos(table, order, x, ...)
sortorderpos(sorted, order, x, ...)
orderdup(table, order, ...)
sortorderdup(sorted, order, ...)
sortuni(sorted, nunique, ...)
orderuni(table, order, nunique, ...)
sortorderuni(table, sorted, order, nunique, ...)
orderupo(table, order, nunique, ...)
sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)
ordertie(table, order, nties, ...)
sortordertie(sorted, order, nties, ...)
sorttab(sorted, nunique, ...)
ordertab(table, order, nunique, ...)
sortordertab(sorted, order, ...)
orderkey(table, order, na.skip.num = 0L, ...)
sortorderkey(sorted, order, na.skip.num = 0L, ...)
orderrnk(table, order, na.count, ...)
sortorderrnk(sorted, order, na.count, ...)
## S3 method for class 'integer64'
sortnut(sorted, ...)
## S3 method for class 'integer64'
ordernut(table, order, ...)
## S3 method for class 'integer64'
sortfin(sorted, x, method=NULL, ...)
## S3 method for class 'integer64'
orderfin(table, order, x, method=NULL, ...)
## S3 method for class 'integer64'
orderpos(table, order, x, nomatch=NA, method=NULL, ...)
## S3 method for class 'integer64'
sortorderpos(sorted, order, x, nomatch=NA, method=NULL, ...)
## S3 method for class 'integer64'
orderdup(table, order, method=NULL, ...)
## S3 method for class 'integer64'
sortorderdup(sorted, order, method=NULL, ...)
## S3 method for class 'integer64'
sortuni(sorted, nunique, ...)
## S3 method for class 'integer64'
orderuni(table, order, nunique, keep.order=FALSE, ...)
## S3 method for class 'integer64'
sortorderuni(table, sorted, order, nunique, ...)
## S3 method for class 'integer64'
orderupo(table, order, nunique, keep.order=FALSE, ...)
## S3 method for class 'integer64'
sortorderupo(sorted, order, nunique, keep.order = FALSE, ...)
## S3 method for class 'integer64'
ordertie(table, order, nties, ...)
## S3 method for class 'integer64'
sortordertie(sorted, order, nties, ...)
## S3 method for class 'integer64'
sorttab(sorted, nunique, ...)
## S3 method for class 'integer64'
ordertab(table, order, nunique, denormalize=FALSE, keep.order=FALSE, ...)
## S3 method for class 'integer64'
sortordertab(sorted, order, denormalize=FALSE, ...)
## S3 method for class 'integer64'
orderkey(table, order, na.skip.num = 0L, ...)
## S3 method for class 'integer64'
sortorderkey(sorted, order, na.skip.num = 0L, ...)
## S3 method for class 'integer64'
orderrnk(table, order, na.count, ...)
## S3 method for class 'integer64'
sortorderrnk(sorted, order, na.count, ...)
## S3 method for class 'integer64'
sortqtl(sorted, na.count, probs, ...)
## S3 method for class 'integer64'
orderqtl(table, order, na.count, probs, ...)

Arguments

x

an integer64 vector

sorted

a sorted integer64 vector

table

the original data with original order under the sorted vector

order

an integer order vector that turns 'table' into 'sorted'

nunique

number of unique elements, usually we get this from cache or call sortnut or ordernut

nties

number of tied values, usually we get this from cache or call sortnut or ordernut

denormalize

FALSE returns counts of unique values, TRUE returns each value with its counts

nomatch

the value to be returned if an element is not found in the hashmap

keep.order

determines order of results and speed: FALSE (the default) is faster and returns in sorted order, TRUE returns in the order of first appearance in the original data, but this requires extra work

probs

vector of probabilities in [0..1] for which we seek quantiles

na.skip.num

0 or the number of NAs. With 0, NAs are coded with 1L, with the number of NAs, these are coded with NA, the latter needed for as.factor.integer64

na.count

the number of NAs, needed for this low-level function algorithm

method

see details

...

further arguments, passed from generics, ignored in methods

Details

sortfun orderfun sortorderfun see also description
sortnut ordernut return number of tied and of unique values
sortfin orderfin %in%.integer64 return logical whether x is in table
orderpos sortorderpos match return positions of x in table
orderdup sortorderdup duplicated return logical whether values are duplicated
sortuni orderuni sortorderuni unique return unique values (=dimensiontable)
orderupo sortorderupo unique return positions of unique values
ordertie sortordertie return positions of tied values
orderkey sortorderkey positions of values in vector of unique values (match in dimensiontable)
sorttab ordertab sortordertab table tabulate frequency of values
orderrnk sortorderrnk rank averaging ties
sortqtl orderqtl return quantiles given probabilities

The functions sortfin, orderfin, orderpos and sortorderpos each offer three algorithms for finding x in table.
With method=1L each value of x is searched independently using binary search, this is fastest for small tables.
With method=2L the values of x are first sorted and then searched using doubly exponential search, this is the best allround method.
With method=3L the values of x are first sorted and then searched using simple merging, this is the fastest method if table is huge and x has similar size and distribution of values.
With method=NULL the functions use a heuristic to determine the fastest algorithm.

The functions orderdup and sortorderdup each offer two algorithms for setting the truth values in the return vector.
With method=1L the return values are set directly which causes random write access on a possibly large return vector.
With method=2L the return values are first set in a smaller bit-vector – random access limited to a smaller memory region – and finally written sequentially to the logical output vector.
With method=NULL the functions use a heuristic to determine the fastest algorithm.

Value

see details

Author(s)

Jens Oehlschlägel <Jens.Oehlschlaegel@truecluster.com>

See Also

match

Examples

 message("check the code of 'optimizer64' for examples:")
 print(optimizer64)

bit64 documentation built on Sept. 30, 2024, 9:23 a.m.