# sortnut: Searching and other uses of sorting for 64bit integers In bit64: A S3 Class for Vectors of 64bit Integers

## Description

This is roughly an implementation of hash functionality but based on sorting instead on a hasmap. Since sorting is more informative than hashingwe can do some more interesting things.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70``` ```sortnut(sorted, ...) ordernut(table, order, ...) sortfin(sorted, x, ...) orderfin(table, order, x, ...) orderpos(table, order, x, ...) sortorderpos(sorted, order, x, ...) orderdup(table, order, ...) sortorderdup(sorted, order, ...) sortuni(sorted, nunique, ...) orderuni(table, order, nunique, ...) sortorderuni(table, sorted, order, nunique, ...) orderupo(table, order, nunique, ...) sortorderupo(sorted, order, nunique, keep.order = FALSE, ...) ordertie(table, order, nties, ...) sortordertie(sorted, order, nties, ...) sorttab(sorted, nunique, ...) ordertab(table, order, nunique, ...) sortordertab(sorted, order, ...) orderkey(table, order, na.skip.num = 0L, ...) sortorderkey(sorted, order, na.skip.num = 0L, ...) orderrnk(table, order, na.count, ...) sortorderrnk(sorted, order, na.count, ...) ## S3 method for class 'integer64' sortnut(sorted, ...) ## S3 method for class 'integer64' ordernut(table, order, ...) ## S3 method for class 'integer64' sortfin(sorted, x, method=NULL, ...) ## S3 method for class 'integer64' orderfin(table, order, x, method=NULL, ...) ## S3 method for class 'integer64' orderpos(table, order, x, nomatch=NA, method=NULL, ...) ## S3 method for class 'integer64' sortorderpos(sorted, order, x, nomatch=NA, method=NULL, ...) ## S3 method for class 'integer64' orderdup(table, order, method=NULL, ...) ## S3 method for class 'integer64' sortorderdup(sorted, order, method=NULL, ...) ## S3 method for class 'integer64' sortuni(sorted, nunique, ...) ## S3 method for class 'integer64' orderuni(table, order, nunique, keep.order=FALSE, ...) ## S3 method for class 'integer64' sortorderuni(table, sorted, order, nunique, ...) ## S3 method for class 'integer64' orderupo(table, order, nunique, keep.order=FALSE, ...) ## S3 method for class 'integer64' sortorderupo(sorted, order, nunique, keep.order = FALSE, ...) ## S3 method for class 'integer64' ordertie(table, order, nties, ...) ## S3 method for class 'integer64' sortordertie(sorted, order, nties, ...) ## S3 method for class 'integer64' sorttab(sorted, nunique, ...) ## S3 method for class 'integer64' ordertab(table, order, nunique, denormalize=FALSE, keep.order=FALSE, ...) ## S3 method for class 'integer64' sortordertab(sorted, order, denormalize=FALSE, ...) ## S3 method for class 'integer64' orderkey(table, order, na.skip.num = 0L, ...) ## S3 method for class 'integer64' sortorderkey(sorted, order, na.skip.num = 0L, ...) ## S3 method for class 'integer64' orderrnk(table, order, na.count, ...) ## S3 method for class 'integer64' sortorderrnk(sorted, order, na.count, ...) ## S3 method for class 'integer64' sortqtl(sorted, na.count, probs, ...) ## S3 method for class 'integer64' orderqtl(table, order, na.count, probs, ...) ```

## Arguments

 `x` an `integer64` vector `sorted` a sorted `integer64` vector `table` the original data with original order under the sorted vector `order` an `integer` order vector that turns 'table' into 'sorted' `nunique` number of unique elements, usually we get this from cache or call `sortnut` or `ordernut` `nties` number of tied values, usually we get this from cache or call `sortnut` or `ordernut` `denormalize` FALSE returns counts of unique values, TRUE returns each value with its counts `nomatch` the value to be returned if an element is not found in the hashmap `keep.order` determines order of results and speed: `FALSE` (the default) is faster and returns in sorted order, `TRUE` returns in the order of first appearance in the original data, but this requires extra work `probs` vector of probabilities in [0..1] for which we seek quantiles `na.skip.num` 0 or the number of `NA`s. With 0, `NA`s are coded with 1L, with the number of `NA`s, these are coded with `NA`, the latter needed for `as.factor.integer64` `na.count` the number of `NA`s, needed for this low-level function algorithm `method` see details `...` further arguments, passed from generics, ignored in methods

## Details

 sortfun orderfun sortorderfun see also description `sortnut` `ordernut` return number of tied and of unique values `sortfin` `orderfin` `%in%.integer64` return logical whether `x` is in `table` `orderpos` `sortorderpos` `match` return positions of `x` in `table` `orderdup` `sortorderdup` `duplicated` return logical whether values are duplicated `sortuni` `orderuni` `sortorderuni` `unique` return unique values (=dimensiontable) `orderupo` `sortorderupo` `unique` return positions of unique values `ordertie` `sortordertie` return positions of tied values `orderkey` `sortorderkey` positions of values in vector of unique values (match in dimensiontable) `sorttab` `ordertab` `sortordertab` `table` tabulate frequency of values `orderrnk` `sortorderrnk` rank averaging ties `sortqtl` `orderqtl` return quantiles given probabilities

The functions `sortfin`, `orderfin`, `orderpos` and `sortorderpos` each offer three algorithms for finding `x` in `table`.
With `method=1L` each value of `x` is searched independently using binary search, this is fastest for small `table`s.
With `method=2L` the values of `x` are first sorted and then searched using doubly exponential search, this is the best allround method.
With `method=3L` the values of `x` are first sorted and then searched using simple merging, this is the fastest method if `table` is huge and `x` has similar size and distribution of values.
With `method=NULL` the functions use a heuristic to determine the fastest algorithm.

The functions `orderdup` and `sortorderdup` each offer two algorithms for setting the truth values in the return vector.
With `method=1L` the return values are set directly which causes random write access on a possibly large return vector.
With `method=2L` the return values are first set in a smaller bit-vector – random access limited to a smaller memory region – and finally written sequentially to the logical output vector.
With `method=NULL` the functions use a heuristic to determine the fastest algorithm.

see details

## Author(s)

Jens Oehlschlägel <Jens.Oehlschlaegel@truecluster.com>

`match`
 ```1 2``` ``` message("check the code of 'optimizer64' for examples:") print(optimizer64) ```