This package emerged from a fascination with spatial partitioning algorithms. I was curious if there was a simple and fast way to move multidimensional data elements near to each other in memory so that they could be retrieved quickly. I also admire the structure of the C++ standard library and was curious if the search functions for sorted sequences could be extended to higher dimensions. At the same time, I was interested to learn some of the newer feature of C++17.
The result is this package, which collects several ideas, including:
kdtools.h
) that can be used separately from
this package. It has multidimensional analogs to the C++ standard
library functions for searching sorted lists (e.g., upper_bound,
binary_search) but that operate on sequences of tuples rather than
scalar values. It uses template metaprogramming to generate code for
each dimension at compile time at the cost of some binary bloat.
Sorting multidimensional arrays is fully threaded and very fast. It
produces an implicit kd-tree split on medians that supports fast
range queries and nearest-neighbor searches.tuplemapr.h
header, which allows one to apply an arbitrary
callable over the dimensions of a collection of tuple-like objects.
It is entirely constexpr
(except when calling certain standard
library functions) and employs fold-expressions where convenient.More details are here. Methods and benchmarks are here.
library(kdtools)
x = kd_sort(matrix(runif(400), 200))
plot(x, type = 'l', asp = 1, axes = FALSE, xlab = NA, ylab = NA)
points(x, pch = 19, col = rainbow(200, alpha = 0.25), cex = 2)
y = kd_range_query(x, c(1/4, 1/4), c(3/4, 3/4))
points(y, pch = 19, cex = 0.5, col = "red")
The following demonstrates a mixed-type range query on a data frame.
df <- kd_sort(data.frame(a = runif(12),
b = as.integer(rpois(12, 1)),
c = sample(month.name),
stringsAsFactors = FALSE))
print(df)
#> a b c
#> 8 0.01271206 0 July
#> 10 0.20984687 0 June
#> 3 0.12306503 1 September
#> 11 0.10688313 1 November
#> 6 0.30972967 2 February
#> 2 0.12909076 1 March
#> 5 0.38071652 0 May
#> 1 0.42951487 0 April
#> 9 0.87586080 1 December
#> 12 0.57758359 1 August
#> 7 0.54902204 2 January
#> 4 0.71089401 2 October
lower <- list(0.1, 1L, "August")
upper <- list(0.9, 4L, "September")
i <- kd_rq_indices(df, lower, upper)
print(i)
#> [1] 4 5 6 9 10 11 12
df[i, ]
#> a b c
#> 11 0.1068831 1 November
#> 6 0.3097297 2 February
#> 2 0.1290908 1 March
#> 9 0.8758608 1 December
#> 12 0.5775836 1 August
#> 7 0.5490220 2 January
#> 4 0.7108940 2 October
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.