Getting started with kdtools

can_run = require(kdtools) && kdtools::has_cxx17()
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = can_run
)
if (has_cxx17()) {
  message("kdtools package not available, code will not be evaluated")
} else {
  message("kdtools needs C++17 for full functionality, code will not be evaluated")
}

There are now two options for using kdtools, either on a C++ vector of arrays (arrayvec object) or natively on a data frame. Sorting on arrayvec objects is fast. Passing a data frame is slower, but automatically supports mixed types.

Data Frame Interface

Step 1. Sort data frame

When working with a data frame, you can specify which columns to use and the order of inclusion using the cols argument. Omitting the cols argument uses all columns in order.

# sort by weight, miles-per-gallon and displacement
mtcars_sorted <- kd_sort(mtcars, cols = ~ wt + mpg + disp);
head(mtcars_sorted, 3)
tail(mtcars_sorted, 3)
Step 2. Search sorted data frame
lower <- c(2.5, 17, 120)
upper <- c(3.6, 22, 330)
kd_range_query(mtcars_sorted, lower, upper, cols = ~ wt + mpg + disp)
kd_nearest_neighbors(mtcars_sorted, lower, 2, cols = ~ wt + mpg + disp)

Arrayvec Interface

The kdtools package can be used to search for multidimensional points in a boxed region and find nearest neighbors in 1 to 9 dimensions. The package uses binary search on a sorted sequence of values. The current package is limited to matrices of real values. If you are interested in using string or mixed types in different dimensions, see the methods vignette.

Using kdtools is straightforward. There are four steps:

Step 1. Convert your matrix of values into a arrayvec object
library(kdtools)
x = matrix(runif(3e3), nc = 3)
y = matrix_to_tuples(x)
y[1:3, c(1, 3)]

The arrayvec object can be manipulated as if it were a matrix.

Step 2. Sort the data
kd_sort(y, inplace = TRUE, parallel = TRUE)
Step 3. Search the data
rq = kd_range_query(y, c(0, 0, 0), c(1/4, 1/4, 1/4)); rq
i = kd_nearest_neighbor(y, c(0, 0, 0)); y[i, ]
nns = kd_nearest_neighbors(y, c(0, 0, 0), 100); nns
nni = kd_nn_indices(y, c(0, 0, 0), 10); nni

The kd_nearest_neighbor and kd_nn_indices functions return row-indices. The other functions return arrayvec objects.

Step 4. Convert back to a matrix for use in R
head(tuples_to_matrix(rq))
head(tuples_to_matrix(nns))

If you pass a matrix instead of an arrayvec object to any of the functions, it will be converted to an arrayvec object internally and results will be returned as matrices. This is slower and provided for convenience.



Try the kdtools package in your browser

Any scripts or data that you put into this service are public.

kdtools documentation built on Oct. 8, 2021, 9:07 a.m.