subsample: Nearest neighbor subsampling
In SPlit: Split a Dataset for Training and Testing

View source: R/RcppExports.R

subsample

R Documentation

Nearest neighbor subsampling

Description

subsample() finds the nearest data points in a dataset to a given set of points as described in Joseph and Vakayil (2021). It uses an efficient kd-tree based algorithm that allows for lazy deletion of a data point from the kd-tree, thereby avoiding the need to rebuild the tree after each query. Please see Blanco and Rai (2014) for details.

Usage

	subsample(data, points)

Arguments

`data`	The dataset; should be numeric.
`points`	The set of query points of the same dimension as the dataset.

Value

Indices of the nearest neighbors in the dataset.

References

Blanco, J. L. & Rai, P. K. (2014). nanoflann: a C++ header-only fork of FLANN, a library for nearest neighbor (NN) with kd-trees. https://github.com/jlblancoc/nanoflann.

Joseph, V. R., & Vakayil, A. (2021). SPlit: An Optimal Method for Data Splitting. Technometrics, 1-11. doi:10.1080/00401706.2021.1921037.

SPlit documentation built on Jan. 18, 2026, 1:06 a.m.