BiocNeighbors-ties: Handling tied distances
In LTLA/BiocNeighbors: Nearest Neighbor Detection for Bioconductor Packages

BiocNeighbors-ties

R Documentation

Handling tied distances

Description

Interpreting the warnings when distances are tied in an exact nearest neighbor (NN) search.

The problem of ties

The most obvious problem with ties is that they may affect the identity of the reported neighbors. The various NN search functions return a fixed number of neighbors for each data point. If the kth neighbor is tied with the (k+1)th neighbor, an arbitrary decision must be made about which data point to retain in the NN set. A milder issue is that the order of tied neighbors within the set is arbitrary, which may be important for certain algorithms.
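To make the boundary case concrete, here is a minimal sketch in base R only. It does not use any BiocNeighbors function, and the toy points and variable names are purely illustrative. It checks whether the kth and (k+1)th nearest distances from one query point are equal.

```r
# Four points on a line; the query point is the first one, at the origin.
pts <- matrix(c(0, 1, -1, 3), ncol = 1)

# Euclidean distances from point 1 to every other point.
d <- as.matrix(dist(pts))[1, -1]

# Points 2 and 3 are both at distance 1 from the query, so with k = 1
# the single reported neighbor could be either of them.
k <- 1
sorted <- sort(d)
tie_at_boundary <- unname(sorted[k] == sorted[k + 1])
tie_at_boundary
```

Here the tie straddles the boundary of the NN set, which is the situation in which the identity of the reported neighbor is arbitrary.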

As such, a warning will be raised if tied distances are detected among the k+1 NNs for any of the exact NN search methods. We only consider exact ties at double precision. Previous versions of this package would account for numerical imprecision, but this is no longer the case. No warning is given for the approximate methods, as their use already implies that a certain degree of inaccuracy is acceptable.
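The difference between an exact tie and a tie "up to numerical imprecision" can be illustrated in base R. This is only a sketch of the two kinds of comparison, not of the package's internal tie check, and the values are made up for illustration.

```r
# Two values that are mathematically equal but differ as doubles.
a <- 0.3
b <- 0.1 + 0.2

# Exact comparison at double precision: these would not count as tied.
exact_tie <- (a == b)

# Tolerance-based comparison: these would count as tied.
approx_tie <- isTRUE(all.equal(a, b))

c(exact = exact_tie, approximate = approx_tie)
```

Under exact comparison, two distances that differ only in their last few bits are treated as distinct, so no tie would be reported for them.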

Interaction with random seeds

In general, the exact NN search algorithms in this package are fully deterministic despite the use of stochastic steps during index construction. The only exception occurs when there are tied distances to neighbors, at which point the order and/or identity of the k-nearest neighboring points is not well-defined. This is because, in the presence of ties, the output will depend on the ordering of points in the constructed index from buildKmknn or buildVptree.

Users should set the seed to guarantee consistent (albeit arbitrary) results across different runs of the function. However, note that the exact selection of tied points depends on the numerical precision of the system. Thus, even after setting a seed, there is no guarantee that the results will be reproducible across machines (especially Windows)!
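A minimal sketch of this advice, reusing the all-zero matrix from the Examples section. It assumes a version of BiocNeighbors matching this page, in which findKmknn returns a list with an index element. The seed value is arbitrary, and suppressWarnings is used here only to keep the tie warnings out of the output.

```r
library(BiocNeighbors)

# All points are identical, so every distance is tied at zero.
vals <- matrix(0, nrow = 10, ncol = 20)

# Set the same seed before each search so that index construction,
# and hence the arbitrary choice among tied points, is repeated.
set.seed(100)
out1 <- suppressWarnings(findKmknn(vals, k = 5))

set.seed(100)
out2 <- suppressWarnings(findKmknn(vals, k = 5))

# Compare the reported neighbor identities across the two runs.
identical(out1$index, out2$index)
```

As noted above, agreement between runs on one machine says nothing about agreement across machines.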

Turning off the warnings

It may occasionally be appropriate to disable the warnings by setting warn.ties=FALSE. The most obvious scenario is when get.index=FALSE, i.e., we are only interested in the distances to the neighbors. In such cases, the presence of ties does not matter, as changes to the identity of tied neighbors do not affect the returned distances (which, for ties, are equal by definition). Similarly, if the seed is set prior to the search, the warnings are unnecessary as the output is fully deterministic on a given machine.
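The distance-only scenario might look like the following sketch. It assumes a version of BiocNeighbors matching this page, in which findKmknn accepts the get.index and warn.ties arguments named above and returns the distances in a distance element. The input is the all-zero matrix from the Examples section.

```r
library(BiocNeighbors)

# All points are identical, so every distance is tied at zero.
vals <- matrix(0, nrow = 10, ncol = 20)

# Only distances are requested, so the arbitrary identity of the tied
# neighbors cannot affect the result and the warning is switched off.
out <- findKmknn(vals, k = 5, get.index = FALSE, warn.ties = FALSE)

# One row per point and one column per neighbor.
dim(out$distance)
```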

Author(s)

Aaron Lun

Examples

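library(BiocNeighbors)
# All points are identical, so every distance is tied at zero.
vals <- matrix(0, nrow=10, ncol=20)
out <- findKmknn(vals, k=5) # expected to warn about tied distances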

LTLA/BiocNeighbors documentation built on Jan. 14, 2024, 9:46 p.m.
