k_occur | R Documentation |
k_occur returns a vector of the k-occurrences of a nearest neighbor graph
as defined by Radovanovic and co-workers (2010). The k-occurrence of an
object is the number of times it occurs among the k-nearest neighbors of
objects in a dataset.
k_occur(idx, k = NULL, include_self = TRUE)
idx |
integer matrix containing the nearest neighbor indices, integers
labeled starting at 1. Note that the integer labels do not have to
refer to the rows of |
k |
The number of closest neighbors to use. Must be between 1 and the
number of columns in |
include_self |
logical indicating whether the label |
The k-occurrence can take values between 0 and the size of the dataset. The larger the k-occurrence for an object, the more "popular" it is. Very large values of the k-occurrence (much larger than k) indicate that an object is a "hub" and also imply the existence of "anti-hubs": objects that never appear as k-nearest neighbors of other objects.
The presence of hubs can reduce the accuracy with which nearest-neighbor descent and other approximate nearest neighbor algorithms retrieve the exact k-nearest neighbors. However, hubs can still be detected in these approximate results, so calculating the k-occurrences for the output of nearest neighbor descent is a useful diagnostic step.
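The counting described above can be sketched in base R. This is only an illustration on a toy graph: `k_occur_sketch` is a hypothetical helper, not the package's implementation, and it counts every entry of `idx` as given, so it does not model the `include_self` argument.

```r
# Hypothetical helper (not part of the package): count how often each label
# appears in the first k columns of a neighbor index matrix.
k_occur_sketch <- function(idx, k = ncol(idx)) {
  stopifnot(is.matrix(idx), k >= 1, k <= ncol(idx))
  tabulate(as.vector(idx[, seq_len(k), drop = FALSE]), nbins = max(idx))
}

# Toy neighbor graph: 5 objects, k = 2, each object listed as its own first
# neighbor (an assumption made for this example).
idx <- matrix(
  c(1L, 2L,
    2L, 1L,
    3L, 2L,
    4L, 2L,
    5L, 4L),
  ncol = 2, byrow = TRUE
)

ko <- k_occur_sketch(idx)
ko                                # 2 4 1 2 1
hubs <- which(ko == max(ko))      # object 2 is the most "popular"
self_only <- which(ko == 1)       # objects 3 and 5 appear only in their own list
```

In this toy graph the counts sum to the number of entries in `idx`, which is a quick sanity check when writing a counter like this.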
a vector of length max(idx), containing the number of times an object in idx was found in the nearest neighbor list of the objects represented by the row indices of idx.
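As a rough illustration of why the result has length max(idx) rather than nrow(idx), the base R sketch below counts labels in a made-up query result whose labels point into a separate set of objects. The matrix `query_idx` and the use of `tabulate` are assumptions for this example, not the package's own code.

```r
# Hypothetical query result: 3 query objects with 2 neighbors each. The labels
# refer to a separate set of objects, not to the rows of query_idx.
query_idx <- matrix(
  c(4L, 6L,
    6L, 2L,
    6L, 4L),
  ncol = 2, byrow = TRUE
)

# One count per label from 1 to max(query_idx).
ref_counts <- tabulate(as.vector(query_idx), nbins = max(query_idx))
ref_counts          # 0 1 0 2 0 3
length(ref_counts)  # 6, although query_idx has only 3 rows
```

Under this reading, labels larger than max(idx) would not be represented in the result at all, so it may be worth checking the length of the output against the size of the set the labels refer to.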
Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11, 2487-2531. https://www.jmlr.org/papers/v11/radovanovic10a.html
Bratic, B., Houle, M. E., Kurbalija, V., Oria, V., & Radovanovic, M. (2019). The Influence of Hubness on NN-Descent. International Journal on Artificial Intelligence Tools, 28(06), 1960002. https://doi.org/10.1142/S0218213019600029
iris_nbrs <- brute_force_knn(iris, k = 15)
iris_ko <- k_occur(iris_nbrs$idx)
# items 42 and 107 are not in 15 nearest neighbors of any other members of
# iris
which(iris_ko == 1) # they are only their own nearest neighbor
max(iris_ko) # most "popular" item appears on 29 15-nearest neighbor lists
which(iris_ko == max(iris_ko)) # it's iris item 64
# with k = 15, a maximum k-occurrence = 29 ~= 1.9 * k, which is not a cause
# for concern