| k_occur | R Documentation |
k_occur returns a vector of the k-occurrences of a nearest neighbor graph
as defined by Radovanovic and co-workers (2010). The k-occurrence of an
object is the number of times it occurs among the k-nearest neighbors of
objects in a dataset.
k_occur(idx, k = NULL, include_self = TRUE)
idx |
integer matrix containing the nearest neighbor indices, integers
labeled starting at 1. Note that the integer labels do not have to
refer to the rows of |
k |
The number of closest neighbors to use. Must be between 1 and the
number of columns in |
include_self |
logical indicating whether the label |
The k-occurrence can take values between 0 and the size of the dataset. The larger the k-occurrence for an object, the more "popular" it is. Very large values of the k-occurrence (much larger than k) indicate that an object is a "hub" and also imply the existence of "anti-hubs": objects that never appear as k-nearest neighbors of other objects.
The presence of hubs can reduce the accuracy with which nearest-neighbor descent and other approximate nearest neighbor algorithms retrieve the exact k-nearest neighbors. However, hubs can still be detected in these approximate results, so calculating the k-occurrences for the output of nearest neighbor descent is a useful diagnostic step.
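To make the counting concrete, here is a minimal base R sketch on a toy neighbor matrix. The helper k_occur_sketch is a hypothetical stand-in written for illustration and is not the package's implementation. It assumes each item is listed as its own first neighbor. Because every row contributes k entries, the counts sum to n * k, so the mean k-occurrence is k. That is why values much larger than k stand out as hubs.

```r
# Hypothetical helper for illustration only; not the package's implementation.
k_occur_sketch <- function(idx, k = ncol(idx)) {
  # Count how often each label appears in the first k columns of idx.
  tabulate(as.vector(idx[, seq_len(k), drop = FALSE]), nbins = max(idx))
}

# Toy neighbor graph: 5 items, k = 2, each item listed first as its own neighbor.
idx <- matrix(c(1L, 2L,
                2L, 1L,
                3L, 1L,
                4L, 1L,
                5L, 4L), ncol = 2, byrow = TRUE)

ko <- k_occur_sketch(idx)
ko                                 # k-occurrence of items 1 to 5
sum(ko) == nrow(idx) * ncol(idx)   # every row contributes k entries
which(ko == max(ko))               # the most "popular" item
which(ko == 1)                     # items that appear only in their own list
```

In this toy graph item 1 appears in four of the five neighbor lists, while items 3 and 5 appear only in their own.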
a vector of length max(idx), containing the number of times an
object in idx was found in the nearest neighbor list of the objects
represented by the row indices of idx.
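One consequence of the documented length, max(idx), is worth illustrating: when the labels refer to a separate set of reference items, the result need not have one entry per row of idx, and reference items with labels above max(idx) are not represented. The sketch below uses base R tabulate as a stand-in for the counting step. The query matrix and the reference set size n_ref are made-up values, and padding the result is an optional convenience, not something the function is documented to do.

```r
# Hypothetical query result: 3 query rows whose neighbors are drawn from
# n_ref = 6 reference items (both values are assumptions for illustration).
n_ref <- 6L
query_idx <- matrix(c(2L, 4L,
                      4L, 2L,
                      4L, 5L), ncol = 2, byrow = TRUE)

# Base R stand-in for the counting step; not the package's code.
ko <- tabulate(as.vector(query_idx), nbins = max(query_idx))
length(ko)   # equals max(query_idx), not nrow(query_idx)

# Optional: pad with zeros so there is one entry per reference item.
ko_full <- c(ko, integer(n_ref - length(ko)))
which(ko_full == 0)   # reference items never returned as a neighbor
```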
Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11, 2487-2531. https://www.jmlr.org/papers/v11/radovanovic10a.html
Bratic, B., Houle, M. E., Kurbalija, V., Oria, V., & Radovanovic, M. (2019). The Influence of Hubness on NN-Descent. International Journal on Artificial Intelligence Tools, 28(06), 1960002. https://doi.org/10.1142/S0218213019600029
iris_nbrs <- brute_force_knn(iris, k = 15)
iris_ko <- k_occur(iris_nbrs$idx)
# items 42 and 107 are not in 15 nearest neighbors of any other members of
# iris
which(iris_ko == 1) # they are only their own nearest neighbor
max(iris_ko) # most "popular" item appears on 29 15-nearest neighbor lists
which(iris_ko == max(iris_ko)) # it's iris item 64
# with k = 15, a maximum k-occurrence = 29 ~= 1.9 * k, which is not a cause
# for concern