k_occur | R Documentation |

`k_occur`

returns a vector of the k-occurrences of a nearest neighbor graph
as defined by Radovanovic and co-workers (2010). The k-occurrence of an
object is the number of times it occurs among the k-nearest neighbors of
objects in a dataset.

```
k_occur(idx, k = NULL, include_self = TRUE)
```

`idx` |
integer matrix containing the nearest neighbor indices, integers
labeled starting at 1. Note that the integer labels do |

`k` |
The number of closest neighbors to use. Must be between 1 and the
number of columns in |

`include_self` |
logical indicating whether the label |

The k-occurrence can take values between 0 and the size of the dataset. The larger the k-occurrence for an object, the more "popular" it is. Very large values of the k-occurrence (much larger than k) indicates that an object is a "hub" and also implies the existence of "anti-hubs": objects that never appear as k-nearest neighbors of other objects.

The presence of hubs can reduce the accuracy of nearest-neighbor descent and other approximate nearest neighbor algorithms in terms of retrieving the exact k-nearest neighbors. However the appearance of hubs can still be detected in these approximate results, so calculating the k-occurrences for the output of nearest neighbor descent is a useful diagnostic step.

a vector of length `max(idx)`

, containing the number of times an
object in `idx`

was found in the nearest neighbor list of the objects
represented by the row indices of `idx`

.

Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010).
Hubs in space: Popular nearest neighbors in high-dimensional data.
*Journal of Machine Learning Research*, *11*, 2487-2531.
https://www.jmlr.org/papers/v11/radovanovic10a.html

Bratic, B., Houle, M. E., Kurbalija, V., Oria, V., & Radovanovic, M. (2019).
The Influence of Hubness on NN-Descent.
*International Journal on Artificial Intelligence Tools*, *28*(06), 1960002.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1142/S0218213019600029")}

```
iris_nbrs <- brute_force_knn(iris, k = 15)
iris_ko <- k_occur(iris_nbrs$idx)
# items 42 and 107 are not in 15 nearest neighbors of any other members of
# iris
which(iris_ko == 1) # they are only their own nearest neighbor
max(iris_ko) # most "popular" item appears on 29 15-nearest neighbor lists
which(iris_ko == max(iris_ko)) # it's iris item 64
# with k = 15, a maximum k-occurrence = 29 ~= 1.9 * k, which is not a cause
# for concern
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.