knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(nngeo)

This document introduces the `nngeo`

package. The `nngeo`

package includes functions for spatial join of layers based on *k-nearest neighbor* relation between features. The functions work with spatial layer object defined in package `sf`

, namely classes `sfc`

and `sf`

.

CRAN version:

install.packages("remotes") remotes::install_github("michaeldorman/nngeo")

GitHub version:

install.packages("nngeo")

The `nngeo`

package comes with three sample datasets:

`cities`

`towns`

`water`

data(cities) data(towns) data(water)

The `cities`

layer is a **point** layer representing the location of the three largest cities in Israel.

cities

The `towns`

layer is another **point** layer, with the location of all large towns in Israel, compiled from a different data source:

towns

The `water`

layer is an example of a **polygonal** layer. This layer contains four polygons of water bodies in Israel.

water

Figure \ref{fig:layers} shows the spatial configuration of the `cities`

, `towns`

and `water`

layers.

plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE)

```r, \texttt{towns} and \texttt{cities} layers'} opar = par(mar = rep(0, 4)) plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE) par(opar)

# Usage examples ## The `st_nn` function The main function in the `nngeo` package is `st_nn`. The `st_nn` function accepts two layers, `x` and `y`, and returns a list with the same number of elements as `x` features. Each list element `i` is an integer vector with all indices `j` for which `x[i]` and `y[j]` are **nearest neighbors**. For example, the following expression finds which feature in `towns` is the nearest neighbor to each feature in `cities`: ```r nn = st_nn(cities, towns, progress = FALSE) nn

This output tells us that `towns[70, ]`

is the nearest among the `r nrow(towns)`

features of `towns`

to `cities[1, ]`

, `towns[145, ]`

is the nearest to `cities[2, ]`

, and `towns[59, ]`

is the nearest to `cities[3, ]`

.

`st_connect`

functionThe resulting nearest neighbor matches can be visualized using the `st_connect`

function. This function builds a line layer connecting features from two layers `x`

and `y`

based on the relations defined in a list such the one returned by `st_nn`

:

l = st_connect(cities, towns, ids = nn) l

Plotting the line layer `l`

gives a visual demonstration of the nearest neighbors match, as shown in Figure \ref{fig:st_connect}.

plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4)

```r (in red) and \texttt{towns} (in grey)"} opar = par(mar = rep(0.5, 4)) plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4) par(opar)

## Dense matrix representation The `st_nn` can also return the complete logical matrix indicating whether each feature in `x` is a neighbor of `y`. To get the dense matrix, instead of a list, use `sparse=FALSE`. ```r nn = st_nn(cities, towns[1:5, ], sparse = FALSE, progress = FALSE) nn

`k>0`

It is also possible to return any **k-nearest** neighbors, rather than just one. For example, setting `k=2`

returns both the 1^st^ and 2^nd^ nearest neighbors:

nn = st_nn(cities, towns, k = 2, progress = FALSE) nn

Here is another example, finding the 10-nearest neighbor `towns`

features for each `cities`

feature:

x = st_nn(cities, towns, k = 10) l = st_connect(cities, towns, ids = x)

The result is visualized in Figure \ref{fig:cities_towns}.

plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE)

```r features from each \texttt{cities} feature"} opar = par(mar = rep(1, 4)) plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE) par(opar)

## Distance to nearest neighbors Using `returnDist=TRUE` the distances `list` is also returned, in addition the the neighbor matches, with both components now comprising a `list`: ```r nn = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) nn

Finally, the search for nearest neighbors can be limited to a **search radius** using `maxdist`

. In the following example, the search radius is set to 2,000 meters (2 kilometers). Note that no neighbors are found within the search radius for `cities[3, ]`

, therefore the third list element is a zero-length vector of indices:

nn = st_nn(cities, towns, k = 1, maxdist = 2000, progress = FALSE) nn

The `st_nn`

function can also be used as a **predicate function** when performing spatial join with `sf::st_join`

. For example, the following expression spatially joins the two nearest `towns`

features to each `cities`

features, using a search radius of 5 km:

st_join(cities, towns, join = st_nn, k = 2, maxdist = 5000, progress = FALSE)

Sometimes it's necessary to bind the distances to the joined features in the resulting layer, to have more detailed information about the distance to nearest features. For example, suppose we join the nearest `towns`

feature to `cities`

, as shown above:

cities1 = st_join(cities, towns, join = st_nn, k = 1, progress = FALSE) cities1

As shown above, the distances can be calculated using the `returnDist=TRUE`

option, then binded to the above join result:

# Calculate distances n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) dists = sapply(n[[2]], "[", 1) dists # Bind distances cities1$dist = dists cities1

In the above workflow, we actually ran the same nearest neighbor search *twice*, once in `st_join`

and more time to get the distances.

Another more verbose approach can be used in case the computation time is prohibitive. Here, we calculate the nearest neighbor indices and distances just once, then use them to construct the "joined" table with the distances:

# Get indices & distances n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) ids = sapply(n[[1]], "[", 1) dists = sapply(n[[2]], "[", 1) # Join cities1 = data.frame(cities, st_drop_geometry(towns)[ids, , drop = FALSE]) cities1 = st_sf(cities1) # Add distances cities1$dist = dists cities1

Nearest neighbor search also works for non-point layers. The following code section finds the 20-nearest `towns`

features for each water body in `water[-1, ]`

.

nn = st_nn(water[-1, ], towns, k = 20, progress = FALSE)

Again, we can calculate the respective lines for the above result using `st_connect`

. Since one of the inputs is line/polygon, we need to specify a sampling distance `dist`

, which sets the resolution of connecting points on the shape exterior boundary.

l = st_connect(water[-1, ], towns, ids = nn, dist = 100)

The result is visualized in Figure \ref{fig:water_towns}.

plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey") plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(l), col = "red", add = TRUE)

```
r features from each \\texttt{water} polygon"}
opar = par(mar = rep(0, 4))
plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey")
plot(st_geometry(towns), col = "darkgrey", add = TRUE)
plot(st_geometry(l), col = "red", add = TRUE)
par(opar)
```

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.