In michaeldorman/nngeo: k-Nearest Neighbor Join for Spatial Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(nngeo)

Introduction

Package purpose

This document introduces the nngeo package. The nngeo package includes functions for spatial join of layers based on k-nearest neighbor relation between features. The functions work with spatial layer object defined in package sf, namely classes sfc and sf.

Installation

CRAN version:

install.packages("remotes")
remotes::install_github("michaeldorman/nngeo")

GitHub version:

install.packages("nngeo")

Sample data

The nngeo package comes with three sample datasets:

cities
towns
water

data(cities)
data(towns)
data(water)

The cities layer is a point layer representing the location of the three largest cities in Israel.

cities

The towns layer is another point layer, with the location of all large towns in Israel, compiled from a different data source:

towns

The water layer is an example of a polygonal layer. This layer contains four polygons of water bodies in Israel.

water

Figure \ref{fig:layers} shows the spatial configuration of the cities, towns and water layers.

plot(st_geometry(water), col = "lightblue")
plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE)
plot(st_geometry(cities), col = "red", pch = 1, add = TRUE)

```r, \texttt{towns} and \texttt{cities} layers'} opar = par(mar = rep(0, 4)) plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE) par(opar)

# Usage examples

## The `st_nn` function

The main function in the `nngeo` package is `st_nn`. The `st_nn` function accepts two layers, `x` and `y`, and returns a list with the same number of elements as `x` features. Each list element `i` is an integer vector with all indices `j` for which `x[i]` and `y[j]` are **nearest neighbors**. 

For example, the following expression finds which feature in `towns` is the nearest neighbor to each feature in `cities`:

```r
nn = st_nn(cities, towns, progress = FALSE)
nn

This output tells us that towns[70, ] is the nearest among the r nrow(towns) features of towns to cities[1, ], towns[145, ] is the nearest to cities[2, ], and towns[59, ] is the nearest to cities[3, ].

The `st_connect` function

The resulting nearest neighbor matches can be visualized using the st_connect function. This function builds a line layer connecting features from two layers x and y based on the relations defined in a list such the one returned by st_nn:

l = st_connect(cities, towns, ids = nn)
l

Plotting the line layer l gives a visual demonstration of the nearest neighbors match, as shown in Figure \ref{fig:st_connect}.

plot(st_geometry(l))
plot(st_geometry(towns), col = "darkgrey", add = TRUE)
plot(st_geometry(cities), col = "red", add = TRUE)
text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4)

```r (in red) and \texttt{towns} (in grey)"} opar = par(mar = rep(0.5, 4)) plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4) par(opar)

## Dense matrix representation

The `st_nn` can also return the complete logical matrix indicating whether each feature in `x` is a neighbor of `y`. To get the dense matrix, instead of a list, use `sparse=FALSE`.

```r
nn = st_nn(cities, towns[1:5, ], sparse = FALSE, progress = FALSE)
nn

k-Nearest neighbors where `k>0`

It is also possible to return any k-nearest neighbors, rather than just one. For example, setting k=2 returns both the 1^st^ and 2^nd^ nearest neighbors:

nn = st_nn(cities, towns, k = 2, progress = FALSE)
nn

Here is another example, finding the 10-nearest neighbor towns features for each cities feature:

x = st_nn(cities, towns, k = 10)
l = st_connect(cities, towns, ids = x)

The result is visualized in Figure \ref{fig:cities_towns}.

plot(st_geometry(l))
plot(st_geometry(cities), col = "red", add = TRUE)
plot(st_geometry(towns), col = "darkgrey", add = TRUE)

```r features from each \texttt{cities} feature"} opar = par(mar = rep(1, 4)) plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE) par(opar)

## Distance to nearest neighbors

Using `returnDist=TRUE` the distances `list` is also returned, in addition the the neighbor matches, with both components now comprising a `list`:

```r
nn = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE)
nn

Search radius

Finally, the search for nearest neighbors can be limited to a search radius using maxdist. In the following example, the search radius is set to 2,000 meters (2 kilometers). Note that no neighbors are found within the search radius for cities[3, ], therefore the third list element is a zero-length vector of indices:

nn = st_nn(cities, towns, k = 1, maxdist = 2000, progress = FALSE)
nn

Spatial join

The st_nn function can also be used as a predicate function when performing spatial join with sf::st_join. For example, the following expression spatially joins the two nearest towns features to each cities features, using a search radius of 5 km:

st_join(cities, towns, join = st_nn, k = 2, maxdist = 5000, progress = FALSE)

Binding distances to join result

Sometimes it's necessary to bind the distances to the joined features in the resulting layer, to have more detailed information about the distance to nearest features. For example, suppose we join the nearest towns feature to cities, as shown above:

cities1 = st_join(cities, towns, join = st_nn, k = 1, progress = FALSE)
cities1

As shown above, the distances can be calculated using the returnDist=TRUE option, then binded to the above join result:

# Calculate distances
n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE)
dists = sapply(n[[2]], "[", 1)
dists

# Bind distances
cities1$dist = dists
cities1

In the above workflow, we actually ran the same nearest neighbor search twice, once in st_join and more time to get the distances.

Another more verbose approach can be used in case the computation time is prohibitive. Here, we calculate the nearest neighbor indices and distances just once, then use them to construct the "joined" table with the distances:

# Get indices & distances
n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE)
ids = sapply(n[[1]], "[", 1)
dists = sapply(n[[2]], "[", 1)

# Join
cities1 = data.frame(cities, st_drop_geometry(towns)[ids, , drop = FALSE])
cities1 = st_sf(cities1)

# Add distances
cities1$dist = dists
cities1

Polygons

Nearest neighbor search also works for non-point layers. The following code section finds the 20-nearest towns features for each water body in water[-1, ].

nn = st_nn(water[-1, ], towns, k = 20, progress = FALSE)

Again, we can calculate the respective lines for the above result using st_connect. Since one of the inputs is line/polygon, we need to specify a sampling distance dist, which sets the resolution of connecting points on the shape exterior boundary.

l = st_connect(water[-1, ], towns, ids = nn, dist = 100)

The result is visualized in Figure \ref{fig:water_towns}.

plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey")
plot(st_geometry(towns), col = "darkgrey", add = TRUE)
plot(st_geometry(l), col = "red", add = TRUE)

r features from each \\texttt{water} polygon"} opar = par(mar = rep(0, 4)) plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey") plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(l), col = "red", add = TRUE) par(opar)

michaeldorman/nngeo documentation built on April 19, 2024, 12:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com