knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(nngeo)
This document introduces the nngeo
package. The nngeo
package includes functions for spatial join of layers based on k-nearest neighbor relation between features. The functions work with spatial layer object defined in package sf
, namely classes sfc
and sf
.
CRAN version:
install.packages("remotes") remotes::install_github("michaeldorman/nngeo")
GitHub version:
install.packages("nngeo")
The nngeo
package comes with three sample datasets:
cities
towns
water
data(cities) data(towns) data(water)
The cities
layer is a point layer representing the location of the three largest cities in Israel.
cities
The towns
layer is another point layer, with the location of all large towns in Israel, compiled from a different data source:
towns
The water
layer is an example of a polygonal layer. This layer contains four polygons of water bodies in Israel.
water
Figure \ref{fig:layers} shows the spatial configuration of the cities
, towns
and water
layers.
plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE)
```r, \texttt{towns} and \texttt{cities} layers'} opar = par(mar = rep(0, 4)) plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE) par(opar)
# Usage examples ## The `st_nn` function The main function in the `nngeo` package is `st_nn`. The `st_nn` function accepts two layers, `x` and `y`, and returns a list with the same number of elements as `x` features. Each list element `i` is an integer vector with all indices `j` for which `x[i]` and `y[j]` are **nearest neighbors**. For example, the following expression finds which feature in `towns` is the nearest neighbor to each feature in `cities`: ```r nn = st_nn(cities, towns, progress = FALSE) nn
This output tells us that towns[70, ]
is the nearest among the r nrow(towns)
features of towns
to cities[1, ]
, towns[145, ]
is the nearest to cities[2, ]
, and towns[59, ]
is the nearest to cities[3, ]
.
st_connect
functionThe resulting nearest neighbor matches can be visualized using the st_connect
function. This function builds a line layer connecting features from two layers x
and y
based on the relations defined in a list such the one returned by st_nn
:
l = st_connect(cities, towns, ids = nn) l
Plotting the line layer l
gives a visual demonstration of the nearest neighbors match, as shown in Figure \ref{fig:st_connect}.
plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4)
```r (in red) and \texttt{towns} (in grey)"} opar = par(mar = rep(0.5, 4)) plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4) par(opar)
## Dense matrix representation The `st_nn` can also return the complete logical matrix indicating whether each feature in `x` is a neighbor of `y`. To get the dense matrix, instead of a list, use `sparse=FALSE`. ```r nn = st_nn(cities, towns[1:5, ], sparse = FALSE, progress = FALSE) nn
k>0
It is also possible to return any k-nearest neighbors, rather than just one. For example, setting k=2
returns both the 1^st^ and 2^nd^ nearest neighbors:
nn = st_nn(cities, towns, k = 2, progress = FALSE) nn
Here is another example, finding the 10-nearest neighbor towns
features for each cities
feature:
x = st_nn(cities, towns, k = 10) l = st_connect(cities, towns, ids = x)
The result is visualized in Figure \ref{fig:cities_towns}.
plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE)
```r features from each \texttt{cities} feature"} opar = par(mar = rep(1, 4)) plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE) par(opar)
## Distance to nearest neighbors Using `returnDist=TRUE` the distances `list` is also returned, in addition the the neighbor matches, with both components now comprising a `list`: ```r nn = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) nn
Finally, the search for nearest neighbors can be limited to a search radius using maxdist
. In the following example, the search radius is set to 2,000 meters (2 kilometers). Note that no neighbors are found within the search radius for cities[3, ]
, therefore the third list element is a zero-length vector of indices:
nn = st_nn(cities, towns, k = 1, maxdist = 2000, progress = FALSE) nn
The st_nn
function can also be used as a predicate function when performing spatial join with sf::st_join
. For example, the following expression spatially joins the two nearest towns
features to each cities
features, using a search radius of 5 km:
st_join(cities, towns, join = st_nn, k = 2, maxdist = 5000, progress = FALSE)
Sometimes it's necessary to bind the distances to the joined features in the resulting layer, to have more detailed information about the distance to nearest features. For example, suppose we join the nearest towns
feature to cities
, as shown above:
cities1 = st_join(cities, towns, join = st_nn, k = 1, progress = FALSE) cities1
As shown above, the distances can be calculated using the returnDist=TRUE
option, then binded to the above join result:
# Calculate distances n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) dists = sapply(n[[2]], "[", 1) dists # Bind distances cities1$dist = dists cities1
In the above workflow, we actually ran the same nearest neighbor search twice, once in st_join
and more time to get the distances.
Another more verbose approach can be used in case the computation time is prohibitive. Here, we calculate the nearest neighbor indices and distances just once, then use them to construct the "joined" table with the distances:
# Get indices & distances n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) ids = sapply(n[[1]], "[", 1) dists = sapply(n[[2]], "[", 1) # Join cities1 = data.frame(cities, st_drop_geometry(towns)[ids, , drop = FALSE]) cities1 = st_sf(cities1) # Add distances cities1$dist = dists cities1
Nearest neighbor search also works for non-point layers. The following code section finds the 20-nearest towns
features for each water body in water[-1, ]
.
nn = st_nn(water[-1, ], towns, k = 20, progress = FALSE)
Again, we can calculate the respective lines for the above result using st_connect
. Since one of the inputs is line/polygon, we need to specify a sampling distance dist
, which sets the resolution of connecting points on the shape exterior boundary.
l = st_connect(water[-1, ], towns, ids = nn, dist = 100)
The result is visualized in Figure \ref{fig:water_towns}.
plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey") plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(l), col = "red", add = TRUE)
r features from each \\texttt{water} polygon"}
opar = par(mar = rep(0, 4))
plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey")
plot(st_geometry(towns), col = "darkgrey", add = TRUE)
plot(st_geometry(l), col = "red", add = TRUE)
par(opar)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.