# Introduction to package nngeo In nngeo: k-Nearest Neighbor Join for Spatial Data

knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(nngeo)  # Introduction ## Package purpose This document introduces the nngeo package. The nngeo package includes functions for spatial join of layers based on k-nearest neighbor relation between features. The functions work with spatial layer object defined in package sf, namely classes sfc and sf. ## Installation CRAN version: install.packages("remotes") remotes::install_github("michaeldorman/nngeo")  GitHub version: install.packages("nngeo")  ## Sample data The nngeo package comes with three sample datasets: • cities • towns • water data(cities) data(towns) data(water)  The cities layer is a point layer representing the location of the three largest cities in Israel. cities  The towns layer is another point layer, with the location of all large towns in Israel, compiled from a different data source: towns  The water layer is an example of a polygonal layer. This layer contains four polygons of water bodies in Israel. water  Figure \ref{fig:layers} shows the spatial configuration of the cities, towns and water layers. plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE)  r, \texttt{towns} and \texttt{cities} layers'} opar = par(mar = rep(0, 4)) plot(st_geometry(water), col = "lightblue") plot(st_geometry(towns), col = "grey", pch = 1, add = TRUE) plot(st_geometry(cities), col = "red", pch = 1, add = TRUE) par(opar) # Usage examples ## The st_nn function The main function in the nngeo package is st_nn. The st_nn function accepts two layers, x and y, and returns a list with the same number of elements as x features. Each list element i is an integer vector with all indices j for which x[i] and y[j] are **nearest neighbors**. For example, the following expression finds which feature in towns is the nearest neighbor to each feature in cities: r nn = st_nn(cities, towns, progress = FALSE) nn  This output tells us that towns[70, ] is the nearest among the r nrow(towns) features of towns to cities[1, ], towns[145, ] is the nearest to cities[2, ], and towns[59, ] is the nearest to cities[3, ]. ## The st_connect function The resulting nearest neighbor matches can be visualized using the st_connect function. This function builds a line layer connecting features from two layers x and y based on the relations defined in a list such the one returned by st_nn: l = st_connect(cities, towns, ids = nn) l  Plotting the line layer l gives a visual demonstration of the nearest neighbors match, as shown in Figure \ref{fig:st_connect}. plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4)  r (in red) and \texttt{towns} (in grey)"} opar = par(mar = rep(0.5, 4)) plot(st_geometry(l)) plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(cities), col = "red", add = TRUE) text(st_coordinates(cities)[, 1], st_coordinates(cities)[, 2], 1:3, col = "red", pos = 4) par(opar) ## Dense matrix representation The st_nn can also return the complete logical matrix indicating whether each feature in x is a neighbor of y. To get the dense matrix, instead of a list, use sparse=FALSE. r nn = st_nn(cities, towns[1:5, ], sparse = FALSE, progress = FALSE) nn  ## k-Nearest neighbors where k>0 It is also possible to return any k-nearest neighbors, rather than just one. For example, setting k=2 returns both the 1^st^ and 2^nd^ nearest neighbors: nn = st_nn(cities, towns, k = 2, progress = FALSE) nn  Here is another example, finding the 10-nearest neighbor towns features for each cities feature: x = st_nn(cities, towns, k = 10) l = st_connect(cities, towns, ids = x)  The result is visualized in Figure \ref{fig:cities_towns}. plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE)  r features from each \texttt{cities} feature"} opar = par(mar = rep(1, 4)) plot(st_geometry(l)) plot(st_geometry(cities), col = "red", add = TRUE) plot(st_geometry(towns), col = "darkgrey", add = TRUE) par(opar) ## Distance to nearest neighbors Using returnDist=TRUE the distances list is also returned, in addition the the neighbor matches, with both components now comprising a list: r nn = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) nn  ## Search radius Finally, the search for nearest neighbors can be limited to a search radius using maxdist. In the following example, the search radius is set to 2,000 meters (2 kilometers). Note that no neighbors are found within the search radius for cities[3, ], therefore the third list element is a zero-length vector of indices: nn = st_nn(cities, towns, k = 1, maxdist = 2000, progress = FALSE) nn  ## Spatial join The st_nn function can also be used as a predicate function when performing spatial join with sf::st_join. For example, the following expression spatially joins the two nearest towns features to each cities features, using a search radius of 5 km: st_join(cities, towns, join = st_nn, k = 2, maxdist = 5000, progress = FALSE)  ## Binding distances to join result Sometimes it's necessary to bind the distances to the joined features in the resulting layer, to have more detailed information about the distance to nearest features. For example, suppose we join the nearest towns feature to cities, as shown above: cities1 = st_join(cities, towns, join = st_nn, k = 1, progress = FALSE) cities1  As shown above, the distances can be calculated using the returnDist=TRUE option, then binded to the above join result: # Calculate distances n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE) dists = sapply(n[[2]], "[", 1) dists # Bind distances cities1$dist = dists
cities1


In the above workflow, we actually ran the same nearest neighbor search twice, once in st_join and more time to get the distances.

Another more verbose approach can be used in case the computation time is prohibitive. Here, we calculate the nearest neighbor indices and distances just once, then use them to construct the "joined" table with the distances:

# Get indices & distances
n = st_nn(cities, towns, k = 1, returnDist = TRUE, progress = FALSE)
ids = sapply(n[[1]], "[", 1)
dists = sapply(n[[2]], "[", 1)

# Join
cities1 = data.frame(cities, st_drop_geometry(towns)[ids, , drop = FALSE])
cities1 = st_sf(cities1)

cities1\$dist = dists
cities1


## Polygons

Nearest neighbor search also works for non-point layers. The following code section finds the 20-nearest towns features for each water body in water[-1, ].

nn = st_nn(water[-1, ], towns, k = 20, progress = FALSE)


Again, we can calculate the respective lines for the above result using st_connect. Since one of the inputs is line/polygon, we need to specify a sampling distance dist, which sets the resolution of connecting points on the shape exterior boundary.

l = st_connect(water[-1, ], towns, ids = nn, dist = 100)


The result is visualized in Figure \ref{fig:water_towns}.

plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey")
plot(st_geometry(towns), col = "darkgrey", add = TRUE)
plot(st_geometry(l), col = "red", add = TRUE)


r features from each \\texttt{water} polygon"} opar = par(mar = rep(0, 4)) plot(st_geometry(water[-1, ]), col = "lightblue", border = "grey") plot(st_geometry(towns), col = "darkgrey", add = TRUE) plot(st_geometry(l), col = "red", add = TRUE) par(opar)

## Try the nngeo package in your browser

Any scripts or data that you put into this service are public.

nngeo documentation built on Jan. 13, 2022, 5:10 p.m.