# Creating Neighbours

This vignette formed pp. 239--251 of the first edition of Bivand, R. S., Pebesma, E. and Gómez-Rubio V. (2008) Applied Spatial Data Analysis with R, Springer-Verlag, New York. It was retired from the second edition (2013) to accommodate material on other topics, and is made available in this form with the understanding of the publishers.

## Introduction

Creating spatial weights is a necessary step in using areal data, perhaps just to check that there is no remaining spatial patterning in residuals. The first step is to define which relationships between observations are to be given a non-zero weight, that is to choose the neighbour criterion to be used; the second is to assign weights to the identified neighbour links.

The 281 census tract data set for eight central New York State counties featured prominently in [@WallerGotway:2004] will be used in many of the examples, supplemented with tract boundaries derived from TIGER 1992 and distributed by SEDAC/CIESIN. The boundaries have been projected from geographical coordinates to UTM zone~18. This file is not identical with the boundaries used in the original source, but is very close and may be re-distributed, unlike the version used in the book. Starting from the census tract queen contiguities, where all touching polygons are neighbours, used in [@WallerGotway:2004] and provided as a DBF file on their website, a GAL format file has been created and read into R.

if (require(rgdal, quietly=TRUE)) {
} else {
require(maptools, quietly=TRUE)
}
library(spdep)


For the sake of simplicity in showing how to create neighbour objects, we work on a subset of the map consisting of the census tracts within Syracuse, although the same principles apply to the full data set. We retrieve the part of the neighbour list in Syracuse using the subset method.

Syracuse <- NY8[NY8$AREANAME == "Syracuse city",] Sy0_nb <- subset(NY_nb, NY8$AREANAME == "Syracuse city")
summary(Sy0_nb)


## Creating Contiguity Neighbours

We can create a copy of the same neighbours object for polygon contiguities using the poly2nb() function in spdep. It takes an object extending the SpatialPolygons class as its first argument, and using heuristics identifies polygons sharing boundary points as neighbours. It also has a snap= argument, to allow the shared boundary points to be a short distance from one another.

class(Syracuse)
Sy1_nb <- poly2nb(Syracuse)
isTRUE(all.equal(Sy0_nb, Sy1_nb, check.attributes=FALSE))


As we can see, creating the contiguity neighbours from the Syracuse object reproduces the neighbours from [@WallerGotway:2004]. Careful examination of the next figure shows, however, that the graph of neighbours is not planar, since some neighbour links cross each other. By default, the contiguity condition is met when at least one point on the boundary of one polygon is within the snap distance of at least one point of its neighbour. This relationship is given by the argument queen=TRUE by analogy with movements on a chessboard. So when three or more polygons meet at a single point, they all meet the contiguity condition, giving rise to crossed links. If queen=FALSE, at least two boundary points must be within the snap distance of each other, with the conventional name of a rook' relationship. The figure shows the crossed line differences that arise when polygons touch only at a single point, compared to the stricter rook criterion.

Sy2_nb <- poly2nb(Syracuse, queen=FALSE)
isTRUE(all.equal(Sy0_nb, Sy2_nb, check.attributes=FALSE))

oopar <- par(mfrow=c(1,2), mar=c(3,3,1,1)+0.1)
plot(Syracuse, border="grey60")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="a)", cex=0.8)
plot(Syracuse, border="grey60")
plot(diffnb(Sy0_nb, Sy2_nb, verbose=FALSE), coordinates(Syracuse),
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="b)", cex=0.8)
par(oopar)


a: Queen-style census tract contiguities, Syracuse; b: Rook-style contiguity differences shown as thicker lines

If we have access to a GIS such as GRASS or ArcGIS, we can export the SpatialPolygonsDataFrame object and use the topology engine in the GIS to find contiguities in the graph of polygon edges - a shared edge will yield the same output as the rook relationship.

This procedure does, however, depend on the topology of the set of polygons being clean, which holds for this subset, but not for the full eight-county data set. Not infrequently, there are small artefacts, such as slivers where boundary lines intersect or diverge by distances that cannot be seen on plots, but which require intervention to keep the geometries and data correctly associated. When these geometrical artefacts are present, the topology is not clean, because unambiguous shared polygon boundaries cannot be found in all cases; artefacts typically arise when data collected for one purpose are combined with other data or used for another purpose. Topologies are usually cleaned in a GIS by snapping' vertices closer than a threshold distance together, removing artefacts -- for example, snapping across a river channel where the correct boundary is the median line but the input polygons stop at the channel banks on each side. Thepoly2nb()function does have asnap argument, which may also be used when input data possess geometrical artefacts.

library(spgrass6)
writeVECT6(Syracuse, "SY0")
contig <- vect2neigh("SY0")


## Distance-Based Neighbours

An alternative method is to choose the $k$ nearest points as neighbours -- this adapts across the study area, taking account of differences in the densities of areal entities. Naturally, in the overwhelming majority of cases, it leads to asymmetric neighbours, but will ensure that all areas have $k$ neighbours. The knearneigh() function returns an intermediate form converted to an nb object by knn2nb(); knearneigh() can also take a longlat= argument to handle geographical coordinates.

Sy8_nb <- knn2nb(knearneigh(coords, k=1), row.names=IDs)
Sy9_nb <- knn2nb(knearneigh(coords, k=2), row.names=IDs)
Sy10_nb <- knn2nb(knearneigh(coords, k=4), row.names=IDs)
nb_l <- list(k1=Sy8_nb, k2=Sy9_nb, k4=Sy10_nb)
sapply(nb_l, function(x) is.symmetric.nb(x, verbose=FALSE, force=TRUE))
sapply(nb_l, function(x) n.comp.nb(x)$nc)  oopar <- par(mfrow=c(1,3), mar=c(1,1,1,1)+0.1) plot(Syracuse, border="grey60") plot(Sy8_nb, coords, add=TRUE, pch=".") text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="a)", cex=0.8) plot(Syracuse, border="grey60") plot(Sy9_nb, coords, add=TRUE, pch=".") text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="b)", cex=0.8) plot(Syracuse, border="grey60") plot(Sy10_nb, coords, add=TRUE, pch=".") text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="c)", cex=0.8) par(oopar)  a:$k=1$neighbours; b:$k=2$neighbours; c:$k=4$neighbours The figure shows the neighbour relationships for$k= 1, 2, 4$, with many components for$k=1$. If need be,$k$-nearest neighbour objects can be made symmetrical using the make.sym.nb() function. The$k=1$object is also useful in finding the minimum distance at which all areas have a distance-based neighbour. Using the nbdists() function, we can calculate a list of vectors of distances corresponding to the neighbour object, here for first nearest neighbours. The greatest value will be the minimum distance needed to make sure that all the areas are linked to at least one neighbour. The dnearneigh() function is used to find neighbours with an interpoint distance, with arguments d1 and d2 setting the lower and upper distance bounds; it can also take a longlat argument to handle geographical coordinates. dsts <- unlist(nbdists(Sy8_nb, coords)) summary(dsts) max_1nn <- max(dsts) max_1nn Sy11_nb <- dnearneigh(coords, d1=0, d2=0.75*max_1nn, row.names=IDs) Sy12_nb <- dnearneigh(coords, d1=0, d2=1*max_1nn, row.names=IDs) Sy13_nb <- dnearneigh(coords, d1=0, d2=1.5*max_1nn, row.names=IDs) nb_l <- list(d1=Sy11_nb, d2=Sy12_nb, d3=Sy13_nb) sapply(nb_l, function(x) is.symmetric.nb(x, verbose=FALSE, force=TRUE)) sapply(nb_l, function(x) n.comp.nb(x)$nc)

oopar <- par(mfrow=c(1,3), mar=c(1,1,1,1)+0.1)
plot(Syracuse, border="grey60")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="a)", cex=0.8)
plot(Syracuse, border="grey60")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="b)", cex=0.8)
plot(Syracuse, border="grey60")
text(bbox(Syracuse)[1,1], bbox(Syracuse)[2,2], labels="c)", cex=0.8)
par(oopar)


a: Neighbours within 1,158m; b: neighbours within 1,545m; c: neighbours within 2,317m

The figure shows how the numbers of distance-based neighbours increase with moderate increases in distance. Moving from $0.75$ times the minimum all-included distance, to the all-included distance, and $1.5$ times the minimum all-included distance, the numbers of links grow rapidly. This is a major problem when some of the first nearest neighbour distances in a study area are much larger than others, since to avoid no-neighbour areal entities, the distance criterion will need to be set such that many areas have many neighbours.

dS <- c(0.75, 1, 1.5)*max_1nn
res <- sapply(nb_l, function(x) table(card(x)))
mx <- max(card(Sy13_nb))
res1 <- matrix(0, ncol=(mx+1), nrow=3)
rownames(res1) <- names(res)
colnames(res1) <- as.character(0:mx)
res1[1, names(res$d1)] <- res$d1
res1[2, names(res$d2)] <- res$d2
res1[3, names(res$d3)] <- res$d3
library(RColorBrewer)
pal <- grey.colors(3, 0.95, 0.55, 2.2)
# RSB quietening greys
barplot(res1, col=pal, beside=TRUE, legend.text=FALSE, xlab="numbers of neighbours", ylab="tracts")
legend("topright", legend=format(dS, digits=1), fill=pal, bty="n", cex=0.8, title="max. distance")


Distance-based neighbours: frequencies of numbers of neighbours by census tract

The figure shows the counts of sizes of sets of neighbours for the three different distance limits. In Syracuse, the census tracts are of similar areas, but were we to try to use the distance-based neighbour criterion on the eight-county study area, the smallest distance securing at least one neighbour for every areal entity is over 38km.

dsts0 <- unlist(nbdists(NY_nb, coordinates(NY8)))
summary(dsts0)


If the areal entities are approximately regularly spaced, using distance-based neighbours is not necessarily a problem. Provided that care is taken to handle the side effects of "weighting" areas out of the analysis, using lists of neighbours with no-neighbour areas is not necessarily a problem either, but certainly ought to raise questions. Different disciplines handle the definition of neighbours in their own ways by convention; in particular, it seems that ecologists frequently use distance bands. If many distance bands are used, they approach the variogram, although the underlying understanding of spatial autocorrelation seems to be by contagion rather than continuous.

## Higher-Order Neighbours

Distance bands can be generated by using a sequence of d1 and d2 argument values for the dnearneigh() function if needed to construct a spatial autocorrelogram as understood in ecology. In other conventions, correlograms are constructed by taking an input list of neighbours as the first-order sets, and stepping out across the graph to second-, third-, and higher-order neighbours based on the number of links traversed, but not permitting cycles, which could risk making $i$ a neighbour of $i$ itself [@osullivan+unwin:03]. The nblag() function takes an existing neighbour list and returns a list of lists, from first to maxlag= order neighbours.

Sy0_nb_lags <- nblag(Sy0_nb, maxlag=9)
names(Sy0_nb_lags) <- c("first", "second", "third", "fourth", "fifth", "sixth", "seventh", "eighth", "ninth")
res <- sapply(Sy0_nb_lags, function(x) table(card(x)))
mx <- max(unlist(sapply(Sy0_nb_lags, function(x) card(x))))
nn <- length(Sy0_nb_lags)
res1 <- matrix(0, ncol=(mx+1), nrow=nn)
rownames(res1) <- names(res)
colnames(res1) <- as.character(0:mx)
for (i in 1:nn) res1[i, names(res[[i]])] <- res[[i]]
res1


This shows how the wave of connectedness in the graph spreads to the third order, receding to the eighth order, and dying away at the ninth order - there are no tracts nine steps from each other in this graph. Both the distance bands and the graph step order approaches to spreading neighbourhoods can be used to examine the shape of relationship intensities in space, like the variogram, and can be used in attempting to look at the effects of scale.

## Grid Neighbours

When the data are known to be arranged in a regular, rectangular grid, the cell2nb() function can be used to construct neighbour lists, including those on a torus. These are useful for simulations, because, since all areal entities have equal numbers of neighbours, and there are no edges, the structure of the graph is as neutral as can be achieved. Neighbours can either be of type rook or queen.

cell2nb(7, 7, type="rook", torus=TRUE)
cell2nb(7, 7, type="rook", torus=FALSE)


When a regular, rectangular grid is not complete, then we can use knowledge of the cell size stored in the grid topology to create an appropriate list of neighbours, using a tightly bounded distance criterion. Neighbour lists of this kind are commonly found in ecological assays, such as studies of species richness at a national or continental scale. It is also in these settings, with moderately large $n$, here $n=3$,103, that the use of a sparse, list based representation shows its strength. Handling a $281 \times 281$ matrix for the eight-county census tracts is feasible, easy for a $63 \times 63$ matrix for Syracuse census tracts, but demanding for a 3,$103 \times 3$,103 matrix.

data(meuse.grid)
coordinates(meuse.grid) <- c("x", "y")
gridded(meuse.grid) <- TRUE
dst <- max(slot(slot(meuse.grid, "grid"), "cellsize"))
mg_nb <- dnearneigh(coordinates(meuse.grid), 0, dst)
mg_nb
table(card(mg_nb))
`

## Try the spdep package in your browser

Any scripts or data that you put into this service are public.

spdep documentation built on April 18, 2022, 9:06 a.m.