1. The sfnetwork data structure

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_knit$set(global.par = TRUE)
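# Compare version numbers numerically; comparing the raw strings would rank "3.10.0" below "3.7.0"
geos37 = utils::compareVersion(sf::sf_extSoftVersion()[["GEOS"]], "3.7.0") >= 0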
# plot margins
oldpar = par(no.readonly = TRUE)
par(mar = c(1, 1, 1, 1))
# crayon needs to be explicitly activated in Rmd
oldoptions = options()
options(crayon.enabled = TRUE)
# Hooks needs to be set to deal with outputs
# thanks to fansi logic
old_hooks = fansi::set_knit_hooks(
  knitr::knit_hooks,
  which = c("output", "message", "error")
)

The core of the sfnetworks package is the sfnetwork data structure. It inherits the tbl_graph class from the tidygraph package, which itself inherits the igraph class from the igraph package. Therefore, sfnetwork objects are recognized by all network analysis algorithms that igraph offers (of which there are many, see the igraph documentation) as well as by the tidy wrappers that tidygraph has built around them.

It is possible to apply any function from the tidyverse packages for data science directly to a sfnetwork, as long as tidygraph implements a network specific method for it. On top of that, sfnetworks adds several methods for functions from the sf package for spatial data science, such that you can also apply those directly to the network. This takes away the need to constantly switch between the tbl_graph, tbl_df and sf classes when working with geospatial networks.

library(sfnetworks)
library(sf)
library(tidygraph)
library(tidyverse)
library(igraph)
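
As a small illustration of this inheritance, the sketch below calls two plain igraph functions directly on a sfnetwork. It assumes the Roxel dataset that ships with the package (introduced further down); the object name roxel_net is only illustrative.

```r
library(sfnetworks)

# Build an undirected network from the Roxel lines that ship with sfnetworks.
roxel_net = as_sfnetwork(roxel, directed = FALSE)

# Positions of the three classes in the class vector of the object.
inherits(roxel_net, c("sfnetwork", "tbl_graph", "igraph"), which = TRUE)

# Functions written for igraph objects accept the sfnetwork as is.
igraph::vcount(roxel_net)
igraph::is_connected(roxel_net)
```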

Philosophy

The philosophy of a tbl_graph object is best described by the following paragraph from the tidygraph introduction: "Relational data cannot in any meaningful way be encoded as a single tidy data frame. On the other hand, both node and edge data by itself fits very well within the tidy concept as each node and edge is, in a sense, a single observation. Thus, a close approximation of tidyness for relational data is two tidy data frames, one describing the node data and one describing the edge data."

Since sfnetwork subclasses tbl_graph, it shares the same philosophy. However, it extends it into the domain of geospatial data analysis, where each observation has a location in geographical space. For that, it brings sf into the game. An sf object stores the geographical coordinates of each observation in a standardized format in a geometry list-column, which has a Coordinate Reference System (CRS) associated with it. Thus, in sfnetworks, we re-formulate the last sentence of the paragraph above to the following. "A close approximation of tidyness for relational geospatial data is two sf objects, one describing the node data and one describing the edge data."

We do need to make a note here. In a geospatial network, the nodes always have coordinates in geographic space, and thus, can always be described by an sf object. The edges, however, can also be described by only the indices of the nodes at their ends. This still makes them geospatial, because they connect two specific points in space, but the spatial information is not explicitly attached to them. Both representations can be useful. In road networks, for example, it makes sense to explicitly draw a line geometry between two nodes, while in geolocated social networks, it probably does not. sfnetworks supports both types. It can either describe edges as an sf object, with a linestring geometry stored in a geometry list-column, or as a regular data frame, with the spatial information implicitly encoded in the node indices of the endpoints. We refer to these two different types of edges as spatially explicit edges and spatially implicit edges respectively. In most of the documentation, however, we focus on the first type, and talk about edges as being an sf object with linestring geometries.
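
To make the two edge types concrete, here is a minimal sketch. It uses the sfnetwork() construction function and its edges_as_lines argument, both described in the Construction section; the node coordinates and the object names (toy_nodes, toy_edges, implicit, explicit) are made up for illustration.

```r
library(sfnetworks)
library(sf)

# Three nodes with point geometries.
toy_nodes = st_as_sf(st_sfc(
  st_point(c(7, 51)), st_point(c(7, 52)), st_point(c(8, 52)),
  crs = 4326
))

# Edges given only as node indices, without any geometry.
toy_edges = data.frame(from = c(1, 1, 3), to = c(2, 3, 2))

# Spatially implicit edges: the edges table stays a regular data frame.
implicit = sfnetwork(toy_nodes, toy_edges)

# Spatially explicit edges: straight lines are drawn between the endpoints.
explicit = sfnetwork(toy_nodes, toy_edges, edges_as_lines = TRUE)

# Only the explicit network carries a geometry column on its edges.
igraph::edge_attr_names(implicit)
igraph::edge_attr_names(explicit)
```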

Construction

From a nodes and edges table

The most basic way to construct a sfnetwork with spatially explicit edges is by providing the sfnetwork construction function with one sf object containing the nodes, and another sf object containing the edges. This edges table should include a from and to column referring to the node indices of the edge endpoints. By node index we mean the position of a node in the nodes table (i.e. its row number). A small toy example:

p1 = st_point(c(7, 51))
p2 = st_point(c(7, 52))
p3 = st_point(c(8, 52))
p4 = st_point(c(8, 51.5))

l1 = st_sfc(st_linestring(c(p1, p2)))
l2 = st_sfc(st_linestring(c(p1, p4, p3)))
l3 = st_sfc(st_linestring(c(p3, p2)))

edges = st_as_sf(c(l1, l2, l3), crs = 4326)
nodes = st_as_sf(c(st_sfc(p1), st_sfc(p2), st_sfc(p3)), crs = 4326)

edges$from = c(1, 1, 3)
edges$to = c(2, 3, 2)

net = sfnetwork(nodes, edges)
net
class(net)

By default, the created network is a directed network. If you want to create an undirected network, set directed = FALSE. Note that for undirected networks, the indices in the from and to columns are re-arranged such that the from index is always smaller than (or equal to, for loop edges) the to index. However, the linestring geometries remain unchanged. That means that in undirected networks it can happen that for some edges the from index refers to the last point of the edge linestring, and the to index to the first point. The behavior of ordering the indices comes from igraph and might be confusing, but remember that in undirected networks the terms from and to do not have a meaning and can thus be used interchangeably.

net = sfnetwork(nodes, edges, directed = FALSE)
net

Instead of from and to columns containing integers that refer to node indices, the provided edges table can also have from and to columns containing characters that refer to node keys. In that case, you should tell the construction function which column in the nodes table contains these keys. Internally, they will then be converted to integer indices.

nodes$name = c("city", "village", "farm")
edges$from = c("city", "city", "farm")
edges$to = c("village", "farm", "village")

edges

net = sfnetwork(nodes, edges, node_key = "name")
net

In a tbl_graph structure, the weights of edges are normally stored in a column named weight. You can set edge weights yourself after construction (see the example in the activation section). For convenience, you can also tell the construction function to calculate the geographical lengths of the edges and set those as weights during construction.

edges$from = c(1, 1, 3)
edges$to = c(2, 3, 2)

net = sfnetwork(nodes, edges, length_as_weight = TRUE)
net

If your edges table does not have linestring geometries, but only references to node indices or keys, you can tell the construction function to create the linestring geometries during construction. This will draw a straight line between the endpoints of each edge.

st_geometry(edges) = NULL

other_net = sfnetwork(nodes, edges, edges_as_lines = TRUE)

plot(net, cex = 2, lwd = 2, main = "Original geometries")
plot(other_net, cex = 2, lwd = 2, main = "Straight lines")

A sfnetwork should have a valid spatial network structure. For the nodes, this currently means that their geometries should all be of type POINT. In the case of spatially explicit edges, edge geometries should all be of type LINESTRING, nodes and edges should have the same CRS and endpoints of edges should match their corresponding node coordinates.

If your provided data do not meet these requirements, the construction function will throw an error.

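# Assign the edge geometries in the wrong order, so that their endpoints
# no longer match the nodes referenced in the from and to columns.
st_geometry(edges) = st_sfc(c(l2, l3, l1), crs = 4326)

# This call throws an error, because the validity checks fail.
net = sfnetwork(nodes, edges)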

You can skip the validity checks if you are already sure your input data meet the requirements, or if you don't care that they don't. To do so, set force = TRUE. However, remember that all functions in sfnetworks are designed with the assumption that the network has a valid structure.
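
A minimal sketch of skipping the checks, assuming the same toy geometries as before with the edge geometries deliberately assigned in the wrong order. The object names (bad_edges, checked, forced_net) are only illustrative.

```r
library(sfnetworks)
library(sf)

p1 = st_point(c(7, 51))
p2 = st_point(c(7, 52))
p3 = st_point(c(8, 52))
p4 = st_point(c(8, 51.5))

l1 = st_sfc(st_linestring(c(p1, p2)))
l2 = st_sfc(st_linestring(c(p1, p4, p3)))
l3 = st_sfc(st_linestring(c(p3, p2)))

toy_nodes = st_as_sf(c(st_sfc(p1), st_sfc(p2), st_sfc(p3)), crs = 4326)

# Edge geometries in the wrong order: their endpoints do not match the
# nodes referenced in the from and to columns.
bad_edges = st_as_sf(c(l2, l3, l1), crs = 4326)
bad_edges$from = c(1, 1, 3)
bad_edges$to = c(2, 3, 2)

# With the default force = FALSE, construction stops with an error.
checked = try(sfnetwork(toy_nodes, bad_edges), silent = TRUE)
inherits(checked, "try-error")

# With force = TRUE the checks are skipped and the object is created anyway.
forced_net = sfnetwork(toy_nodes, bad_edges, force = TRUE)
class(forced_net)
```

The resulting object has the sfnetwork class but not a valid spatial network structure, so results of later sfnetworks functions on it should not be trusted.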

From an sf object with linestring geometries

Instead of providing a nodes and edges table that already have a valid network structure, it is also possible to create a network by only providing an sf object with geometries of type LINESTRING. This is probably the most convenient and most commonly used way of construction.

It works as follows: the provided lines form the edges of the network, and nodes are created at their endpoints. Endpoints that are shared between multiple lines become one single node.

See below an example using the Roxel dataset that comes with the package. This dataset is an sf object with LINESTRING geometries that form the road network of Roxel, a neighborhood in the German city of Münster.

roxel
net = as_sfnetwork(roxel)
plot(net)

Other methods to convert 'foreign' objects into a sfnetwork exist as well, e.g. for SpatialLinesNetwork objects from stplanr and linnet objects from spatstat. See the documentation of as_sfnetwork() for an overview.

From a network specific file type

To create networks from spatial data file types, you can use the sf function sf::st_read() to read them into R and then construct a sfnetwork from the sf object(s) using either sfnetwork() or as_sfnetwork() as described above. To create networks from graph specific file types, you can use the igraph function igraph::read_graph() to read them into R, and then convert them to a sfnetwork using as_sfnetwork() if possible. Let's look at an example with a GraphML file. First, we convert it to a tbl_graph after loading, such that we can explore the data.

url = "https://raw.githubusercontent.com/ComplexNetTSP/Power_grids/v1.0.0/Countries/Netherlands/graphml/Netherlands_highvoltage.graphml"

igraph::read_graph(url, format = "graphml") %>%
  as_tbl_graph()

We can see that the spatial geometries of nodes and edges are stored as WKT strings in columns named wktsrid4326. The sf function sf::st_as_sf() can easily convert such columns into the geometry list-column format that we need. Both sfnetwork() and as_sfnetwork() accept additional arguments as ... that are internally forwarded to sf::st_as_sf() to convert the nodes into an sf object before creating the network structure. Hence, we can directly convert the igraph object into a sfnetwork.

igraph::read_graph(url, format = "graphml") %>%
  as_sfnetwork(wkt = "wktsrid4326", crs = 4326)

Forwarding of additional arguments to sf::st_as_sf() only affects the nodes table, since that is the one that by definition has to be an sf object. Therefore, it left the edges unaffected and created a network without a geometry list-column for the edges (i.e. a network with spatially implicit edges). We can 'spatially explicitize' the edges after construction by using a spatial morpher function named to_spatial_explicit(). To learn more about spatial morphers and what they are, see the dedicated vignette for that. For now, it is sufficient to know that you can use any spatial morpher function inside the tidygraph::convert() verb to convert your network into a different state.

The to_spatial_explicit() function also accepts ... arguments that are forwarded to sf::st_as_sf().

graphml_net = igraph::read_graph(url, format = "graphml") %>%
  as_sfnetwork(wkt = "wktsrid4326", crs = 4326) %>%
  convert(to_spatial_explicit, wkt = "wktsrid4326", crs = 4326, .clean = TRUE)

graphml_net
plot(graphml_net)
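
For the spatial file route mentioned at the start of this section, a minimal sketch could look as follows. Since no spatial file is referenced here, the example first writes the Roxel lines to a temporary GeoPackage; the object names (path, lines, file_net) are only illustrative.

```r
library(sfnetworks)
library(sf)

# Write the Roxel lines to a temporary GeoPackage, so there is a file to read.
path = tempfile(fileext = ".gpkg")
st_write(roxel, path, quiet = TRUE)

# Read the file into an sf object and construct the network from its lines.
lines = st_read(path, quiet = TRUE)
file_net = as_sfnetwork(lines)
file_net
```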

Activation

A sfnetwork is a multitable object in which the core network elements (i.e. nodes and edges) are embedded as sf objects. However, thanks to the neat structure of tidygraph, there is no need to first extract one of those elements before you are able to apply your favorite sf function or tidyverse verb. Instead, there is always one element at a time labeled as active. This active element is the target of data manipulation. All functions from sf and the tidyverse that are called on a sfnetwork are internally applied to that active element. The active element can be changed with the activate() verb, i.e. by calling activate("nodes") or activate("edges"). For example, setting the geographical length of edges as edge weights and subsequently calculating the betweenness centrality of nodes can be done as shown below. Note that tidygraph::centrality_betweenness() requires you to always explicitly specify which column should be used as edge weights, and whether the network should be treated as directed or not.

net %>%
  activate("edges") %>%
  mutate(weight = edge_length()) %>%
  activate("nodes") %>%
  mutate(bc = centrality_betweenness(weights = weight, directed = FALSE))

Some of the functions have effects also outside of the active element. For example, whenever nodes are removed from the network, the edges terminating at those nodes will be removed too. This behavior is not symmetric: when removing edges, the endpoints of those edges remain, even if they are not an endpoint of any other edge. This is because by definition edges can never exist without nodes on their ends, while nodes can peacefully exist in isolation.
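
This asymmetry can be checked with a small sketch. It assumes the Roxel network and uses dplyr::slice(), for which tidygraph provides a network method; the object names (without_node, without_edge) are only illustrative.

```r
library(sfnetworks)
library(sf)
library(tidygraph)

roxel_net = as_sfnetwork(roxel, directed = FALSE)

# Remove the first node: every edge that ends at it disappears as well.
without_node = roxel_net %>%
  activate("nodes") %>%
  slice(-1)

# Remove the first edge: both of its end nodes stay in the network.
without_edge = roxel_net %>%
  activate("edges") %>%
  slice(-1)

# Compare the node and edge counts of the three networks.
c(nodes = igraph::vcount(roxel_net), edges = igraph::ecount(roxel_net))
c(nodes = igraph::vcount(without_node), edges = igraph::ecount(without_node))
c(nodes = igraph::vcount(without_edge), edges = igraph::ecount(without_edge))
```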

Extraction

Neither all sf functions nor all tidyverse verbs can be directly applied to a sfnetwork as described above. That is because there is a clear limitation in the relational data structure that requires rows to maintain their identity. Hence, a verb like dplyr::summarise() has no clear application for a network. For sf functions, this means also that the valid spatial network structure should be maintained. That is, functions that summarise geometries, or (may) change their type, shape or position, are not supported directly. These are for example the geometric binary operations, most of the geometric unary operations, sf::st_union(), sf::st_combine(), sf::st_cast(), and sf::st_jitter().

These functions cannot be directly applied to a sfnetwork, but no need to panic! The active element of the network can at any time be extracted with sf::st_as_sf() (or tibble::as_tibble()). This allows you to continue a specific part of your analysis outside of the network structure, using a regular sf object. Afterwards you could join inferred information back into the network. See the vignette about spatial joins for more details.

net %>%
  activate("nodes") %>%
  st_as_sf()

Although we recommend for reasons of clarity to always explicitly activate an element before extraction, you can also use a shortcut by providing the name of the element you want to extract as extra argument to sf::st_as_sf():

st_as_sf(net, "edges")
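
A minimal sketch of the extract-and-return workflow, assuming the Roxel network. Here the inferred information is simply added back as a column with mutate() rather than through a spatial join; the names center and center_dist are only illustrative.

```r
library(sfnetworks)
library(sf)
library(tidygraph)

roxel_net = as_sfnetwork(roxel, directed = FALSE)

# Extract the nodes as a regular sf object.
nodes_sf = roxel_net %>%
  activate("nodes") %>%
  st_as_sf()

# st_union() summarises geometries, so it is applied to the extracted
# sf object instead of to the network itself.
center = st_centroid(st_union(nodes_sf))
dist_values = as.numeric(st_distance(nodes_sf, center))

# Bring the result back into the network as a node attribute.
roxel_net = roxel_net %>%
  activate("nodes") %>%
  mutate(center_dist = dist_values)

roxel_net
```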

Visualization

The sfnetworks package does not (yet?) include advanced visualization options. However, as already demonstrated before, a simple plot method is provided, which gives a quick view of what the network looks like.

plot(net)

If you have ggplot2 installed, you can also use ggplot2::autoplot() to directly create a simple ggplot of the network.

autoplot(net) + ggtitle("Road network of Münster Roxel")

For advanced visualization, we encourage you to extract nodes and edges as sf objects, and use one of the many ways to map those in R, either statically or interactively. Think of sf's default plot method, ggplot2::geom_sf(), tmap, mapview, et cetera.

net = net %>%
  activate("nodes") %>%
  mutate(bc = centrality_betweenness())

ggplot() +
  geom_sf(data = st_as_sf(net, "edges"), col = "grey50") +
  geom_sf(data = st_as_sf(net, "nodes"), aes(col = bc, size = bc)) +
  ggtitle("Betweenness centrality in Münster Roxel")

Note: it would be great to see the lack of advanced visualization options change in the future, for example through good integration with ggraph. Contributions on this are very welcome!

Spatial information

Geometries

Geometries of nodes and edges are stored in an 'sf-style' geometry list-column in the nodes and edges tables of the network, respectively. The geometries of the active element of the network can be extracted with the sf function sf::st_geometry().

net %>%
  activate("nodes") %>%
  st_geometry()

Just as with sf::st_as_sf(), there is also a shortcut to quickly extract geometries from any of the network elements.

st_geometry(net, "edges")

Geometries can be replaced using either st_geometry(x) = value or the pipe-friendly st_set_geometry(x, value). However, a replacement that breaks the valid spatial network structure will throw an error.

Replacing a geometry with NULL will remove the geometries. Removing edge geometries will result in a sfnetwork with spatially implicit edges. Removing node geometries will result in a tbl_graph, losing the spatial structure.

net %>%
  activate("edges") %>%
  st_set_geometry(NULL) %>%
  plot(draw_lines = FALSE, main = "Edges without geometries")

net %>%
  activate("nodes") %>%
  st_set_geometry(NULL) %>%
  plot(vertex.color = "black", main = "Nodes without geometries")
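
The sketch below illustrates the replacement behavior described at the start of this section, assuming the Roxel network. Reversing the order of the node geometries is just one arbitrary way to break the structure; the object names (same, broken) are only illustrative.

```r
library(sfnetworks)
library(sf)
library(tidygraph)

roxel_net = as_sfnetwork(roxel)
node_geoms = st_geometry(roxel_net, "nodes")

# Replacing the node geometries by themselves keeps the structure valid.
same = roxel_net %>%
  activate("nodes") %>%
  st_set_geometry(node_geoms)

# Assigning the node geometries in reversed order means edge endpoints no
# longer match their nodes, so the replacement is expected to fail.
broken = try(
  roxel_net %>%
    activate("nodes") %>%
    st_set_geometry(node_geoms[rev(seq_along(node_geoms))]),
  silent = TRUE
)
inherits(broken, "try-error")
```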

Geometries can also be replaced by using geometric unary operations, as long as they don't break the valid spatial network structure. In practice this means that only sf::st_reverse() and sf::st_simplify() are supported. When calling sf::st_reverse() on the edges of a directed network, not only the geometries will be reversed, but the from and to columns of the edges will be swapped as well. In the case of undirected networks these columns remain unchanged, since the terms from and to don't have a meaning in undirected networks and can be used interchangeably. Note that reversing linestrings using sf::st_reverse() only works when sf links to a GEOS version of at least 3.7.0.

as_sfnetwork(roxel, directed = TRUE) %>%
  activate("edges") %>%
  st_reverse()

Coordinates

The coordinates of the active element of a sfnetwork can be extracted with the sf function sf::st_coordinates().

node_coords = net %>%
  activate("nodes") %>%
  st_coordinates()

node_coords[1:4, ]

Besides X and Y coordinates, the features in the network can possibly also have Z and M coordinates.

# Currently there are neither Z nor M coordinates.
st_z_range(net)
st_m_range(net)

# Add Z coordinates with value 0 to all features.
# This will affect both nodes and edges, no matter which element is active.
st_zm(net, drop = FALSE, what = "Z")

Coordinate query functions can be used for the nodes to extract only specific coordinate values. Such query functions are meant to be used inside dplyr::mutate() or dplyr::filter() verbs. Whenever a coordinate value is not available for a node, NA is returned along with a warning. Note also that the two-digit coordinate values are only for printing. The real values contain just as much precision as in the geometry list-column.

net %>%
  st_zm(drop = FALSE, what = "Z") %>%
  mutate(X = node_X(), Y = node_Y(), Z = node_Z(), M = node_M())
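
The same query functions can be used inside dplyr::filter(). A minimal sketch, assuming the Roxel network; the threshold (the mean X coordinate of the nodes) is an arbitrary choice for illustration, as are the names mean_x and east.

```r
library(sfnetworks)
library(sf)
library(tidygraph)

roxel_net = as_sfnetwork(roxel)

# Mean X coordinate of the nodes, computed from the coordinate matrix.
mean_x = mean(st_coordinates(activate(roxel_net, "nodes"))[, "X"])

# Keep only the nodes that lie east of that mean. Edges that end at a
# removed node are removed along with it.
east = roxel_net %>%
  activate("nodes") %>%
  filter(node_X() > mean_x)

east
```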

Coordinate Reference System

The Coordinate Reference System in which the coordinates of the network geometries are stored can be extracted with the sf function sf::st_crs(). The CRS in a valid spatial network structure is always the same for nodes and edges.

st_crs(net)

Coordinates can be transformed using the sf function sf::st_transform(). Since the CRS is the same for nodes and edges, transforming coordinates of the active element into a different CRS will automatically also transform the coordinates of the inactive element into the same target CRS.

st_transform(net, 3035)
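
That both elements end up in the target CRS can be verified with a short sketch, again assuming the Roxel network; the object name transformed is only illustrative.

```r
library(sfnetworks)
library(sf)

roxel_net = as_sfnetwork(roxel)

# Nodes are the active element, but the edges are transformed as well.
transformed = st_transform(roxel_net, 3035)

# Nodes and edges share the same CRS after the transformation.
st_crs(st_as_sf(transformed, "nodes")) == st_crs(st_as_sf(transformed, "edges"))

# That CRS differs from the CRS of the original network.
st_crs(transformed) == st_crs(roxel_net)
```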

Bounding box

The bounding box of the active element of a sfnetwork can be extracted with the sf function sf::st_bbox().

net %>%
  activate("nodes") %>%
  st_bbox()

The bounding boxes of the nodes and edges are not necessarily the same. Therefore, sfnetworks adds the st_network_bbox() function to retrieve the combined bounding box of the nodes and edges. In this combined bounding box, the most extreme coordinates of the two individual element bounding boxes are preserved. Hence, the xmin value of the network bounding box is the smallest xmin value of the node and edge bounding boxes, et cetera.

node1 = st_point(c(8, 51))
node2 = st_point(c(7, 51.5))
node3 = st_point(c(8, 52))
node4 = st_point(c(9, 51))
edge1 = st_sfc(st_linestring(c(node1, node2, node3)))

nodes = st_as_sf(c(st_sfc(node1), st_sfc(node3), st_sfc(node4)))
edges = st_as_sf(edge1)
edges$from = 1
edges$to = 2

small_net = sfnetwork(nodes, edges)

node_bbox = st_as_sfc(st_bbox(activate(small_net, "nodes")))
edge_bbox = st_as_sfc(st_bbox(activate(small_net, "edges")))
net_bbox = st_as_sfc(st_network_bbox(small_net))

plot(small_net, lwd = 2, cex = 4, main = "Element bounding boxes")
plot(node_bbox, border = "red", lty = 2, lwd = 4, add = TRUE)
plot(edge_bbox, border = "blue", lty = 2, lwd = 4, add = TRUE)
plot(small_net, lwd = 2, cex = 4, main = "Network bounding box")
plot(net_bbox, border = "red", lty = 2, lwd = 4, add = TRUE)

Attribute-geometry relationships

In sf objects there is the possibility to store information about how attributes relate to geometries (for more information, see the sf documentation). You can get and set this information with the function sf::st_agr() (for the setter, you can also use the pipe-friendly version sf::st_set_agr()). In a sfnetwork, you can use the same functions to get and set this information for the active element of the network.

Note that the to and from columns are not really attributes of edges seen from a network analysis perspective, but they are included in the agr factor to ensure smooth interaction with sf.

net %>%
  activate("edges") %>%
  st_set_agr(c("name" = "constant", "type" = "constant")) %>%
  st_agr()

However, be careful, because we are currently not sure if this information survives all functions from igraph and tidygraph. If you have any issues with this, please let us know in our issue tracker.

par(oldpar)
options(oldoptions)


sfnetworks documentation built on May 14, 2021, 1:06 a.m.