The package name odf stands for origin destination (OD) flows.
Origin destination (OD) data describe movements from origins to destinations. These movements typically relate to humans (e.g. migration, passenger transport), but can also relate to the export of goods, animal migration, and the spread of disease.
Note that OD data of countable flows are processed (aggregated) data: the original raw dataset describes the movements of each individual. Usually, the raw OD data have been processed by the data owner or a domain expert, while designers of OD data visualizations (ODDV) only have access to the processed OD data.
Origin and destination locations can be points or polygons. In the original raw data, these locations should in principle be points, but in practice for most OD data sets polygons are used, e.g. administrative units or grid tiles. In transport studies, these polygons are often referred to as zones.
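To make the aggregation step concrete, here is a minimal base R sketch. The data frame raw_trips and its column names are made up for illustration; they are not part of this package.

```r
# Hypothetical raw data: one row per individual movement
raw_trips = data.frame(
  orig = c("A", "A", "A", "B", "B", "C"),
  dest = c("B", "B", "C", "A", "A", "A"),
  mode = c("car", "car", "bike", "car", "train", "car")
)

# Count the movements per origin, destination and mode of transport
raw_trips$count = 1
od_flows = aggregate(count ~ orig + dest + mode, data = raw_trips, FUN = sum)
od_flows
```

The result has one row per combination of origin, destination and mode, which is the shape of the aggregated OD data used in the rest of this document.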
The odf package contains class definitions and methods for OD data.
library(odf)
The current version is still in development.
A realistic option is to merge this package with the od package, which contains partly overlapping methods for OD data.
od
The fundamental concept behind the od class is a graph structure $G = (U, E)$, where $U$ denotes the nodes and $E$ the edges.
In our implementation, an object of class od is a list of two elements, U for nodes and E for edges.
The list element U is an sf data.frame (i.e. a data frame with a geometry column), where each record corresponds to an origin or destination point (or zone), and where the geometry column contains the corresponding points or polygons.
U must contain one column with key values by which origins and destinations can be identified.
The name of this key column is stored as an attribute named od_id in the od object.
The list element E is a data.frame object that contains data about the edges, i.e. the movements.
The only required columns in E are the origin and the destination identifiers.
Each of these origin and destination ids should occur in the key column of U.
The names of the origin and destination columns are stored as attributes od_orig and od_dest in the od object.
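The structure described above can be mimicked in a few lines of base R. This is only a sketch under simplifying assumptions: plain data frames stand in for the sf objects, the column names id, from and to are made up, and the object is built by hand rather than with the package constructor.

```r
# Toy stand-ins: plain data frames instead of the sf objects that odf uses
U = data.frame(id = c("A", "B", "C"), x = c(0, 1, 2), y = c(0, 1, 0))
E = data.frame(from = c("A", "B", "C"), to = c("B", "C", "A"), value = c(10, 5, 7))

# A list of nodes and edges, with the relevant column names stored as attributes
g = structure(list(U = U, E = E), od_id = "id", od_orig = "from", od_dest = "to")

# Look up the key column through the attribute rather than by a fixed name
node_ids = g$U[[attr(g, "od_id")]]

# Every origin and destination id should occur among the node ids
all(g$E[[attr(g, "od_orig")]] %in% node_ids)
all(g$E[[attr(g, "od_dest")]] %in% node_ids)
```

Storing the column names as attributes means that the code never has to assume fixed names for the key, origin and destination columns.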
Optionally, E can be an sf object that contains the polylines between origins and destinations.
These polylines can be the actual movement routes, simple straight lines, or anything in between.
This package contains a function to add lines automatically, with various options.
Note that users are free to add other columns to U and E. For aggregated OD data, E typically contains flow counts and grouping variables such as mode of transport.
This package includes an example dataset of Dutch commuting flows.
# Read commuting data
data(NL_commuting)
data(NL_muni_poly)
data(NL_muni_point)
The dataset NL_commuting contains the estimated number of Dutch commuters grouped by mode of transport between municipalities. Cross-border commuters are not included.
The datasets NL_muni_poly and NL_muni_point contain the polygons and center points of the municipalities (the center points do not always correspond to the centroids of the polygons).
x = od(NL_commuting, NL_muni_point, col_orig = "muni_from", col_dest = "muni_to", col_id = "id")
x
Users are free to choose their preferred data manipulation methods.
In the following code, dplyr methods are used to group, summarize and filter the data.
For each origin and destination pair, the total number of commuters is calculated (summing over all transport modes).
Next, the commuters who stay in their own municipality are filtered out, as well as flows below 500.
library(dplyr)
x_totals = x
x_totals$E = x_totals$E %>%
  group_by(muni_from, muni_to) %>%
  summarize(value = sum(value)) %>%
  filter(muni_from != muni_to, value >= 500)
x_totals
The function od_is_valid can be used to check whether the processed data is still valid:
od_is_valid(x_totals)
In the second example of data manipulation, we filter the data by region, where we focus on the western, most urban part of the Netherlands, called the Randstad.
library(sf)
muni_prov = NL_muni_poly %>%
  st_drop_geometry() %>%
  select(id, NUTS2)
x_randstad = x
x_randstad$U = x_randstad$U %>%
  left_join(muni_prov, by = "id") %>%
  filter(NUTS2 %in% 31:33) %>%
  select(-NUTS2)
We have filtered the nodes U by joining the data NL_muni_poly, which contains the province codes, and selecting the provinces 31 (Utrecht), 32 (North-Holland) and 33 (South-Holland).
Note that in the current state, x_randstad is not valid anymore, since E still contains flows that start and/or end outside this area.
od_is_valid(x_randstad)
So, we'll have to filter the E set as well:
x_randstad$E = x_randstad$E %>%
  left_join(muni_prov, by = c("muni_from" = "id")) %>%
  left_join(muni_prov, by = c("muni_to" = "id")) %>%
  filter(NUTS2.x %in% 31:33, NUTS2.y %in% 31:33) %>%
  select(-NUTS2.x, -NUTS2.y) %>%
  filter(muni_from != muni_to, value >= 250)
x_randstad
When we check the validity again, we see that it is valid now:
od_is_valid(x_randstad)
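The reason the object became invalid, and one other way to repair it, can be illustrated with a small base R sketch. The toy data below are hypothetical (they are not the commuting data), and the approach of keeping only the edges whose endpoints both remain in the node set is shown as an alternative to the joins used above, not as what od_is_valid does internally.

```r
# Toy data (hypothetical, not the commuting data)
U = data.frame(id = c("A", "B", "C", "D"))
E = data.frame(
  from  = c("A", "A", "B", "C", "D"),
  to    = c("B", "D", "C", "A", "A"),
  value = c(10, 4, 5, 7, 3)
)

# Filter the nodes only: edges to and from "D" are now dangling
U_sub = U[U$id != "D", , drop = FALSE]
dangling = !(E$from %in% U_sub$id & E$to %in% U_sub$id)
E[dangling, ]

# Keep only the edges whose origin and destination both remain
E_sub = E[!dangling, ]
E_sub
```

Filtering on the remaining node ids avoids joining the province codes twice, at the cost of relying on the node set having been filtered first.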
The function od_add_lines is used to draw lines between origin and destination points.
By default, slightly curved lines are drawn in order to distinguish between opposite directions.
x_totals = od_add_lines(x_totals)
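To see why curving helps to tell opposite directions apart, consider the following base R sketch. The helper curved_line is hypothetical and is not how od_add_lines is implemented; it assumes a quadratic Bezier curve that leaves the origin at a given angle (in radians) from the straight line.

```r
# Hypothetical helper, not the odf implementation: points on a quadratic
# Bezier curve from p0 to p1 that leaves p0 at `angle` radians from the
# straight line between them
curved_line = function(p0, p1, angle = pi / 12, n = 21) {
  d = p1 - p0
  mid = (p0 + p1) / 2
  # d rotated by 90 degrees counter-clockwise (same length as d)
  normal = c(-d[2], d[1])
  # Control point chosen such that the tangent at p0 makes `angle` with d
  ctrl = mid + normal * tan(angle) / 2
  t = seq(0, 1, length.out = n)
  cbind(
    x = (1 - t)^2 * p0[1] + 2 * (1 - t) * t * ctrl[1] + t^2 * p1[1],
    y = (1 - t)^2 * p0[2] + 2 * (1 - t) * t * ctrl[2] + t^2 * p1[2]
  )
}

a = c(0, 0)
b = c(1, 0)
ab = curved_line(a, b)  # bends to one side of the straight line
ba = curved_line(b, a)  # bends to the other side
straight = curved_line(a, b, angle = 0)
```

Because the offset direction depends on the direction of travel, the line from a to b and the line from b to a bend to opposite sides and therefore do not overlap. With an angle of 0 the curve reduces to a straight line.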
Visualizing od data can be done with the tmap package:
require("tmap")
tm_shape(x_totals$U) +
  tm_dots() +
  tm_shape(x_totals$E) +
  tm_lines(lwd = "value", scale = 6, title.lwd = "Number of commuters") +
  tm_layout(scale = .5)
Below are maps with other settings: straight lines, curved lines and half-lines:
tm = tm_shape(x_totals$U) + tm_dots()
x_totals_straight = od_add_lines(x_totals, angle = 0, overwrite.geometry = TRUE)
tm +
  tm_shape(x_totals_straight$E) +
  tm_lines(lwd = "value", scale = 6) +
  tm_layout(title = "Straight lines", scale = .5)

x_totals_large_curve = od_add_lines(x_totals, angle = 1/6 * pi, overwrite.geometry = TRUE)
tm +
  tm_shape(x_totals_large_curve$E) +
  tm_lines(lwd = "value", scale = 6) +
  tm_layout(title = "Curved lines", scale = .5)

x_totals_half_lines = od_add_lines(x_totals, angle = 0, range = c(.5, 1), overwrite.geometry = TRUE)
tm +
  tm_shape(x_totals_half_lines$E) +
  tm_lines(lwd = "value", scale = 6) +
  tm_layout(title = "Half lines", scale = .5)
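The idea of a half-line can be sketched in base R as follows. This assumes that the two values of range are the start and end of the drawn part, expressed as fractions of the full line from origin to destination; the helper line_part is hypothetical and not part of this package.

```r
# Hypothetical helper: the part of the straight line from p0 to p1 that lies
# between the fractions range[1] and range[2] of its length
line_part = function(p0, p1, range = c(0, 1)) {
  rbind(
    start = p0 + range[1] * (p1 - p0),
    end   = p0 + range[2] * (p1 - p0)
  )
}

half = line_part(c(0, 0), c(4, 2), range = c(0.5, 1))
half
```

Under this assumption, the drawn part starts halfway and ends at the destination, so the two opposite flows between a pair of nodes occupy different halves of the connecting line.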
For the second example, we will show small multiples, one per mode of transport:
x_randstad = od_add_lines(x_randstad, angle = 0, range = c(.45, 1), overwrite.geometry = TRUE)
x_randstad$E = x_randstad$E %>%
  mutate(value_plus_500 = value + 500)

require(tmaptools)
basemap = tmaptools::read_osm(x_randstad$U)
qtm(basemap, is.master = TRUE) +
  tm_shape(x_randstad$U) +
  tm_dots("grey30") +
  tm_shape(x_randstad$E) +
  tm_lines(lwd = "value_plus_500", scale = 20, col = "black", legend.lwd.show = FALSE) +
  tm_facets(by = "mode", free.coords = FALSE)
A more sophisticated map using the same data is shown below:
knitr::include_graphics("images/halfline_doughnuts.jpg")
This visualization can be created with the package donutmaps (https://github.com/mtennekes/donutmaps), which uses odf.
See an interactive prototype based on this visualization.
This version has a couple of new features: 1) it offers views for other parts of the country, 2) it also uses cross-border data (not open data yet), and 3) the 'stay' population is also shown in the doughnuts.
This is a minimal example from the vignette of the package od (https://github.com/ITSLeeds/od/blob/master/vignettes/od.Rmd)
od_data_example = data.frame(
  o = "Leeds",
  d = "London",
  trips_per_year = 10
)
p = sf::st_as_sf(
  data.frame(
    name = c("Leeds", "London"),
    lon = c(-1.5, -0.1),
    lat = c(53.8, 51.5)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)
x = od(E = od_data_example, U = p, col_orig = "o", col_dest = "d", col_id = "name")
x = od_add_lines(x) # by default a curved line to differentiate opposite directions
tm_shape(x$U) +
  tm_dots() +
  tm_shape(x$E) +
  tm_lines()