In Robinlovelace/pctSeville:

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/README-"
)

This repository stores code for estimating cycling potential in Seville, Spain. It builds on the Propensity to Cycle Tool (PCT) method deployed for England. Although the analysis is localised, the methods are designed to be applicable to other cities. To reproduce the results presented here, you will need to have a recent version of R with the following packages installed:

library(tmap)
tmap_mode("view")
library(tmaptools)
library(stplanr)
library(sf)

Data input

The main data source used by the PCT is origin-destination (OD) data. This can be aquired from many places, including (in roughly descending order of quality):

Census of population (see the UK's Wicid open data portal for an example of this)
Travel survey data
Mobile telephone company data
Modelled data using a spatial interaction model

Each of these has advantages and disadvantages.

There are two main data sources that can be used to model OD-level travel in Seville: official data on population counts, as explained in a short article and OpenStreetMap (OSM) data. We will start by using randomly generated data to demonstrate the methods. Then we will use OSM data because it is smaller and more generalisable to other cities.

Study region

The first stage is to identify the extent of the city:

region_bb = bb("ciudad expo sevilla", ext = 9)
region_poly = bb2poly(region_bb)
qtm(region_poly)

Randomly generated data

Let's split that region into 100 evenly sized areas, and give each cell a random value between 1 and 100:

r = raster::raster(ext = raster::extent(region_poly), nrows = 10, ncols = 10)
set.seed(1)
raster::values(r) = runif(n = 100, min = 0, max = 100)
(m = qtm(region_poly) +
  tm_shape(r) + tm_raster())

Now let's convert those residential zones into centroids:

o = as(r, "SpatialPointsDataFrame")
o@data = cbind(code_o = 1:nrow(o), o@data)
m +
  qtm(o)

Next, let's take 10 destinations at random and allocate attractiveness to them at random:

region_centre = rgeos::gCentroid(region_poly)
region_cbuf = buff_geo(region_centre, 5000)
d = spsample(region_cbuf, n = 10, type = "random")
d = SpatialPointsDataFrame(
  d, data.frame(code_d = 1:length(d), w = runif(n = 10, min = 0, max = 100))
  )
(m = m + qtm(d, symbols.size  = "w"))

We can estimate the 'flow' (T) between origins (o) and destinations (d) in many ways. The simplest is a simple gravity model, whereby, for each OD pair:

$$ T = \frac{mn}{d^2} $$ whereby m and n are some measure of size/attractiveness of o and d respectively. Implementing this in code, we can calculate all the flows as follows:

T_od = matrix(nrow = nrow(o), ncol = nrow(d))
for(i in 1:nrow(o)) {
  for(j in 1:nrow(d)) {
    T_od[i, j] = o$layer[i] * d$w[j] / geosphere::distHaversine(o[i,], d[j,])
  }
}
head(as.data.frame(table(T_od)))
T_odp = as.data.frame.table(T_od)
T_odp$Var1 = rep(1:nrow(o), times = nrow(d))
T_odp$Var2 = rep(1:nrow(d), each = nrow(o))
names(T_odp) = c("code_o", "code_d", "Flow")
head(T_odp)

We convert this into spatial lines as follows:

l = od2line(flow = T_odp, zones = o, destinations = d)

On the map of Seville, and with width and opacity proportional to flow, this looks as follows:

m + tm_shape(l) + tm_lines(lwd = "Flow", scale = 10)

Now we can save this data as an OD matrix:

write.csv(l@data, "data/od-long-random.csv")

Using real data

From here we will use real data. We use only data on one area initially, and this is saved in the file data/stations.geojson (the vignettes folder contains data to reproduce this data) which we load with help from the sf library:

library(dplyr) # for data analysis
library(osmdata) # download data from OSM

# run if online
q = opq(bbox = c(-6.08, 37.32, -6.03, 37.38)) %>% 
  add_osm_feature(key = "railway", value = "station")
stations = osmdata_sf(q)$osm_points
saveRDS(stations, "data/stations.Rds")

Official data on cycling is saved in the file data/pop.geojson, which can be loaded and visualised as follows:

stations = readRDS("data/stations.Rds")
pop = read_sf("data/pop.geojson")
pop_cents = read_sf("data/pop_cents.geojson")
qtm(pop, "pob_tot1") + 
  qtm(stations)

The routes between each residential origin and its nearest station destination can be calculated as follows:

pop_cents = filter(pop_cents, pob_tot1 > 100)
od = data.frame(origin = pop_cents$OBJECTID, destination = NA)
for(i in 1:nrow(pop_cents)) {
  for(j in 1:nrow(stations)) {
    od$destination[i] = as.character(stations$osm_id[which.min(st_distance(stations, pop_cents[i,]))])
  }
}
l = od2line(flow = od, zones = as(pop_cents, "Spatial"), destinations = as(stations, "Spatial"))
qtm(l)

These can be allocated to the transport network as follows:

routes = stplanr::line2route(l = l, route_fun = route_graphhopper, vehicle = "bike")
routes$potential = pop_cents$pob_tot1 * 0.556
qtm(routes, lines.lwd = "potential", scale = 10, lines.alpha = 0.5)

The final stage is to aggregate the values of overlapping lines:

rnet = overline(routes, "potential")
tm_shape(pop) +
  tm_fill(col = "pob_tot1", n = 4, breaks = c(0, 10, 100, 1000)) +
tm_shape(rnet) +
  tm_lines(lwd = "potential", scale = 20) +
  qtm(stations) +
  tm_scale_bar()