knitr::opts_chunk$set(
  echo = FALSE,
  collapse = TRUE,
  comment = "#>"
)
cp vignettes/upthat.Rmd /tmp/upthat-allvignettes.Rmd
cat vignettes/adaptation.Rmd >> /tmp/upthat-allvignettes.Rmd
# refs_cycle_hire = RefManageR::ReadZotero(group = "418217", .params = list(collection = "JAK5PW5K", limit = 100))
# refs_atf = RefManageR::ReadZotero(group = "418217", .params = list(collection = "TJEQMXJR", limit = 100))
# o = ls()
# r = o[grepl(pattern = "refs_", x = o)]
# refs = do.call(what = c, args = mget(r))
# ref_df =
# # View(ref_df)
# RefManageR::WriteBib(refs, "references.bib")
citr::tidy_bib_file(rmd_file = "/tmp/upthat-allvignettes.Rmd", messy_bibliography = "~/uaf/allrefs.bib", file = "vignettes/references.bib")


The Upthat manual describes the 'front end' of the tool, which has been designed for practitioners, policy-makers and interested members of the public. This document describes the 'back end' of the web application and the framework it uses to estimate modal shift at the city level. It is intended for developers, researchers and data analysts, and aims to show how the tool was built.^[See the calibration report for details on the methods used to distribute trips across transport networks and to estimate exposure to air pollution and physical activity.]


This section documents deliverable 2, scenario development, for the WHO Upthat project, which involves: (1) setting out high level policy scenarios of active transport uptake; (2) converting these changes into estimates of rates of shift towards walking and cycling down to route network levels; and (3) simulating the impacts of these scenarios on walking and cycling levels citywide. Scenario development will also be informed by the transport scenarios assessed in Accra and Kathmandu as part of UHI project activities.

High-level priorities are to improve population health, air quality and safety. This means increasing the proportion of the population that gets regular physical activity, reducing motorized vehicle use in urban centers and providing walking and cycling routes that are separated or otherwise protected from motor traffic. Urban transport policies can meet all of these objectives at once, creating 'win-win-win' options. Reducing car use, for example, directly improves air quality and indirectly improves health and safety.

Specific scenarios of change include:

Citywide scenarios of change

To calculate scenarios of change, the minimum data requirements are the region's current modal split and data that can be used as explanatory variables, including data on transport infrastructure and car parking spaces, and data on the provision of public transport options.

Mode split is a fundamental piece of information that is known for most cities. For context, it is useful to take a step back and consider the range of modal splits observed in a sample of cities worldwide. Many sources of mode share data are available, including the following (none of which were used, due to lack of access to the underlying data or other issues):

read.csv(stringsAsFactors = FALSE, text = "Source,Pros,Cons") %>% knitr::kable()

A problem with many mode split datasets is that they ignore or oversimplify active transport [e.g. @aguilera_passenger_2014]. Examples include the OECD passenger transport dataset and the European Union's 'Modal split of inland passenger transport, 2016' webpage, which omits walking and cycling.


This suggests the need for an open, international dataset on modal splits in major cities over time. To overcome these issues, we searched for crowd-sourced data on mode split. The figure below shows the diversity of mode splits across 108 (primarily wealthy) cities, using data from Wikipedia of the type shown in the table below.

dc = readRDS(url(""))
# dc = readRDS("global-data/city-mode-split-wiki.Rds")
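library(dplyr)
# Cities with the highest walking, cycling, public transport and car shares,
# plus the city with the lowest car share
k = dc %>% 
  filter(
    walking == max(walking) |
    cycling == max(cycling) |
    pt == max(pt) |
    car == max(car) |
    car == min(car)
  )
knitr::kable(k)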
|City       | walking| cycling| pt| car| other| year|
|:----------|-------:|-------:|--:|---:|-----:|----:|
|Detroit    |       1|       0|  2|  92|     5| 2016|
|Amsterdam  |       4|      40| 29|  27|     0| 2014|
|Bratislava |       4|       0| 70|  26|     0| 2004|
|Osaka      |      27|      21| 34|  18|     0| 2000|
|Helsinki   |      37|      10| 30|  22|     1| 2016|

The figure shows that cars dominate the transport systems of most cities, accounting for mode shares ranging from 18% (Osaka, Japan) to 85% (Adelaide, Australia). From an active transport perspective, a more useful way to view travel systems is to treat walking as their foundation, with all other modes supplementing walking [@tight_visions_2011]. This view is shown on the right of the figure. In the sample of cities, walking mode shares range from only 3% to more than a third (in Madrid, Spain and Vilnius, Lithuania). Interestingly, mode shares by car are roughly evenly distributed, while the other modes have more skewed distributions. This could reflect the diversity of policies used to promote walking, cycling and public transport, and the fact that cars tend to be the default. The implication is that cars dominate wherever no policies have been implemented to promote alternatives.

# see code/mode-split-cities.R
# knitr::include_graphics("")

The relationships between the different modes are illustrated in the figure below. It suggests competition between cars and all other modes, with public transport appearing to be the biggest deterrent to driving in this dataset. It also suggests synergies between public transport and walking, implying that investment in public transport may be an effective way to encourage walking in cities.


These city-level data allow models of mode split to be developed, provided there are sufficient explanatory variables describing the transport system.

Transport system data

Transport systems are complex, composed of hundreds of interrelated elements. For modelling purposes, it makes sense to condense available datasets down into a few key parameters per city, zone or origin-destination (OD) pair. The process of identifying and quantifying transport system variables should be open-ended and adaptable, so that it can respond when the input data change (e.g. due to changes in the transport system or improved data availability/collection). The Global City Data dataset provides several transport-related variables on major cities. However, these data are highly skewed towards wealthy cities and are thus not suitable for developing nations. Instead, data from the Natural Earth open data repository were used for basic city statistics, including the following (those used in this project):

The city data, with these explanatory variables added, are shown in the figure below.


A framework for estimating mode share needs to be flexible enough to incorporate additional or modified variables as they become available. We suggest focusing on 'scale-free' variables, meaning variables that do not depend on the city's size (e.g. bus stops per 1000 inhabitants rather than the total number of bus stops). Because such measures are scale-free, they can be used to estimate mode shares not only at the city level but also at the level of 'desire lines' connecting origins and destinations. The PCT's use of average gradient as a predictor of cycling potential is an example [@lovelace_propensity_2017]. Some additional variables, such as distance and relative time/cost for alternative modes, only make sense when measured at the OD level.
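As a minimal illustration of the difference, the sketch below converts absolute city totals into scale-free measures. It assumes two hypothetical cities; the `bus_stops` column and all values are invented for the example. `Density` and `bus_stops_per_1000` match the names used by the models later in this document.

```r
# Hypothetical city-level totals (illustrative values, not real data)
city_totals = data.frame(
  City       = c("City A", "City B"),
  Population = c(500000, 5000000),  # inhabitants
  Area       = c(100, 1000),        # km2
  bus_stops  = c(600, 6000),        # total number of bus stops (hypothetical column)
  stringsAsFactors = FALSE
)

# The raw totals differ tenfold between the two cities, but the scale-free
# versions are identical, so they can be compared (and used as predictors)
# across cities of different sizes
city_totals$Density = city_totals$Population / city_totals$Area
city_totals$bus_stops_per_1000 = city_totals$bus_stops / city_totals$Population * 1000

city_totals[, c("City", "Density", "bus_stops_per_1000")]
```

Both cities come out at 5000 people per km2 and 1.2 bus stops per 1000 inhabitants, even though one is ten times larger than the other.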

Modelling mode shift

Mode share is not a single dependent variable but a set of interrelated proportions that must sum to one, which is problematic for standard regression approaches. An alternative used in cycling potential research has been to 'expand' mode split data into individual categorical observations [@lovelace_propensity_2017, supplementary information]. However, this strategy is computationally intensive.
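The sketch below illustrates the idea of expansion, though it is not the PCT's own code. Each city's percentage mode split becomes one row per notional trip with a categorical `mode` column, so a standard categorical model could then be fitted. Two cities from the example table are used; the function name `expand_city` is hypothetical. With percentages each city yields 100 rows. With real trip counts, the expanded dataset grows with the number of trips, which is where the computational cost comes from.

```r
# Mode split (%) for two cities from the table of example cities
mode_split = data.frame(
  City    = c("Amsterdam", "Helsinki"),
  walking = c(4, 37),
  cycling = c(40, 10),
  pt      = c(29, 30),
  car     = c(27, 22),
  other   = c(0, 1),
  stringsAsFactors = FALSE
)
modes = c("walking", "cycling", "pt", "car", "other")

# Expand one city's row into one row per trip, repeating each mode
# label as many times as its share (modes with 0% produce no rows)
expand_city = function(row) {
  counts = unlist(row[modes])
  data.frame(
    City = row$City,
    mode = factor(rep(modes, times = counts), levels = modes),
    stringsAsFactors = FALSE
  )
}

trips = do.call(rbind, lapply(seq_len(nrow(mode_split)), function(i) {
  expand_city(mode_split[i, ])
}))

nrow(trips)                    # 100 rows per city
table(trips$City, trips$mode)  # recovers the original mode split
```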

The datasets presented in the previous section are examples of proportional outcomes, where we are interested in the mode split of all travel in a city. These types of data are common in ecological modelling [@douma_analysing_2019]. That literature provided a basis for investigating citywide scenarios of mode shift, with modes treated as analogous to species. Dirichlet regression is a recently developed technique for modelling proportions as a function of a range of explanatory variables [@maier_dirichletreg_2014].
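To make the ecological analogy concrete, note that a Dirichlet distribution generates vectors of proportions that are all non-negative and sum to one, which is exactly the form of a city's mode split. The minimal base-R sketch below uses the standard construction of normalising independent gamma draws. The function name `rdirichlet_base` and the concentration parameters `alpha` are hypothetical illustrative values, not estimates for any city. In a Dirichlet regression, the expected shares are instead modelled as functions of explanatory variables.

```r
set.seed(42)

# Draw n vectors from a Dirichlet(alpha) distribution by normalising
# independent Gamma(alpha_k, 1) draws so that each row sums to one
rdirichlet_base = function(n, alpha) {
  k = length(alpha)
  g = matrix(rgamma(n * k, shape = rep(alpha, each = n)), nrow = n, ncol = k)
  colnames(g) = names(alpha)
  g / rowSums(g)
}

# Illustrative concentration parameters for five modes (hypothetical)
alpha = c(walking = 3, cycling = 1, pt = 3, car = 6, other = 0.5)
shares = rdirichlet_base(5000, alpha)

round(head(shares, 3), 2)  # each row is one simulated 'city' mode split
# The expected share of each mode is alpha_k / sum(alpha)
round(rbind(simulated = colMeans(shares), expected = alpha / sum(alpha)), 3)
```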

Building on this work, support for Dirichlet regression was added in 2018 to the R package brms [@burkner_brms_2017], which implements a Bayesian modelling framework based on the Stan C++ library. A key advantage of brms is that it estimates uncertainty, which is vital for effective 'no-regrets' policy making. A basic example of the outputs of a Dirichlet regression model is shown in the figure below. It represents the result of a model run on a subset of 28 cities for which population data were available (other explanatory variables included the provision of metro and bikeshare schemes).

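library(brms)
# Dirichlet regression of mode shares (matrix response 'mode') on city characteristics
m = brm(mode ~ population + pt, data = d_min, family = dirichlet())
# Predicted mode shares for new cities (new_data must contain the same predictors)
res = predict(m, new_data)
knitr::include_graphics(c("", ""))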
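Two practical details sit behind a call like the one above. First, brms expects the Dirichlet response as a single matrix column, with one column per mode and rows summing to one, typically built with `cbind()`. Second, the Dirichlet distribution requires shares strictly between 0 and 1, yet several cities in the table above report 0% cycling or 'other'. The sketch below is an assumption, not the procedure used to produce the figures. It converts percentages to proportions and then shrinks them away from the boundaries with the commonly used adjustment (y(N - 1) + 1/C) / N, where N is the number of cities and C the number of modes. The object names `dc_example` and `y_adj` are hypothetical.

```r
# Mode split (%) for the five example cities in the table above
dc_example = data.frame(
  City    = c("Detroit", "Amsterdam", "Bratislava", "Osaka", "Helsinki"),
  walking = c(1, 4, 4, 27, 37),
  cycling = c(0, 40, 0, 21, 10),
  pt      = c(2, 29, 70, 34, 30),
  car     = c(92, 27, 26, 18, 22),
  other   = c(5, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)
modes = c("walking", "cycling", "pt", "car", "other")

# Percentages -> proportions (each row sums to 1)
y = as.matrix(dc_example[, modes]) / 100

# Shrink proportions away from 0 and 1 (assumed adjustment); rows still sum to 1
N = nrow(y)
C = ncol(y)
y_adj = (y * (N - 1) + 1 / C) / N

# Store the matrix as a single response column for a Dirichlet model, e.g.
# m = brms::brm(mode ~ <predictors>, data = dc_example, family = brms::dirichlet())
dc_example$mode = y_adj
round(y_adj, 3)
```

With only five cities the adjustment is large (a 0% share becomes 4%). With the 101-city dataset it would be much smaller, since it shrinks towards 1/C by a factor that depends on N.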

The results suggest that, beyond a certain size, larger city populations are associated with a lower proportion of trips made by car in the sample of cities used. More explanatory variables can be added within this framework, including categorical variables such as has_metro. The results of adding has_metro (which also show confidence intervals) indicate an increase in the mode share of public transport and, notably, walking. Of course, the quality of the predictions relies on good input data and on the assumption that cities are in equilibrium. This reinforces the need for open data on mode shift at city, OD and local levels over time following a range of interventions.

Results using different explanatory variables on a slightly larger dataset (n = 101) show that the modelling framework generalizes. The figure below shows two more policy-relevant explanatory variables: the number of bus stops per 1000 inhabitants and the provision of a tram system, which in many cities is more viable than a metro system.

knitr::include_graphics(c("", ""))

The figure above shows the marginal effect of changes to one variable, holding all other variables constant. The model also accounts for fixed effects, such as population density, that are hard to change through policy interventions. The relationship between population density and mode split is nevertheless interesting in itself, as shown in the figure below.
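The idea behind such a marginal effects plot can be sketched without brms. Build a grid of new data in which one variable varies over its observed range while the others are held at fixed reference values, then pass that grid to the fitted model's `predict()` method. The city sample and the object names `sample_cities` and `grid` below are hypothetical. As far as we understand the brms documentation, `marginal_effects()` (renamed `conditional_effects()` in more recent brms releases) conditions numeric covariates on their means by default, which is what the grid mimics.

```r
# Hypothetical sample of cities (illustrative values only)
sample_cities = data.frame(
  Density            = c(1500, 3000, 4800, 7000, 12000),
  bus_stops_per_1000 = c(0.5, 1.2, 2.0, 3.0, 4.5),
  has_tram           = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Vary Density over its observed range and hold the other predictors fixed:
# the numeric predictor at its mean, the logical predictor at FALSE
grid = data.frame(
  Density            = seq(min(sample_cities$Density), max(sample_cities$Density),
                           length.out = 50),
  bus_stops_per_1000 = mean(sample_cities$bus_stops_per_1000),
  has_tram           = FALSE
)

head(grid, 3)
# With a fitted model m, the plotted curves would come from predict(m, newdata = grid)
```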


Of course, input data on more than the 101 cities taken from open data repositories shown above are needed to narrow the confidence intervals. Nevertheless, these results demonstrate a flexible way to model mode shift that accounts for the interrelations between different transport modes. The next step is to apply these models to real city datasets.

Estimates of rates of shift towards walking and cycling down to route network levels

Taking Accra as an example, let's see how the modelling framework can estimate mode shift (remember that this is a proof of concept based on a small input dataset, not final results).

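# note: depends on data in the who3 repo
library(dplyr)
library(brms)
cities = readRDS("cities.Rds")
m = readRDS("model-result-brms-density-bus-stops-101.Rds")
# Explanatory variables used by the model, for Accra only
# (st_drop_geometry() is needed because geometry columns are 'sticky' in sf objects)
accra = cities %>%
  sf::st_drop_geometry() %>%
  filter(City == "Accra") %>% 
  mutate(Density = Population / Area) %>% 
  select(City, Density, bus_stops_per_1000, has_tram)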

|City  | Density| bus_stops_per_1000|has_tram |
|:-----|-------:|------------------:|:--------|
|Accra | 4787.81|            1.23008|FALSE    |

The current mode split can be estimated as follows:

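# predict() returns an array (observations x summary statistics x modes);
# [, , ] drops the single-observation dimension, leaving a statistics x modes matrix
mode_share_current_estimate = predict(m, accra)[, , ]
knitr::kable(mode_share_current_estimate, digits = 2)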

|          | walking| cycling|   pt|  car| other|
|:---------|-------:|-------:|----:|----:|-----:|
|Estimate  |    0.14|    0.05| 0.18| 0.59|  0.04|
|Est.Error |    0.11|    0.07| 0.13| 0.16|  0.07|
|Q2.5      |    0.01|    0.00| 0.01| 0.27|  0.00|
|Q97.5     |    0.43|    0.25| 0.48| 0.88|  0.24|

In the PT scenario, we can increase the provision of bus stops to 3 per 1000 people, representing a high level of provision within the range of the sample of cities worldwide:

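library(ggplot2)
# PT scenario: raise bus stop provision from ~1.2 to 3 per 1000 people
accra_pt = accra %>% mutate(bus_stops_per_1000 = 3)
mode_share_pt_estimate = predict(m, accra_pt)[, , ]
knitr::kable(mode_share_pt_estimate, digits = 2)
# Change relative to the current estimate, in percentage points
knitr::kable((mode_share_pt_estimate - mode_share_current_estimate)*100, digits = 1)
# Predicted mode shares across densities for both levels of bus stop provision
# (marginal_effects() is called conditional_effects() in newer brms versions)
conditions = data.frame(bus_stops_per_1000 = c(accra$bus_stops_per_1000, accra_pt$bus_stops_per_1000))
effects = marginal_effects(m, "Density", conditions = conditions, categorical = TRUE, re_formula = NULL)
g = print(effects)
g$`Density:cats__` + 
  geom_vline(xintercept = accra$Density)  # Accra's current density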

Estimated change in mode share (percentage points)

|                 | walking| cycling|  pt|  car| other|
|:----------------|-------:|-------:|---:|----:|-----:|
|Central estimate |     0.8|     0.5| 1.1| -1.9|  -0.5|

The result can also be shown graphically. The figure below shows mode split estimates under two model experiment conditions: one in which Accra has 1.2 bus stops per 1000 people (as it does currently) and one in which it has 3. The x axis shows that this model experiment can be generalized over the parameter space, in this case with x representing density. The vertical line represents Accra's density (~5000 people per km2):


Under this scenario, the central estimate of car use drops by 1.9 percentage points. The central estimates for walking, cycling and public transport grow by 0.8, 0.5 and 1.1 percentage points, respectively. This highlights the synergies between active transport modes and bus use implicit in the data, suggesting that strong investment in public transport and active transport infrastructure can be complementary. As mentioned already, larger input datasets are needed to reduce the large confidence intervals around these estimates, in particular datasets with more cities from Africa and the developing world in general.

The framework enables us to model changes in mode share that would result from changes in any variable, categorical or continuous. Based on the input data, the impact of a tram system in Accra could be simulated as follows:

accra_tram = accra %>% mutate(has_tram = TRUE)
mode_share_tram_estimate = predict(m, accra_tram)[, , ]
knitr::kable(mode_share_tram_estimate, digits = 2)
knitr::kable((mode_share_tram_estimate - mode_share_current_estimate)*100, digits = 1)

As with any model, the usefulness of the outputs relies on the quality of the inputs and the assumptions underlying the model. These limitations, which reflect the paucity of open, curated data on mode shift in cities internationally, are covered in the next section. The current dataset has only one data point in Africa and only a handful of cities in the developing world, so it is not considered large or diverse enough to make useful predictions of modal shift. Instead of presenting results based on limited input data, the remainder of this section outlines how mode shift could be estimated given a sufficiently large and diverse input dataset on mode shift following interventions. The basic tenets of this method are that:

Based on these tenets, an approach to estimating mode shift in response to the scenarios outlined above is detailed below. We have shown that multi-modal responses to continuous and categorical variables can be modelled in a robust Bayesian framework with explicit treatment of uncertainty. Under this framework, scenario definition simplifies to identifying and modifying explanatory variables. These variables must be available at the city level for a sufficiently large sample (500+) of settlements, with enough diversity to represent the changes that could take place in the cities under consideration.

Changes could be made systematically to each of the predictors to represent change on the ground. One way to achieve this would be to increase continuous variables by amounts determined by the input data, e.g. with 25^th^, 50^th^ and 75^th^ percentile increases representing low, medium and high levels of change. Each increase would be multiplied by a modifier such as $1 - p_\text{current} / p_\text{max}$ to represent the law of diminishing returns, where $p_\text{current}$ is the current level of provision and $p_\text{max}$ the maximum level in the sample. This was roughly the approach used in the example scenario of increasing bus stop provision in Accra.

To provide a concrete example, imagine that the maximum number of bus stops in the city dataset is 15 per 1000 people and that the 75^th^ percentile level of provision is 3 bus stops per 1000. In this case, an ambitious increase would be calculated as follows for cities that currently have 0, 1, 10 and 15 bus stops per 1000 people:

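max_provision = 15             # maximum bus stops per 1000 people in the city dataset
max_increase_in_provision = 3  # 75th percentile level of provision
current_provision = c(0, 1, 10, 15)
modifier = (1 - current_provision / max_provision)  # diminishing returns
increase_in_provision = max_increase_in_provision * modifier
future_provision = current_provision + increase_in_provision 
future_provision
#> [1]  3.0  3.8 11.0 15.0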
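The same calculation can be generalised to the low, medium and high levels of change described above. The minimal sketch below assumes that the size of the increase at each level is given by the 25^th^, 50^th^ and 75^th^ percentiles of provision across the city sample, and that the maximum provision is the sample maximum. The sample values and the function name `diminishing_increase` are hypothetical. The values are chosen so that the 75^th^ percentile is 3 and the maximum is 15, matching the worked example.

```r
# Hypothetical bus stop provision (per 1000 people) across a sample of cities
provision_sample = c(0, 1, 2, 3, 15)

# Increase damped by (1 - current / max) to represent diminishing returns
diminishing_increase = function(current, max_increase, max_provision) {
  current + max_increase * (1 - current / max_provision)
}

# Low, medium and high increases from the 25th, 50th and 75th percentiles
increase_levels = quantile(provision_sample, probs = c(0.25, 0.5, 0.75))
names(increase_levels) = c("low", "medium", "high")

current_provision = c(0, 1, 10, 15)
scenarios = sapply(increase_levels, function(inc) {
  diminishing_increase(current_provision, inc, max(provision_sample))
})
rownames(scenarios) = current_provision
round(scenarios, 2)
```

The "high" column reproduces the worked example above. Cities already at the sample maximum receive no increase under any level.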

For categorical variables, the change is simpler: a one-size-fits-all switch that only affects cities that do not currently have a specific piece of infrastructure (e.g. a tram system, as in the example above).
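A minimal sketch of the categorical case follows, using an invented three-city dataset (`cities_demo` and its values are hypothetical). In practice the same change would be applied to the predictors passed to `predict()`, as in the Accra tram example.

```r
# Hypothetical sample of cities with a logical infrastructure variable
cities_demo = data.frame(
  City     = c("City A", "City B", "City C"),
  has_tram = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Tram scenario: set has_tram to TRUE for every city; cities that already
# have a tram are unchanged, so only the others are affected
cities_tram = cities_demo
cities_tram$has_tram = TRUE

affected = cities_demo$City[cities_tram$has_tram != cities_demo$has_tram]
affected
```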

Get walking

This scenario refers to global walking uptake as a result of citywide policies to promote safe and attractive walking. Here, 'global' means without spatial input components, but with spatially distributed consequences.

Key variables that are readily available for most cities and could be modified by policies include [@kerr_jacqueline_perceived_2016]:

Get cycling

This scenario refers to a global scenario of cycling, as a result of citywide policies to provide safe cycleways.

Other important variables include average gradient, type of cycle network, level of car ownership and directness of cycle routes compared with driving routes [@parkin_estimation_2008].

Car diet

This scenario refers to a global, citywide scenario of multi-modal transport change, showing reduced levels of driving following disincentives to own and use cars. Variables that could be modified in support of this scenario include:

Go public transport

This scenario is a global scenario of public transport uptake, linked to SDG 11. It would involve modifying explanatory variables that represent public transport provision. Variables that could be changed in this scenario include:

Go car free

This scenario refers to investment in car-free city centers and other car-free spaces, as well as other locally specific interventions such as reductions in car parking spaces.

Other modifiable transport system variables

In addition to the specific scenarios outlined above, there are other important variables that could be modified in model experiments, either as stand-alone interventions, or to supplement specific scenarios. These include:

Fixed effects

Many variables are outside the scope of policy intervention but are nonetheless important to consider in models. Population density was an example of a fixed effect in the model above. However, the predictions of current mode split in Accra were unrealistic because other fixed effects were omitted. There was no variable accounting for the fact that most people in Accra cannot afford a car, for example. Differences in culture also influence transport systems. Variables to account for these fixed effects could include:

Data limitations and discussion

The data requirements of a robust model to estimate mode shift in the Bayesian, multi-modal framework outlined above are substantial. The preliminary results are inherently limited by the small size and skewed nature of the input city dataset, shown in the map.

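library(tmap)
# Cities in the input dataset, sized by population and coloured by walking mode share
cities = readRDS("../global-data/cities.Rds")
tm_shape(cities) +
  tm_dots(size = "Population", col = "walking", palette = "viridis")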

The map shows that only 2 cities have a high (40%+) level of walking, and these were cities that we added to the dataset. A priority for future work is to make this dataset larger and more representative of the cities where the Upthat tool is most likely to be used.

Scenario development accounting for the transport systems in Accra and Kathmandu, as part of UHI project activities, must be based on current transport data. An overview for Ghana (a proxy for the travel pattern in Accra) is shown below.

A more subtle data limitation surrounds modal categories. This is highlighted in the figure below which shows the full diversity of modes used in Accra from survey data on the left, and the effect of simplifying these categories into the modes which are most commonly reported. Future research should explore ways to gain data on and incorporate a richer diversity of transport modes.

A more conceptual question is how to account for the ordered nature of transport systems, e.g.:

Walking > Cycling > Public Transport > Cars

in terms of typical costs per km (assuming you have access to a bicycle), with the reverse order in terms of maximum speeds and energy costs per km. The framework outlined above could accept an arbitrary number of modes. Differences in mode size, speed and energy requirements could be accounted for by adding hybrid variables such as the 'maximum potential mode share for walking/cycling'. Such variables would be based on data on average trip distances per city (which may not be available for most cities) and distance decay parameters published in the literature.
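One way such a hybrid variable could be computed is sketched below under two simplifying assumptions, neither taken from this document. First, trip distances in a city follow an exponential distribution whose mean is the city's average trip distance. Second, the probability of walking a trip of distance d decays as exp(-beta * d). The potential walking share is then the expected value of the decay function over the trip length distribution, which has the closed form 1 / (1 + beta * mean_distance); the numerical integral confirms it. The function name `potential_share` and the value of `beta_walk` are hypothetical placeholders, not published estimates.

```r
# Potential walking share: E[exp(-beta * D)] where D ~ Exponential(mean = mean_distance_km)
potential_share = function(mean_distance_km, beta) {
  integrand = function(d) dexp(d, rate = 1 / mean_distance_km) * exp(-beta * d)
  integrate(integrand, lower = 0, upper = Inf)$value
}

beta_walk = 1.5                                 # hypothetical decay parameter (per km)
mean_trip = c(short_trips = 2, long_trips = 8)  # hypothetical city averages (km)

numerical   = sapply(mean_trip, potential_share, beta = beta_walk)
closed_form = 1 / (1 + beta_walk * mean_trip)
round(rbind(numerical, closed_form), 3)
```

Under these assumptions, a city with 8 km average trips has roughly a third of the walking potential of a city with 2 km average trips.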


Adapting the scenarios of change

The previous section outlines how additional scenarios of change can be added within the Bayesian framework using Dirichlet regression. The largest barrier to accurate predictions of mode shift following interventions, such as those described in the 5 scenarios above, was identified as the lack of data on mode split across cities. Time series data, which would enable models to predict not only mode split levels but also change, are even scarcer. This barrier is not insurmountable. If larger datasets on the dependent variable become available, the scenarios can be adapted to new city datasets as follows:

The bullet points above show that all of the technology and most of the data (although this is continuously evolving and depends on better mode split data across cities internationally) already exists for this approach to be deployed internationally and provides an indication of the amount of work required to do so.

Adapting the web app

The Upthat web app itself is fully reproducible. An up-to-date R installation on a modern computer should reproduce a fully working version of the app with the following commands:


To modify the Upthat source code, download the repo and open it in an editor such as RStudio, e.g. with the following commands on a Linux terminal:

git clone
rstudio upthat/upthat.Rproj

To view the updated version of the app after making changes, rebuild it (e.g. with Ctrl+Shift+B in RStudio) and then re-run the app with the command runUpthat(). For more on modifying shiny apps, see the documentation associated with the package [@chang_shiny_2015] or the in-progress open source book Mastering Shiny [@wickham_mastering_2020].

Deploying upthat

To deploy a tool like Upthat so that it is publicly available as an interactive web app, the following resources are required:


ATFutures/upthat documentation built on Dec. 31, 2019, 8:54 a.m.