In jasa-btun-anon/BTUN: BTUN: Bradley--Terry for Urban Networks

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

BTUN (Bradley--Terry for Urban Networks, /ˈbʌtn̩/) is a package which runs a Bradley--Terry model with a spatial component for comparative judgement data sets in urban areas. The purpose of this is to use a comparative judgement study to estimate how deprived different areas are. In the study, we show participants different pairs of areas and ask them to choose which of the pair is more deprived.

In this vignette, we'll use the BTUN package to estimate the levels of deprivation in different parts of Dar es Salaam, Tanzania.

library(BTUN)

Loading and Manipulating Data

The package includes shapefiles for the 452 subwards in Dar es Salaam. We load them by calling

data("dar.shapefiles")
library(sf)
plot(dar.shapefiles$geometry, lwd = 0.5)

The library sf allows us to plot the shapefiles.
To construct the prior distribution, we need to compute the adjacency matrix from the shapefiles. One way to do this is to use the package surveillance.

adj.matrix <- surveillance::poly2adjmat(dar.shapefiles$geometry) 
adj.matrix[365, 276] <- 1 #add bridge 
adj.matrix[276, 365] <- 1 #add bridge 
adj.matrix[363, 427] <- 1 #add ferry
adj.matrix[427, 363] <- 1 #add ferry

The $i,j^{th}$ element of the adjacency matrix is 1 if subwards $i$ and $j$ are neighbors and 0 otherwise. We manually include two extra pairs of neighbors to allow for connections across the Kurasini creek, which flows through the city.

The adjacency matrix allows us to view the city as a network where the subwards are nodes and edges are placed between neighboring subwards. The advantage to this is that it makes the density of subwards uniform across the city. The center of Dar es Salaam is densely packed with small subwards, whereas on the outskirts the subwards are much larger. This range of subward sizes and densities means using the Euclidean metric is not suitable, and the network version of the city resolves these issues.

The comparative judgement data set is also included in the package. It can be loaded using the command

data("dar.comparisons")

The data set consists of 76,408 comparisons, where judges were shown a pair of subwards and asked to choose which of the pair was more deprived. Some of the comparisons are shown below. In the first comparisons subward 230 was judged to be more dperived than subward 155.

head(dar.comparisons)

The BTUN package requires the data in a matrix, where the $(i, j)^{th}$ element contains the number of times area $i$ was judged to be more deprived than area $j$. We construct the matrix using the following code

win.matrix <- BTUN::comparisons_to_matrix(452, dar.comparisons)

Constructing the GP Covariance Matrix

The Gaussian Process prior distribution covariance matrix is an important part of the method. This allows us to construct a prior distribution over a space of functions on the network version of the city. We use the function registered_adjacency_covariance_function to construct this matrix, which is called by

k <- registered_adjacency_covariance_function(adj.matrix, type = "sqexp", hyperparameters = c(1, 0.5), linear.combination = rep(1, 452), linear.constraint = 0, tol = 1e-6)

In this example we use the squared exponential covariance function with variance 1 and length scale 0.5. As the BTUN model is a comparative judgement model, there is a identafiability issue. To resolve this we can fix a linear combination of the deprivation levels. This takes the form $\boldsymbol{A\lambda} = a$, where $\boldsymbol{A}$ is a vector containing the coefficient of the linear combination, $\boldsymbol{\lambda}$ is the vector of deprivation levels and $a$ is the value of the constraint. In our example above, we set $\boldsymbol{A} = \boldsymbol{1}$ and $a = 0$, which is equivalent to the sum of the levels being 0. The tolerance term is a small nugget term to allow the covariance matrix to be decomposed.

Fitting the Model

Wew fit the model using an MCMC algorithm. This is called by the following function

set.seed(123)
mcmc.output <- run_mcmc(n.iter = 100000, delta = 0.02, k.mean = k$mean, k.chol = k$decomp.covariance, win.matrix, f.initial = rep(0, 452), alpha = TRUE)

This takes around 30 minutes to run, so we do not call it in this vignette. However, we include the results for $f$ for this seed. A burn-in period of 30,000 iterations was used and the results were thinned by saving every $5^{th}$ iteration. The results can be loaded by

data("mean.deprivation")

We now produce a map of Dar es Salaam, colouring each subward by its posterior mean deprivation level. We create 10 equal sized bins in which to place the subwards and colour the bins appropriately.

#Create Colour Scale
library(RColorBrewer)
red.green.colours <- brewer.pal(10, "RdYlGn")
bin.size <- (2.5-(-1.5))/10
bins <- bin.size*(1:10) - 1.5

#Bin Subwards by colour
dar.colours <- numeric(452)
for(j in 1:452){
  dar.colours[j] <- min(which(bins >= mean.deprivation[j]))
}

To plot the map, we call:

par(fig=c(0,1,0.1,1))
plot(dar.shapefiles$geometry, col = red.green.colours[dar.colours], lwd = 0.25)
par(fig=c(0.1,0.9,0.2,0.25), mar = rep(0.2, 4), new = TRUE)
image(1:10, 1, as.matrix(1:10), col = brewer.pal(10, "RdYlGn"),
      xlab = "", ylab = "", xaxt = "n", yaxt = "n",
      bty = "n")
axis(1, at = seq(0.5, 10.5, 1), labels = round(c(-1.5, bins), 2.5), lty = 0)